
/lit/ - Literature



File: 453 KB, 210x210, 1337551231437.gif
No.2977911

If every rejected novel ever submitted to publishing houses were scanned and made available online, what sort of text-mining method could be used to improve the chances of finding decent ones?

>> No.2977935

Done by logging keywords. You scan for repetition of the words 'wizard', 'vampire', 'magic', 'school', 'homework', and that should weed out a lot of crap. Then get more obscure: recurring words like 'vegan', 'homoeopathy', 'greenpeace' and anything else you don't want.
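A minimal Python sketch of the keyword filter described above. The blacklist comes from the post; the threshold of 5 hits per 1,000 words is an arbitrary assumption, and matching is exact (no stemming, so 'wizards' does not count as 'wizard').

```python
import re
from collections import Counter

# Blacklist from the post; extend it with "anything else you don't want".
BLACKLIST = frozenset({"wizard", "vampire", "magic", "school", "homework",
                       "vegan", "homoeopathy", "greenpeace"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its runs of ASCII letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def blacklist_rate(text: str, blacklist: frozenset[str] = BLACKLIST) -> float:
    """Blacklisted tokens per 1,000 tokens (exact matches only, no stemming)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in blacklist)
    return 1000 * hits / len(tokens)


def keep_manuscript(text: str, max_rate: float = 5.0) -> bool:
    """Keep a manuscript only if blacklisted words are rarer than max_rate per 1,000 words.

    The default of 5.0 is an assumption, not a tuned value.
    """
    return blacklist_rate(text) < max_rate
```

For example, "The wizard went to school to do his magic homework." has 4 blacklisted tokens out of 10, a rate of 400 per 1,000, so it is dropped. Using a rate rather than a raw count keeps long manuscripts from being penalized just for their length.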

You then need an algorithm that counts how often the word 'she' appears per page, to get rid of any novel with a female 3rd-person protag (you can also scan the author names to get rid of female writers).

By now you will have narrowed it down significantly.

>> No.2977943

>>2977911
You don't. You pick one out randomly, out of nothing but pure caprice, and let that one arbitrary moment decide the value of the thing you choose to pick, since you had already really decided beforehand what you would pick in the act-of-deciding-what-to-pick, making it your own.

>> No.2977964

I bet we could build sets of subjects that were "hot" at a given time, the kind of subjects prize-winning authors wrote about, and then look at other authors who treated the same subjects but might have been snubbed because of their geographic location.
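A rough sketch of the "hot subjects" idea, assuming each work has already been reduced to a set of subject keywords (how to extract them is left open here). The `Work` type, the 10-year window and the 0.5 threshold are hypothetical choices: a manuscript is flagged when at least half of its subjects were also treated by prize winners of the same decade, but no winner that decade came from its region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str
    year: int
    region: str
    subjects: frozenset[str]  # assumed to be extracted beforehand


def hot_share(manuscript: Work, hot: frozenset[str]) -> float:
    """Fraction of the manuscript's subjects that were also prize-winner subjects."""
    if not manuscript.subjects:
        return 0.0
    return len(manuscript.subjects & hot) / len(manuscript.subjects)


def snubbed_candidates(manuscripts: list[Work], winners: list[Work],
                       window: int = 10, min_share: float = 0.5) -> list[tuple[str, float]]:
    """Manuscripts whose subjects were "hot" among same-period winners but whose
    region produced no winner in that period, best matches first."""
    results = []
    for m in manuscripts:
        start = m.year - m.year % window  # e.g. 1996 -> 1990 for window=10
        peers = [w for w in winners if start <= w.year < start + window]
        hot = frozenset().union(*(w.subjects for w in peers))
        winning_regions = {w.region for w in peers}
        share = hot_share(m, hot)
        if share >= min_share and m.region not in winning_regions:
            results.append((m.title, share))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Measuring the share of the manuscript's own subjects (rather than overlap with the whole hot set) keeps a decade with many winners from drowning out a manuscript that matches only a few of them.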

--
Also, there could be some basic filtering based on the overall structure of the novel: promising novels could be selected if they have the kind of symmetry and "rhythm" in their chapter lengths that is often found in "good" novels.
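One way to put numbers on chapter "rhythm", sketched under the assumption that each chapter starts on a line beginning with "Chapter". It reports two hypothetical measures: the coefficient of variation of chapter lengths (how even they are) and a mirror-asymmetry score comparing chapter i with chapter n-1-i (how symmetric the book is). Which values count as "good" would have to be calibrated on a reference set of novels; nothing here does that.

```python
import re
from statistics import mean, pstdev


def chapter_lengths(text: str) -> list[int]:
    """Word count per chapter, assuming chapters start on a line beginning with "Chapter".

    Any non-empty text before the first heading is counted as a chapter of its own.
    """
    parts = re.split(r"(?im)^chapter\b.*$", text)
    return [len(p.split()) for p in parts if p.strip()]


def rhythm_scores(lengths: list[int]) -> tuple[float, float]:
    """Return (cv, asymmetry) for a list of chapter lengths.

    cv: population standard deviation divided by the mean (0.0 = all chapters equal).
    asymmetry: mean relative difference between chapter i and chapter n-1-i
    (0.0 = the length profile reads the same forwards and backwards).
    """
    if len(lengths) < 2 or min(lengths) <= 0:
        raise ValueError("need at least two chapters, each with a positive length")
    cv = pstdev(lengths) / mean(lengths)
    n = len(lengths)
    asymmetry = mean(
        abs(lengths[i] - lengths[n - 1 - i]) / max(lengths[i], lengths[n - 1 - i])
        for i in range(n // 2)
    )
    return cv, asymmetry
```

For chapter lengths of 1000, 2000, 2000, 1000 words, the cv is 1/3 and the asymmetry is 0, since the profile is a perfect mirror image.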

Does anybody know anything about narratology? I've forgotten the little I knew about it, but I think some of it could be applied.

>> No.2978067

I bet it would have to be less about words and more about grammatical structure: flagging comma usage, subject-verb agreement, preposition overuse, etc.

Manuscripts with high values on these get thrown out, and you can whittle it down to "good" works from there.
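A crude sketch of the grammar-statistics approach. It only covers comma density and preposition share; subject-verb agreement would need a real parser and is not attempted. The preposition list and the limits are hypothetical, and the list counts every "to", including infinitive markers.

```python
import re

# Small hand-picked list (an assumption); a real system would use a part-of-speech tagger.
PREPOSITIONS = frozenset({"of", "in", "to", "for", "with", "on", "at", "by", "from",
                          "about", "into", "over", "under", "after", "before",
                          "between", "through"})


def style_stats(text: str) -> dict[str, float]:
    """Commas per sentence and the share of word tokens that are prepositions."""
    words = re.findall(r"[a-z]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"commas_per_sentence": 0.0, "preposition_ratio": 0.0}
    return {
        "commas_per_sentence": text.count(",") / len(sentences),
        "preposition_ratio": sum(w in PREPOSITIONS for w in words) / len(words),
    }


def flag_overuse(stats: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Names of the statistics above their limit; statistics without a limit are never flagged."""
    return [name for name, value in stats.items() if value > limits.get(name, float("inf"))]
```

With hypothetical limits such as `{"commas_per_sentence": 0.4, "preposition_ratio": 0.25}`, a manuscript would be thrown out if `flag_overuse` returns anything.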

>> No.2978086

Maybe there are great novels sleeping on shelves that only a couple of people have read... text mining and open-source culture could be used to do them justice.

Cool subject, I think. This guy illustrates some ways it can be used:

http://tedunderwood.wordpress.com/2011/08/15/how-to-make-text-mining-serve-literary-history-and-not-the-other-way-around/

>> No.2978133

>>2977935
>scanning for vampires and werewolves
This would be a good crude method, except without the 'removing females' part; that's silly.

>> No.2978146

Some tags could be attached to the novels: subject, but also the year each was written, what part of the world it came from, the socio-economic context, and any relevant historical events of the time. This way we could possibly shed some light on unknown novels that were truly avant-garde, that dared to address certain realities when everyone else was silent, and that in hindsight actually represented the zeitgeist much better than other novels celebrated at the time.
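A sketch of the tagging idea, assuming the tags already exist (the next reply points out that producing them automatically is the hard part). The `Novel` fields are hypothetical. The function looks for uncelebrated novels that treated a subject at least `min_lead` years before the first celebrated novel on that subject, as a rough proxy for "dared to address it when everyone else was silent"; subjects no celebrated novel ever touched are skipped because there is no baseline year.

```python
from dataclasses import dataclass, field


@dataclass
class Novel:
    title: str
    year: int
    region: str = ""
    subjects: set[str] = field(default_factory=set)
    context: set[str] = field(default_factory=set)  # socio-economic / historical tags, unused below
    celebrated: bool = False


def ahead_of_their_time(novels: list[Novel], min_lead: int = 10) -> list[tuple[str, str, int]]:
    """(title, subject, lead in years) for uncelebrated novels that treated a subject
    at least min_lead years before the first celebrated novel on that subject."""
    first_celebrated: dict[str, int] = {}
    for n in novels:
        if n.celebrated:
            for s in n.subjects:
                first_celebrated[s] = min(n.year, first_celebrated.get(s, n.year))
    hits = []
    for n in novels:
        if n.celebrated:
            continue
        for s in sorted(n.subjects):
            # Subjects no celebrated novel ever treated have no baseline and are skipped.
            if s in first_celebrated and first_celebrated[s] - n.year >= min_lead:
                hits.append((n.title, s, first_celebrated[s] - n.year))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

The `region` and `context` tags could then be used to narrow the hits, for example to novels from parts of the world that produced few celebrated works.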

>> No.2978163

>>2978146
Problem is you'd have to be able to do that automatically.

>> No.2978192

One could hope they'd tag the manuscripts correctly.

>paranormal romance