
/sci/ - Science & Math

File: 34 KB, 640x480, 1280616295281.jpg
>> No.1547915

what is the algorithm they use to get all those evocative captchas?

just a random bunch:
oracling mutton, Otto Mattice, eluding motiles, millions century, Roth elated, security radiators, goal coherence, Quakers Cor., their lollygags, them scialoia, scrounger lid, lavishest traced

i wonder: if they just sampled from a body of fancy writing, they would still mostly get the common words and not the pretty ones. so can you think of the algorithm/system we're dealing with here, scientists?

>> No.1547926

also, what are the first few captchas you get?
mine are
trapeze (2004a)
reason pease
phar rotary
Carnarvon desert
rallies strongly

>> No.1547927

They're random words from books that are being scanned. By typing the captcha you're helping transcribe books!

>> No.1547936

I just got these:
ing stoppable
mav watts
I'm burros

>> No.1547944

there's a video on youtube where the guy talks about it, go look

>> No.1547957

that does make sense. the scanning software should be able to recognize the common words more easily, anon

but if they were unscannable already, what are they distorted for?

said, pizzeria
Shallek classes
Harms ungulate
eurosciences Brno
Hammarskjöld equator

and where do they get such good books? any of these is more than good enough for a band name. It's fancy.

>> No.1547964

well, give me a link or a keyword
i'll search

from halos
g(f)modules, arcane
Herzig impugned
Source: palpating
Flaw coalition

>> No.1547982

This, basically.
They have lots of computers scanning and digitizing books, and more than one piece of software trying to recognize the words. When the different programs disagree on what a word is, it becomes a captcha. When enough humans agree on what a word is, it becomes accepted.
One of the two words is known: this is the control word (the one that's actually checked to see if you're human). The other is the one the computers couldn't get.
It's all explained on the reCAPTCHA site.
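
For anyone who wants that spelled out, here is a rough Python sketch of the two checks described in this post: flagging a word when the OCR programs disagree, and grading an answer on the control word only. The function names, the two-word answer format, and the `control_index` parameter are made up for illustration; this is not reCAPTCHA's actual code.

```python
def needs_human(ocr_readings):
    """Return True when the OCR programs disagree on a scanned word."""
    # Compare readings case-insensitively; more than one distinct
    # reading means the word gets sent out as a captcha.
    distinct = {reading.strip().lower() for reading in ocr_readings}
    return len(distinct) > 1


def check_answer(typed, control_word, control_index):
    """Grade a two-word answer using only the control word.

    control_index (hypothetical) is 0 or 1 and says which of the two
    typed words is the known control word.
    Returns (passed, guess_for_unknown_word).
    """
    words = typed.split()
    if len(words) != 2:
        return False, None
    passed = words[control_index].lower() == control_word.lower()
    # Only keep the guess for the unknown word if the control word was right.
    guess = words[1 - control_index] if passed else None
    return passed, guess
```

Calling `check_answer("oracling mutton", "mutton", 1)` passes and hands back "oracling" as a guess for the unknown word, which is all the server could do with it since it has no correct answer to compare against.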

>> No.1547995

One of the words is from a book that the OCR failed to decode. After that, it is distorted further so that a better decoding algorithm can't read it either.
The other word is a random known word.

When 3 people type the same thing for the unknown word, the software sends that word to Google and it goes into the book.
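
The agreement step could look roughly like this in Python. The threshold of 3 is just the number quoted above, and `accepted_reading` is a hypothetical name; how the real system weighs votes is not something this thread confirms.

```python
from collections import Counter

# Number of matching human answers needed, as claimed in this post (assumption).
AGREEMENT_THRESHOLD = 3


def accepted_reading(guesses, threshold=AGREEMENT_THRESHOLD):
    """Return the agreed reading of the unknown word, or None if no
    single reading has reached the threshold yet."""
    counts = Counter(guess.strip().lower() for guess in guesses)
    if not counts:
        return None
    word, votes = counts.most_common(1)[0]
    return word if votes >= threshold else None
```

With the guesses `["lollygags", "Lollygags", "lollygag", "lollygags"]` three of the four match after lowercasing, so the word would be accepted; with only two matching guesses the function returns None and the word stays in circulation.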

From here we can reach the following conclusions:
1. Google is turning a lot of /b/tards into its personal word-crunching army.

2. You only have to correctly write the known word.

3. If 3 /b/tards write nigger for the unknown word, it will end up in some random book.

>advocated op.