/jp/ - Otaku Culture

Anonymous Fri Sep 15 13:49:22 2017 No.17636709
File: 57 KB, 800x1144, file.png

>>17636662
>It would be a gargantuan undertaking to mine however many thousand example sentences are probably in that book though so if it's pointless then it would amount to a massive waste of time.

If you're using modern OCR, the accuracy will be good enough that you just have to check each sentence for mistakes. If you're doing this one "card" at a time, it won't be too much work on a day-to-day basis, but it'll still take a while since there are so many example sentences in HJGP.

1) Download Tesseract 4 and extract it somewhere under your user directory (Documents, Desktop, etc.).

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

"zip file with cppan generated .dll and .exe files"

2) Install ImageMagick. Use the installer.

http://www.imagemagick.org/script/download.php

ImageMagick-7.0.7-2-Q16-x64-dll.exe

3) Install ShareX.

4) Add extra hotkeys to ShareX that run a helper program. The helper invokes ImageMagick on the screenshot, sends the result to Tesseract, and copies Tesseract's output to the clipboard. ShareX can't run shell scripts, so you'll have to use a separate program to invoke those programs. Here's mine:

https://a.safe.moe/JY7Ew.zip
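
If you'd rather not run a stranger's exe, here is a rough Python sketch of the same idea. It is not the contents of the zip above, just a guess at the pipeline described in step 4. Assumptions: `magick` and `tesseract` are on your PATH, the jpn and eng traineddata files are installed, ShareX hands the screenshot path to the script as its first argument, and Windows `clip` accepts UTF-16 input. The grayscale and 300% upscale settings are arbitrary starting points, not tuned values.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def magick_cmd(src, dst):
    # Assumed preprocessing: grayscale plus 3x upscale so small glyphs
    # are easier for the OCR engine to read.
    return ["magick", str(src), "-colorspace", "Gray", "-resize", "300%", str(dst)]


def tesseract_cmd(image, lang="jpn+eng"):
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", str(image), "stdout", "-l", lang]


def tidy(text):
    # Drop blank lines, surrounding whitespace, and the trailing form feed.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def ocr_to_clipboard(screenshot, lang="jpn+eng"):
    with tempfile.TemporaryDirectory() as tmp:
        cleaned = Path(tmp) / "cleaned.png"
        subprocess.run(magick_cmd(screenshot, cleaned), check=True)
        result = subprocess.run(
            tesseract_cmd(cleaned, lang), check=True, capture_output=True
        )
    text = tidy(result.stdout.decode("utf-8"))
    # Assumption: clip.exe detects the UTF-16 BOM, so Japanese survives.
    subprocess.run(["clip"], input=text.encode("utf-16"), check=True)
    return text


if __name__ == "__main__":
    ocr_to_clipboard(sys.argv[1])
```

You still have to proofread whatever lands on the clipboard before pasting it into a card.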

My hotkey in ShareX is configured something like picture related. I have separate hotkeys for vertical and horizontal text. If you're OCRing HJGP in particular you're going to want two horizontal hotkeys, one that prioritizes English only and one that prioritizes Japanese over English, by giving ocr.exe different scripts.
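
A sketch of how the per-hotkey difference could look if each hotkey just passes a profile name to the helper. The profile names are made up for illustration, and I'm assuming "English only" means `-l eng`. `-l` and `--psm` are real tesseract options: the first language listed in `-l` is treated as the primary one, `--psm 6` assumes a single uniform block of text, and `--psm 5` assumes a single block of vertically aligned text. The `jpn_vert` model has to be in your tessdata folder for the vertical profile to work.

```python
# Hypothetical profile names; one ShareX hotkey would pass each of these.
PROFILES = {
    "horizontal_en": {"lang": "eng", "psm": 6},      # English only
    "horizontal_jp": {"lang": "jpn+eng", "psm": 6},  # Japanese first, English fallback
    "vertical_jp": {"lang": "jpn_vert", "psm": 5},   # vertical Japanese text
}


def tesseract_args(profile, image):
    # Build the tesseract command line for the chosen profile.
    p = PROFILES[profile]
    return ["tesseract", image, "stdout", "-l", p["lang"], "--psm", str(p["psm"])]


if __name__ == "__main__":
    for name in PROFILES:
        print(name, " ".join(tesseract_args(name, "shot.png")))
```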
