/jp/ - Otaku Culture


>> No.17636709
File: 57 KB, 800x1144, file.png

>>17636662
>It would be a gargantuan undertaking to mine however many thousand example sentences are probably in that book though so if it's pointless then it would amount to a massive waste of time.

If you're using modern OCR, the accuracy is good enough that you only have to check each sentence for mistakes. If you do this one "card" at a time, it won't be much work on a day-to-day basis, but it will still take a while since there are so many example sentences in HJGP.

1) Download Tesseract 4 and extract it somewhere under your user directory (your documents, desktop, etc.).

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM

On that page, get the "zip file with cppan generated .dll and .exe files".
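Before wiring up hotkeys, it can help to confirm that Tesseract actually sees the Japanese models. Here is a minimal Python sketch, assuming your build supports `tesseract --list-langs` and that you want the `jpn`, `jpn_vert` and `eng` models; depending on the download, you may need to drop the matching `.traineddata` files into its `tessdata` folder yourself. The function names are hypothetical.

```python
import subprocess

# Models this sketch assumes you want: horizontal Japanese, vertical Japanese, English.
REQUIRED = {"jpn", "jpn_vert", "eng"}


def parse_list_langs(output: str) -> set[str]:
    """Turn the output of `tesseract --list-langs` into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def missing_langs(tesseract: str = "tesseract") -> set[str]:
    """Return the required language codes that this Tesseract install does not report."""
    result = subprocess.run(
        [tesseract, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some builds print the list to stderr instead of stdout, so read both.
    found = parse_list_langs(result.stdout + "\n" + result.stderr)
    return REQUIRED - found
```

For example, `missing_langs(r"C:\Users\you\tesseract\tesseract.exe")` (a made-up path) should return an empty set once all three models are in place.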

2) Install ImageMagick using the installer.

http://www.imagemagick.org/script/download.php

Installer: ImageMagick-7.0.7-2-Q16-x64-dll.exe
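The ImageMagick step is just a preprocessing pass on the screenshot before Tesseract sees it. A minimal Python sketch of one such pass, assuming ImageMagick 7 (invoked as `magick`) and assuming grayscale plus a 200% upscale is a reasonable starting point for small on-screen text; the exact options the ocr.exe below uses are not known, and the function names are hypothetical.

```python
import subprocess


def magick_preprocess_cmd(src: str, dst: str, scale_percent: int = 200,
                          magick: str = "magick") -> list[str]:
    """Build an ImageMagick 7 command that prepares a screenshot for OCR."""
    return [
        magick, src,
        "-colorspace", "Gray",            # drop color; OCR only needs luminance
        "-resize", f"{scale_percent}%",   # upscale small screen text
        dst,
    ]


def preprocess(src: str, dst: str, **kwargs) -> None:
    """Run the preprocessing command, raising if ImageMagick fails."""
    subprocess.run(magick_preprocess_cmd(src, dst, **kwargs), check=True)
```

The scale factor is a knob to tune against your own scans rather than a recommended value.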

3) Install ShareX.

4) Next, add extra hotkeys to ShareX that run a program which invokes ImageMagick on the screenshot, passes the result to Tesseract, and copies Tesseract's output to the clipboard. ShareX can't run shell scripts, so you need a separate program to call ImageMagick and Tesseract. Here's mine:

https://a.safe.moe/JY7Ew.zip
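If that link is dead, the same chain can be approximated with a small script. This is a rough Python stand-in, not the program in that zip. It assumes Windows, `magick` on PATH from the installer, the full path to tesseract.exe filled in by hand, and that ShareX passes the screenshot path as the first argument. Using `clip` for the clipboard is also an assumption. `TESSERACT`, `ocr_screenshot` and the other names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional

# Hypothetical: replace with the full path to tesseract.exe from step 1
# (the extracted zip is usually not on PATH).
TESSERACT = "tesseract"


def tesseract_cmd(image: str, lang: str = "jpn", psm: Optional[int] = None) -> list[str]:
    # "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file next to the image.
    cmd = [TESSERACT, image, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_screenshot(screenshot: str, lang: str = "jpn", psm: Optional[int] = None) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        prepped = os.path.join(tmp, "prepped.png")
        # Grayscale + upscale with ImageMagick 7 before OCR.
        subprocess.run(
            ["magick", screenshot, "-colorspace", "Gray", "-resize", "200%", prepped],
            check=True,
        )
        result = subprocess.run(tesseract_cmd(prepped, lang, psm),
                                capture_output=True, check=True)
    # Tesseract writes UTF-8; strip() also removes trailing whitespace,
    # including the form feed some versions append after each page.
    return result.stdout.decode("utf-8").strip()


def copy_to_clipboard(text: str) -> None:
    # Windows-only assumption: clip.exe accepts UTF-16 with a BOM,
    # which keeps Japanese characters intact.
    subprocess.run(["clip"], input=text.encode("utf-16"), check=True)


# Only run the pipeline when a screenshot path is passed in,
# e.g. by a ShareX action: python ocr_pipeline.py <screenshot> [lang]
if __name__ == "__main__" and len(sys.argv) > 1:
    copy_to_clipboard(ocr_screenshot(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "jpn"))
```

Keeping the command builders as plain functions makes it easy to check what will be run before pointing a hotkey at it.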

My ShareX hotkey is configured something like picture related. I have separate hotkeys for vertical and horizontal text. If you're OCRing HJGP in particular, you'll want two horizontal hotkeys, one for English only and one that prioritizes Japanese over English, which you set up by giving ocr.exe different scripts.
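The per-hotkey differences come down to which Tesseract language models and page segmentation mode (PSM) each one uses. Here is a hypothetical mapping in Python, assuming Tesseract's standard `jpn`, `jpn_vert` and `eng` models. The mode names are made up. Listing `jpn` first in `jpn+eng` is how this sketch expresses "prefer Japanese". PSM 5 treats the image as a single block of vertical text, and PSM 6 treats it as a single uniform block of text.

```python
# Hypothetical hotkey modes -> Tesseract language/layout settings.
MODES = {
    "vertical": {"lang": "jpn_vert", "psm": 5},      # vertical Japanese block
    "horizontal_jp": {"lang": "jpn+eng", "psm": 6},  # Japanese first, English fallback
    "horizontal_en": {"lang": "eng", "psm": 6},      # English only
}


def tesseract_args(mode: str) -> list[str]:
    """Return the extra Tesseract arguments for one hotkey mode."""
    try:
        settings = MODES[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}") from None
    return ["-l", settings["lang"], "--psm", str(settings["psm"])]
```

Each ShareX hotkey would then call the OCR program with a different mode name, so only the arguments change between hotkeys.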
