
/vt/ - Virtual Youtubers



File: 97 KB, 850x604, __tokino_sora_and_a_chan_hololive_drawn_by_funi_mu9__sample-b26f4a25e663693b9f19520ca730e94c.jpg
No.45061611

Sup schizos and shitpos

Just wanted to put something I've been trying of late out there: I've been using OpenAI's Whisper sequence-to-sequence neural net model (specifically the large variant) to try and generate AI subs for Hololive VODs. As an example of a result I got today, here are some subs for Subaru's latest VOD (the Resi Remake playthrough) that I generated over the course of about 15 minutes.

It does get stuck in places (this one shows the same line for three minutes at the start until Subaru starts talking, for instance), and since this is an AI that doesn't understand context and Japanese is a language comprised of 70% guesswork, stuff can sound janky - so I wouldn't advise using these without at least rudimentary Japanese knowledge, but it can still be an excellent tool to make JP VODs more watchable for those who aren't fluent in nihongonese.

You can find the SRT file I generated here:
https://files.catbox.moe/0f8abp.srt

And if you don't want to download the VOD and apply it manually you could use this extension or something like it to apply the subs directly to the youtube VOD:

https://chrome.google.com/webstore/detail/subtitles-for-youtube/oanhbddbfkjaphdibnebkklpplclomal?hl=en

>> No.45061723

Okay turns out the subs are mistimed because I'm a retard and used the wrong timestamp, lemme just redo this real quick

>> No.45061790

In the meantime here's a proper example using part 1 of Fubuki's Dead Space playthrough:

https://files.catbox.moe/5b6tzz.srt

>> No.45062198

>>45061790
Be aware it takes a little bit to kick in, it starts subtitling at about 30 seconds in.

>> No.45062271

>>45061611
Thanks for your hard work Anon!

>> No.45062308

Couldn't you at least link the streams you're talking about

>> No.45062424

>>45062308
Sure thing anon, here's the Fubuki stream you can use the non borked subs with:

https://www.youtube.com/watch?v=pAT_AZCNTck&

>> No.45063874

>>45062424
Seems to work decently well, at least for well-formed sentences that she reads off the game and short simple reactions that she has
But I'm more interested in seeing how it reacts to difficult content

>> No.45064047

>>45061611
Here's a fixed version of the subs:
https://files.catbox.moe/plicl4.srt

For this VOD:
https://www.youtube.com/watch?v=fp6l92a0oXQ

>> No.45064093

>>45063874
Any suggestions anon? I've gotta go sort out some paperwork so I can throw my workstation at em in the meantime and post the results when I get back

>> No.45065726

>>45064093
Something really difficult would probably be this Koyo and Noel bathing ASMR
https://www.youtube.com/watch?v=pAT_AZCNTck
>multiple speakers
>constant background noise
>difficult to hear voices
>innuendos
Or one of those big collabs with many people talking over each other, I don't realistically expect it to get much at all out of those
https://www.youtube.com/watch?v=xfon1W9BCVs
Or a holo who mumbles a lot, Aqua maybe?

>> No.45065944

>>45065726
I'll throw it at KoyoNoel but I can tell you right now it would implode with one of the big chaotic group collabs, might feed it through anyway as a test though

>> No.45066018

>>45065944
As for Aqua I don't think it'll struggle too much, but when stuff is mumbled to a degree where you have to infer what word it's meant to be it'll probably become much less accurate (though it does have a limited ability to do guesswork)

>> No.45067536

>>45065726
Yeah that bathing ASMR is too brutal for it with all the noise and the low speech volume. While it did better than I expected it's not even worth posting the results, and the same thing would happen with one of the chaotic large collabs I'd wager

>> No.45068881

can i get subs for the recent minecraft vod from pegora? thanks

>> No.45068944

doing god's work, anon
bless

>> No.45069300

>>45068881
Sure thing anon, give us about half an hour and I'll throw em up

>> No.45069368

>>45061611
impressive stuff, anon
with the gpt-4 stuff from today, it seems like a universal translator is possible within the year.

>> No.45069414

>>45061611
/jp/ on suicide watch. Imagine wasting 10 years of your life learning a dead language when machines are close to translating it.

>> No.45070204

>>45068944
Stream downloaded - network has been kinda slow so it took a bit more time than I anticipated. Throwing the AI at it now.

>> No.45070248

>>45069414
i know this is difficult to understand for monolinguals, but learning a language does not mean you translate everything back to your native language
translation is a different skill and will never give the same result

>> No.45070343

>>45067536
I figured, thanks for trying though

>> No.45070472
File: 181 KB, 400x400, 1623017241427.png

>>45070248

>> No.45070557

>>45070472
One day AI will surpass all human ability, and we'll learn to appreciate imperfection instead of always scrutinising art by its technical quality.

>> No.45071612

>>45061611
>>45069300
source code/how can I build this myself? thanks

>> No.45071701

>>45071612
https://github.com/openai/whisper

Instructions for installation can be found here - be aware that if you don't have a tensor-core-equipped GPU it's gonna be slooow
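
The short version of their setup, if memory serves (double-check the repo README in case it's changed, and note you also need ffmpeg somewhere on your PATH):

pip install -U openai-whisper

or, if you want the bleeding-edge version straight from the repo:

pip install git+https://github.com/openai/whisper.git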

>> No.45071764

>>45071612
As far as arguments are concerned I use --language Japanese --model large --device cuda --task translate --output_format srt
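
So the full invocation ends up looking something like this (the filename here is just a placeholder for whatever audio file you're feeding it):

whisper subaru_vod.m4a --language Japanese --model large --device cuda --task translate --output_format srt

That should drop a subaru_vod.srt next to wherever you ran it from.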

>> No.45071823

>>45068881
DONE!

AI Subs:
https://files.catbox.moe/wro0im.srt

VOD:
https://www.youtube.com/watch?v=xq6lCaie4Ag

>> No.45071884

japanese class was really that hard? lol

>> No.45071960

>>45071884
If you're not using every resource the tech you worked for provides you're cucking yourself anon

>> No.45072017

>>45071764
yeah I found it, thanks. is --task translate better than piping to other software though? skimming the paper now

>> No.45072146

>>45072017
Whisper performed better than the SOTA in zero-shot translation tests, and in my experience it beats out DeepL by a fair bit in terms of making subtitles legible.

It does make a fair few mistakes but if you know even basic Japanese vocab the mistakes it makes are a lot easier to ignore than the nonsense DeepL outputs half the time when fed with a transcript.

>> No.45072238

>>45072146
their paper claims Maestro is better (for the given datasets ofc) for X -> EN on the largest models (that you're using). it appears closed source though or at least I can't find it

>> No.45072287

>>45072238
Yeah sadly that model ain't open source so it can't be built and readily used by your regular joe schmoe like Whisper can

>> No.45072433
File: 976 KB, 1920x1080, image-min.png

>>45072287
The fact it can translate what Pekora says at 4:36 this accurately is pretty insane considering she's putting on a chuuba voice and speaking crazy fast

>> No.45072463

>>45072287
now I wonder if you could make this live. pipe chunks of audio + rollback window (for improving accuracy). thoughts?

>> No.45072596

>>45072463
You definitely could but you'd need bloody fast hardware to do it. I have an RTX 4080 (got it discounted) and even that probably wouldn't be able to keep up with a live feed.

>> No.45072707

>>45072596
Actually, turns out someone's already had a crack at it:
https://github.com/fortypercnt/stream-translator

>> No.45072718

>>45072596
15 minutes for a couple hour stream is pretty good though, is the issue non-amortized starting cost?

>> No.45072791

>>45072707
I'll take a look, thanks

>> No.45072922

>>45072718
I stand corrected, considering the existence of the repo I posted above and the faster fork of whisper you can definitely do this live if you have high end hardware. Exciting stuff!

>> No.45073002

>>45064093
Houshou Marine zatsudan. She speaks really fast.

>> No.45073100

>>45073002
The final boss of nihongonese, shoot us a zatsudan VOD of your choice and I'll feed it through anon

>> No.45073381

>>45072707
There's this as well actually
https://github.com/Awexander/audioWhisper
God I love FOSS shit

>> No.45073457

>>45072922
faster fork? you'd have to modify their thing though then right

>> No.45073477

>>45073381
The demo video used in the repo is even a Subaru clip, guess I've been beaten to the punch!

>> No.45073581

>>45073457
https://github.com/guillaumekln/faster-whisper#installation

Apparently it's up to 4x faster without losing any accuracy

>> No.45073849

>>45073581
oh I read the doc, it's compatible with the live translator. I'll try it out and see if my build can take it, I have worse specs than you

>> No.45074357

Considering the existence of that repo that outputs live TL to the command line, all we need is to pipe that into an overlay and if it works well we have live subs

>> No.45076505

bumping good thread

>> No.45076653
File: 40 KB, 220x220, 1678700478695693.gif

>>45071884
>paying to learn a language

>> No.45076800

>>45064093
some say polka is really difficult to understand, please try generating subs for some of her talking streams like
https://www.youtube.com/live/L1ldmdqxV8A?feature=share

>> No.45077507

>>45076800
Sure thing, I'll give it a shot

>> No.45078720

Sorry, this thread is too smart for me and I don't understand anything that is going on. Have you guys figured out a way to translate live streams for dumb fucks like me? or do I have to wait longer for that to happen?

>> No.45078870

>>45078720
Theoretically?
https://github.com/fortypercnt/stream-translator
https://github.com/Awexander/audioWhisper
I'm trying to install the first project above on minimal requirements right now. If it works out reasonably well I might make a rentry for literal retards similar to what they have over in the AI threads.

>> No.45079050

>>45078720
VODs yes, live streams - almost

>> No.45079240

>>45078870
>>45079050
That's amazing. Looking forward to those guides!

>> No.45079461

I love you autists so much

>> No.45079583

>>45076800
Done! Output below:
https://files.catbox.moe/c2u91o.srt

>> No.45079631

>>45079583
Seems it had trouble with Polka because it got stuck for like the first half hour

>> No.45080813

Forgot to mention, to use it with your GPU's CUDA cores you need to do the following:

pip3 uninstall torch
pip cache purge
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
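
Quick sanity check that the swap actually took (run it from the same venv/prompt you'll be running whisper from):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If it prints a version ending in +cu117 and True you're set; if you see +cpu or False you're still on the CPU build.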

>> No.45080854

>>45079583
thank you checking it out rn

>> No.45081086

>>45080854
I should warn you, polka seemed to be beyond its capabilities

>> No.45081629

>>45073100
I don't have one on hand, sorry. She's done loads though, so you could probably pick any.

Also, Luna and Miko because people think they're hard to understand, and Korone because apparently she speaks in a different accent

>> No.45081913

OP here, signing off for the night as it's getting incredibly late - if someone gets the live translation repo working be sure to report your findings here, I'm excited to see where we can take this!

>> No.45081984

>>45081913
Gets it working while I'm gone, that is - I'll install it myself come morning

>> No.45082009

>>45080813
I redid torch and the dependency installation from scratch (for stream-translator), and also made sure the CUDA version matched the torch version downloaded, but torch.cuda.is_available() continues to return False.
Thoughts? I'm continuing without it for now.

>> No.45082820

>>45082009
What GPU are you running?

>> No.45082917

>>45082820
RTX 2070

>> No.45083031

>>45082917
Hm, that is odd. Are you running the install in a virtual environment or your regular windows environment?

>> No.45083129

>>45083031
virtual environment, idk if that affects anything

>> No.45083367

>>45083129
Right, which Pytorch version did you install after purging the one packaged with whisper?

>> No.45083461

>>45083367
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

>> No.45083640

>>45083461
Are you certain you installed that package within your virtual environment and not just in Windows?

>> No.45083874

>>45083640
I think torch exists locally, I'll try a clean build I guess. Should CUDA be installed on Windows or locally somehow? I just downloaded an installer and used it.
also as an update I can sort of get it to work with the small model. Lots (but not maxed) of CPU usage and it misses a lot of sentences/produces some nonsense, but a couple things are in common with what's actually said

>> No.45084243

>>45083874
I wouldn't worry about your CUDA toolkit install as long as you installed it through the official NVIDIA installer

>> No.45084529

>>45083874
As far as torch is concerned, if you're working in that virtual environment without using the regular openai whisper build, just let requirements.txt take care of things for you and don't worry about uninstalling torch. Failing that, run through the purge process from the post above but install the cu117 build of torch.

>> No.45084677

Worst case scenario just draft some 5head autists from /g/

>> No.45084696

>>45084529
hmm torch.__version__ gives a cpu version. I'll try further. Feel free to go to sleep if it's late for you

>> No.45085616

>>45084696
Yeah I think I'll do that if that's okay, it's too late for me to think clearly enough to troubleshoot over text

>> No.45085722

>>45085616
Looks like the cu113 install might be incompatible/phased out, I can get torch '1.13.1+cu116' in.
Thanks for the help, I'll record additional observations as I go.

>> No.45086540

>>45085722
torch.cuda.is_available() reports True now

>> No.45087581

>>45079050
what's the retard-proof way of translating vods? (generating english subs)

>> No.45088115

>>45086540
Doesn't use my GPU still for some reason. Medium model appears to run reasonably competently. Large model leads to
>torch.cuda.OutOfMemoryError
Despite apparently enough memory existing. Trying to fix that.
>>45087581
Build is here
https://github.com/openai/whisper
You'll need a reasonable GPU or it will be slow using CPU.

>> No.45090063

>>45088115
My GPU says python is using it and this disappears upon closing the program. So, perhaps the model is not using as many GPU resources as I expected (it uses a bit but seemingly more for the graphics of the cmd interface itself).

>> No.45090882

>>45090063
So apparently:
large model needs ~10GB of VRAM. RTX 2070 only has 8GB
https://github.com/openai/whisper/discussions/895
parameters for different sizes:
https://github.com/openai/whisper#available-models-and-languages
Going to try to set up faster-whisper and see if that helps
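
For anyone else following along, my understanding of the setup is roughly this - the exact converter flags might have changed, so treat it as a sketch and check the faster-whisper README:

pip install faster-whisper transformers[torch]
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --quantization float16

The second command converts the original Whisper weights into the CTranslate2 format faster-whisper runs on, and float16 is supposedly what lets the large model squeeze into less VRAM.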

>> No.45091963

>>45090882
faster-whisper can handle the large model in <8GB VRAM. Seems to use less resources overall. Either way though it doesn't appear to be using a lot of my GPU on defaults, going to fiddle with parameters now.

>> No.45092412

Imagine when live TL is possible how hard /jp/cels will seethe now that they cannot gatekeep lel

>> No.45093033

>>45092412
>Imagine
I'm literally running it on my computer right now, anon.

>> No.45093176

>>45091963
Looks like it can randomly have anywhere from a couple seconds of latency to almost half a minute. Plus the memory loops that were mentioned above. This is with
>--use_faster_whisper --model large --language en --interval 2 --history_buffer_size 20
on an English stream so I can easily validate it.

>> No.45093989

>>45093033
Can you hook it onto a livestream and not just feed it vods?

>> No.45094063

>>45093989
Yes, that is what I was referring to. I'm still doing some tests but my notes about it are above. Currently doing some checks with VOD translation.

>> No.45095593

>>45093176
update: trying to get cuDNN

>> No.45097660
File: 282 KB, 1200x857, 1675486859884534.jpg

>> No.45097799

>>45095593
not sure if it did anything. bumping while I create a makeshift instructions file for retards

>> No.45097822

holy shit, had no idea things were this far along. gonna clone this repo and give it a shot with the chinchilla stream tonight + 4090, will report results

>> No.45100033

>>45097799
tempbump

>> No.45100666

>>45061611
>>45062271
>>45071612
>>45092412
>>45093989
Rentry for stream-translator >>45072707 live TL
I am stupid and bad at software so I probably wrote it in a stupid way
rentry co live-tl

>> No.45101001

>>45097822
>>45100666
thanks anon, that's a nice write-up.
naturally I pick the stream to try this where the streamer sets up translation on her side lmao. should have known better than to try this on the latest versions with Windows native. faster_whisper does not like it and nvidia says it's unsupported. will have to try again later with either downgrades as in the guide, WSL, or Linux native.

>> No.45101069

>>45101001
You can compare the output to her translations. What are your specs btw, and what's the exact command you're trying to run?

>> No.45101801

This sounds super neat. Always wanted to watch the JP girls but I'm already trying to balance learning two other languages

>> No.45102023

>>45101069
yeah I did without faster_whisper... it was pretty usable most of the time, def enough to explore using it more. anywhere from 5-60 seconds behind, seemed to get stuck occasionally. her autotranslate clearly isn't super good either, even with my shitty Japanese hers was pretty off often so tough to compare.
just running the default options atm, no tweaks and no faster_whisper, so like this:
> python translator.py https://www.youtube.com/watch?v=IjWCTun0K1M
on i9-13900k 64gb ddr5 6000mhz rtx4090
not convinced gpu acceleration was working right even on the default model, cpu was getting pretty hot. pulled in the cuda deps with winget so I got latest versions which nvidia claims doesn't support gpu acceleration on windows anymore, so it probably screwed up someplace. been using this box mostly for gaming and not really a great windows dev, so I'll give it another shot tomorrow in either WSL or finally get around to throwing some linux on the other nvme

>> No.45102196

>>45102023
rtx4090 should be way more than enough. check the bottom of the rentry for a couple updates on how to check if your gpu is being utilized
also use --model large since you can support it, it should be noticeably better.

>> No.45102231

>>45102023
also I checked the stream and she is playing an english dub. try adding "--language ja" as well, might help since it doesn't have to decide what/how to translate as much?

>> No.45102340

>>45102231
>>45102196
yup gpu was off, for some reason pip brought in torch+cpu. hang on, I'll monkey with pip and see if I can get the gpu version for the default model tonight at least.

>> No.45102406

>>45102340
try
pip uninstall torch
pip install torch torchvision torchaudio --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cu116
(inside your venv). Replace 116 by the version of your CUDA (11.6 or 11.7).
The default, and what is contained in requirements.txt for stream-translator, is 113, which did not work for me and caused this issue.

>> No.45102512

>>45102406
yeah pretty much, but I'll need 121 I think since winget grabbed the latest.
> nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2023 NVIDIA Corporation
> Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
> Cuda compilation tools, release 12.1, V12.1.66
> Build cuda_12.1.r12.1/compiler.32415258_0

>> No.45102574

>>45102512
be aware if you do that you'll have to build pytorch yourself for the time being
https://discuss.pytorch.org/t/install-pytorch-with-cuda-12-1/174294
You can also have multiple CUDAs downloaded with no issue (and I think the latest version doesn't override the older ones but not sure since I uninstalled 12.1)

>> No.45102850

>>45102574
ugh and that means getting msvc and all that jazz set up on this install. Ok path of least resistance: I'll try the downgrade real quick. anything wrong with 11.8? Or should I go 11.7?

>> No.45102887

>>45102850
I think 118 should work, I'd be interested to know if it does easily. 117 and 116 definitely work based on me and the other anon.

>> No.45103697

>>45102887
ok downgraded to 11.8, readded cuDNN and readded torch to the venv as described. gpu is working properly now.
gave it another try as below (I already converted the models earlier):
> python translator.py https://www.youtube.com/watch?v=IjWCTun0K1M --use_faster_whisper --language ja --model large
and holy shit it's completely live. I had to refresh my stream window because the translations were ahead of what was being said. system is much happier, no noticeable heat increases or system lag. super impressive.
thanks for the help anon, nice writeup. let me know if there's anything else you wanna test

>> No.45103899

>>45103697
you're welcome!
I'll add 118 to the rentry. for now nothing. I think the other guy is better at this stuff anyway. I really should sleep in time for work, will maybe try faster-whisper VOD translation tomorrow
Also I'll read any post on /vt/ containing the string/word "rentrylivetl" in the body if anyone notices any errors/updates in the writeup

>> No.45104223

>>45103899
Yeah me too, already gonna be light on sleep. I'll try to keep an eye on these threads and that repo. it's working pretty fantastic as is but would be nice if it exited a bit cleaner when the stream ends, had to spam control-c to get my console back. I'll try to take a shot at fixing it, PR if I get something working.
thanks again!

>> No.45104375

>>45104223
it takes a while to fully shut down after ctrl+C is registered, not fully sure why but it's probably unloading the model. if it fails to load the model but stays in limbo it exits instantly, and smaller models seem faster

>> No.45104565

I've always wanted to see anki decks for specific JP vtubers' most common spoken words. Hopefully we're one step closer to that.

>> No.45104622
File: 681 KB, 1300x1190, 1639030462201.jpg

you could sell this to Cover and become a multimillionaire

>> No.45105368

>>45104375
you'd def know better than me, I'm a complete hack compared to the all-star who got this working.
for what it's worth though, on my system it seems to hang indefinitely until I spam interrupts, then I get an exception from the ffmpeg read on line 131. makes me think maybe it's waiting on the thread for some reason... maybe tweaking the shutdown could make it faster? I could be totally off though and it's already great so don't waste a second on it. I'll poke around and if I find any way to improve it (unlikely) I'll post / PR it.
again great work on all this anon, it's truly fantastic. star trek level stuff

>> No.45106859

Is there a way to use Stream Translator for Twitter Spaces? Streamlink doesn't seem to have a plugin for it.

>> No.45107639

>>45106859
You could try using audiowhisper to translate your desktop audio instead of pulling from a stream source

>> No.45107680
File: 147 KB, 300x300, 1677678276373506.png

>>45104622
And then lose it all getting sued by OpenAI because it's a non-commercial licence

>> No.45107918

>>45107639
audiowhisper doesn't work with faster-whisper right? Default whisper doesn't run on my 3070 TI for some reason.

>> No.45108422

>>45107918
Default whisper ought to work provided you don't use the large model, which needs 10+ gigs of VRAM.

>> No.45110156
File: 53 KB, 739x1024, 1671145514209769m.jpg

Imagine the amount of time some of these /jp/ gigasperg fags spent learning Japanese for the sole purpose of watching internet anime girls and we are now able to have the same experience for zero effort. Skynetfc wins again lol lmao get dunked on meatbags.

>> No.45112803

>>45110156
learning japanese is fun

>> No.45113544

Testing the live translator now, RTX 4080 FP16 - wish me luck

>> No.45113779

>>45113544
A few seconds behind live, seems slightly less accurate than the stock whisper implementation with VODs but that's to be expected really

>> No.45113927

>>45113779
Interesting experiment but for the time being I think I'd recommend just using vods over the live translator

>> No.45114190

Trying Live TL as well, RTX 3080. Speed of tl is within 6 seconds. Accuracy is very spotty.

>> No.45114348

>>45110156
>same experience
lol
lmao

>> No.45114570

>>45114190
The interval flag for calls to the language model is set to 5 seconds by default so that'll be why it's behind by that much. Of course, dropping it could adversely affect the already subpar accuracy of this jerry-rigged live implementation
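
If anyone wants to fiddle with it, it'd be something along these lines (URL is a placeholder, flags are the ones from the repo/earlier in the thread):

python translator.py https://www.youtube.com/watch?v=XXXXXXXXXXX --use_faster_whisper --model large --language ja --interval 3 --history_buffer_size 20

Lower --interval means the model gets called more often, so less delay but also less audio context per call, which is probably where any extra accuracy loss would come from.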

>> No.45115593

>>45114348
Even if it's not there perfectly now it will be within 6 months or less, keep coping - unless you want to tell us how you're a master at Japanese dialects and nuanced jokes?

>> No.45116329

>>45114570
Nahone
>>45115593
It's currently struggling with very basic Japanese. Jokes, names, abbreviations, slurred speech, and sometimes even loan words it either doesn't even attempt to translate or turns into complete gibberish. Shit, it even throws in random Cyrillic symbols for some fucking reason. Nonetheless I'm impressed with it, even in its current state. Also learn English you fucking faggot

>> No.45116349

>>45065726
https://www.youtube.com/watch?v=qh9bG3SnGrA
Try KanaMari's latest one, they use a lot of slang and talk fast, so that'd be a good test

>> No.45116377

>>45073100
try this anon >>45116349

>> No.45116702
File: 16 KB, 891x184, image_2023-03-15_222241187.png

>>45061611
I have been playing around with Whisper too and I found this GitHub repo that uses Whisper to translate audio from livestreams in real time. I can't get it to work on my end, perhaps you should try it.

https://github.com/fortypercnt/stream-translator

>> No.45116844

>>45116702
anon...
>>45072707
>>45073381
>>45073581

>> No.45116847

>>45116702
Mentioned further up the thread anon, we got it to work but it's not all there yet unless we discover some tweaks that can refine its accuracy

>> No.45116997

If even Google can't get good machine translation, I don't think some random anon with free software will be able to crack it.

>> No.45117021

>>45116844
>>45116847
Oops sorry my bad

>> No.45117057

>>45072707
Wish it could do live subtitling while playing back the video on VLC (since this just uses streamlink and whisper together).

>> No.45117105

>>45116997
We got "good machine translation", it obviously doesn't match human translation but it mogs Google.

>> No.45117151

>>45116997
Google is jobbing hard anon they're losing the AI war to Microsoft.

>> No.45117162

>>45117057
For that you want AudioWhisper, which uses stereo mix to translate desktop audio. If you're running video playback on VLC though you're better off just using stock whisper to generate an SRT file for the video.

>> No.45117219

>>45117151
And it's worth noting that this model is developed by Microsoft's AI partner

>> No.45117474

>>45116997
Whisper is from OpenAI which is now owned by Microsoft

>> No.45118261

Back for a bit before work.
>>45117219
>>45117474
microsoft is just a shareholder, not controlling, I think.
>>45104565
You don't have to translate, you can ask it to transcribe instead (default for whisper on VODs, option on stream-translator). Then filtering this could give a list of kanji easily, and maybe a list of words using an appropriate grammar regex? Might look into it later, any particular people you want to try it on?
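
A rough sketch of what I'm picturing, assuming a grep build with PCRE support (-P) and a made-up filename:

whisper zatsudan.m4a --language Japanese --model large --task transcribe --output_format txt
grep -oP '\p{Han}' zatsudan.txt | sort | uniq -c | sort -rn | head -n 50

First line transcribes instead of translating, second counts how often each kanji shows up and spits out the top 50. Proper word frequency would need an actual tokenizer since Japanese doesn't put spaces between words, hence the hand-waving about grammar regexes.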

>> No.45118336

>>45118261
Correct, hence why I said partner and not subsidiary

>> No.45118347

>Machine translating vods
Because otakmori just wasn't bad enough already

>> No.45118451

>>45118347
If Otakmori released whole ass vods they wouldn't be so bad

>> No.45118514

>>45118451
This is also better than Otakmori half the time

>> No.45118606
File: 189 KB, 1158x1637, borger.jpg

>>45118451
Yes they would be so bad

>> No.45118687

>>45118347
it was already possible to machine-translate clips into your native language with DeepL, but you had to transcribe them accurately first. definitely seen that before too.
Anyway, I would only use this for personal use. Technically you can get sued for monetizing the output of this

>> No.45119419

>>45118606
Okay yeah fair enough

>> No.45119460

>>45118687
Yeah, OP here - I really wouldn't use this for non-private content/SRT releases like what I did, as it's obvious that you're using an AI and OpenAI could crack down on you hard

>> No.45119643

Their paper claims 36.2 BLEU on high. Supposedly 30-40 is "understandable to good". However, it feels more in the 10-30 range depending.

>> No.45119739

>>45119643
Depends on the solution used. I find that downloading a VOD's audio stream at the highest quality possible then running it through the large model gives pretty good results - janky sure, but if you have basic knowledge of Japanese already it's basically just there to fill in gaps in your knowledge and you can catch the mistakes it makes easily

>> No.45120168

>>45119739
So you download the whole video, not just best-audio?
Also, what params are you using/what specs?

>> No.45120260

>>45119643
also BLEU is a pretty anal, corpus-dependent metric which explains a few things

>> No.45120476

>>45120168
Sorry, that comment was scuffed as fuck. What I meant to say was that I download the whole video, then strip out the video and keep just the audio using ffmpeg to save on space, then run it through stock whisper with the following parameters:

--model large --language Japanese --device cuda --task translate --output_format srt

As far as specs are concerned the only relevant one is the GPU I run it on, that being an RTX 4080
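
In concrete terms the pipeline is something like this (filenames are placeholders - the -acodec copy assumes the download has an AAC audio track like an mp4 does, otherwise drop the copy and let ffmpeg re-encode):

ffmpeg -i vod.mp4 -vn -acodec copy vod_audio.m4a
whisper vod_audio.m4a --model large --language Japanese --device cuda --task translate --output_format srt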

>> No.45120884

>>45120476
I would imagine just downloading best quality audio is sufficient?
Also, I did roughly this and got okay results, maybe a bit worse than what you said in places. Did not force --device cuda but that should be the default if it's available based on the code. RTX 2070

>> No.45120929

>>45120884
oh but I only did medium due to VRAM issues which might explain it. I'll do a trial on faster-whisper large when I get it to work later.

>> No.45120981

>>45120884
Yeah, using the -x option in youtube-dl would be sufficient, dunno why I wasn't doing that - that still downloads the video though, it just automates the step I was doing with ffmpeg.
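
i.e. something like this (URL's a placeholder):

yt-dlp -x --audio-format m4a "https://www.youtube.com/watch?v=XXXXXXXXXXX"

-x / --extract-audio just has it run ffmpeg for you after the download to throw away the video track, same step I was doing by hand.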

>> No.45121360

>>45120981
idk if that's necessary at all
https://github.com/openai/whisper/discussions/41#discussioncomment-3713140

>> No.45121448

>>45121360
I just do it to save HDD space when I store archives to try with different parameters, not because there's any functional advantage to it.

>> No.45122685

>>45121448
have you used faster-whisper yet? I don't see a native cmd util for it annoyingly and was putting off doing it myself

>> No.45122843

>>45122685
Faster whisper is as the name implies, faster whisper. Not much variance there but I do find it's a tad less accurate than the stock implementation, albeit much faster and with far less memory usage

>> No.45123081

>>45122843
How do you use it from cmd? Can only do it in stream-translator, need to copy some code or smth

>> No.45123995

>>45123081
I just used a disposable python script then dumped it when I didn't see much benefit over the stock whisper model for my purposes, faster whisper doesn't come with its own command line tools.

>> No.45124498

>>45123995
I'll make one myself then. Sadly my VRAM is not big enough to support large models without faster-whisper.

>> No.45126698

>>45061611
This is amazing anon, thank you so much for your work, managed to get the pego stream to work and it's really impressive. What are you gonna do with this? I feel a program or extension like what LiveTL did would be super popular (but i know jackshit about this kind of stuff so it's probably super hard to make lol)

anyways, nice work!

>> No.45127174

>>45126698
Could potentially make a centralised repository of AI generated subs for holo streams or do AI threads where I (and other folks with capable hardware) pump em out on request.

>> No.45127652

>>45127174
I would eternally kneel to you, anon. I honestly can't believe we are at this point with technology where this is possible. I wonder if live translating is possible in the future.

>> No.45127883

>>45126698
>>45127652
(other anon)
For an extension, it would be somewhat problematic since it would need to run locally on your GPU, so you'd need to do a bunch of setup listed in "rentry co live-tl" anyway, and it would just be calling a script from Chrome. Not sure how much benefit there is, and I'm personally not familiar with coding browser extensions.
For repositories, it seems more manageable. Probably just have something like stream links/metadata alongside the .srt/text files generated along with some data like build/parameters used.
Plus you can follow a model similar to how they distribute anime subtitle files.

>> No.45127955

>>45127883
We could potentially have an autoloader for pre-generated community accepted subs from said central repository, maybe even slightly tweaked and manually edited ones, that would be a bit more feasible as an extension.

>> No.45128035

>>45127883 (me)
to get around the GPU+setup issue, you could have a single person computing it for everyone.
Then obviously you could piggyback off of LiveTL by using its functionality and actually sending automated messages in chat, but I don't think anyone would appreciate that and it would be shut down. But maybe a third-party messaging system that you can opt-in to? IDK

>> No.45128066

you guys think an AMD RX6600 could run this without being painfully slow? i'm kicking myself for buying an amd when I had the chance to get a 3060...

>> No.45128089

>>45127955
I was thinking about extensions for live streams in particular as opposed to VODs, but you're right about that. What you're saying is basically just re-implementing youtube community subs in a sense, and piping in AI-generated stuff, right?

>> No.45128263

>>45128066
I think using faster-whisper and maybe only a medium sized model you should be able to achieve live translation. For VODs, faster-whisper and large should be a reasonable pace but not super fast. (I'm just basing this off of online comparison metrics for GPUs.)

The bigger annoyance is setup for AMD systems. My understanding is that pytorch with AMD GPUs only currently exists for linux and you have to do some non-standard build processes:
https://github.com/openai/whisper/discussions/105
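
Going off that discussion it looks like the Linux route swaps the CUDA wheel index for a ROCm one, something like the below - I haven't touched an AMD card myself, so treat this as a pointer rather than a recipe, and whether the RX 6600 is even on the supported list for that ROCm release is another question:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.4.2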

>> No.45128296

>>45128089
Yeah essentially, livestreams are a bit of a tough nut to crack but VODs are a lot more doable

>> No.45128398

>>45128035
Yeah, livestreams are a tough nut to crack but VODs are a lot more doable.

Actually, the extension mentioned at the top of this thread has support for subtitle searching using OpenSubtitles.org and Amara.org, so the hard work may be done for us there

>> No.45128456

>>45128263
Problem is you get worse translation results with the medium model in my opinion, it's fine for transcription but there's less room to manoeuvre for the kind of guesswork needed to translate when you have fewer parameters like that.

>> No.45128545

>>45128035
That person would need a hell of a workstation to make that possible, we're talking enterprise level shit or paid cloud computing

>> No.45128648

>>45128456
you can try with the large, idk if it'll be quite fast enough.
>>45128545
i'm imagining people can sign up, not just 1 for all.
plus, I'm pretty sure modern GPU systems could support a few streams live simultaneously, maybe using faster-whisper. There's only so many holoJPs.

>> No.45129099

I'll be out for a bit but hoping to come back with some further documentation

>> No.45129729

Seems to me that: https://chrome.google.com/webstore/detail/subtitles-for-youtube/oanhbddbfkjaphdibnebkklpplclomal?hl=en
In conjunction with: https://amara.org/

As a source for searchable, community-driven, AI-generated and manually editable subs may be our best bet here; we could make a bespoke repo and extension but really all the work's already been done for us between those two.

>> No.45130834

Would a 3060 be good enough to handle just translating from downloaded stream VODs?

>> No.45131310

>>45130834
The 3060 12GB is one of the few cards on the low-mid end capable of running the large model owing to its VRAM capacity, so yeah totally - though it will be considerably slower than higher-end options, it can definitely handle it.

>> No.45131711

>>45131310
what about the 3060ti? I was thinking of buying that one but if the 3060 can handle this better than the ti version I don't see much reason to get it, especially when the 3060 is cheaper where i live lol

>> No.45132639

>>45131711
The 3060ti would be unable to handle the large model when it comes to regular whisper, though it could do faster whisper. That being said the TI is leagues better than the 3060 for gaming, so go with that if that's your priority (which I presume it is).

>> No.45132998

>>45132639
Some gaming yeah, but I also wanted to check out Stable Diffusion. Could be useful for reference images for drawing.

>> No.45133776

>>45132998
If productivity is a concern the 3060 12GB would ironically be the better pick owing to its greater memory capacity, just be aware it isn't really suitable for gaming above 1080p these days, and ray tracing is out of the question.

>> No.45136105

>>45132998
get the 12gig version of 3060. TI only has 8 which is going to be a bottleneck/limit for all sorts of large models unless someone has created efficient versions

>> No.45137575

>>45129729
amara seems a bit corpo to me, although it would definitely work for publishing subs. trying to estimate how much data subs in en+ja would require when summed over all holo vtubers over all time, and the viability of alternative storage/distribution

>> No.45138776

>>45137575
Holo is a corpo so I think it's a bit of a moot point here really

>> No.45141995

edit coming soon

>> No.45142479

>>45097660
No no Fubuki is furendo!

>> No.45142565

You madlads

>> No.45143370

Updated
>rentry co live-tl
with VOD translation instructions. Maybe will add the following at some point:
>VOD translation with faster-whisper (only useful if you really need efficiency. e.g. if you do not have enough VRAM for large models)
>modifying whisper so that it saves partial data along the way, in case you need to terminate or restart or whatever

>> No.45144076

>>45143370
Doing god's work anon o7

>> No.45144346

>>45141995
nice work anon. some ideas if you want 'em:
might want to clarify a bit on the python versions, I initially tried with 3.12 and pip couldn't resolve dependencies for faster_whisper. needed to downgrade to 3.10.9 to get it to work (needed <3.11 according to pip, don't remember which dep). could just be something weird in my setup or overly cautious requirements in the deps if you have it working tho.
also might want to add a bit about pulling dependencies with winget, it's available by default in W11 and should be around in W10 v1809+. got everything from there except cuDNN.

>> No.45144588

I'm stupid and dumb, what's rentry co live-tl?

>> No.45144862

can someone else check the faster-whisper code for stream-translator and give a second opinion on whether it is (currently) compatible with history_buffer_size or not? My reading tells me their code doesn't use history for faster-whisper but I can't fully understand what arguments are being passed to model.transcribe
>>45144346
Another anon had issues pulling the right version of CUDA using winget. Anyway I didn't use it but can add a sentence that one can use it near the beginning. If you have any usage tips drop them here and I can add it.
I'll add the Python warning
>>45144588
put a dot between rentry and co and a slash between co and live-tl then put it in your browser. 4chan thinks it's spam sometimes hence the weird format

>> No.45146169

>>45144862
I feel like line 143 of translator.py should be
>if not faster_whisper_args
and otherwise the code should throw errors. Probably being dumb.

>> No.45147384

>>45144862
heh same anon actually, yeah nvidia doesn't have tags set up the same way as the other repos but it was still there.
you can install all the needed deps for the basic version by running these commands:
> winget install Git.Git
> winget install Python.Python.3.10
> winget install Gyan.FFmpeg
> winget install Nvidia.CUDA --version 11.8
things you MIGHT still need to do manually (winget usually handles adding stuff to path automatically but for me ffmpeg didn't work) :
> add the ffmpeg bin folder to PATH (as of current version it will be '%LOCALAPPDATA%\Microsoft\WinGet\Packages\Gyan.FFmpeg_Microsoft.Winget.Source_8wekyb3d8bbwe\ffmpeg-6.0-full_build\bin', if they update just find the new release in the packages folder). could probably do this as a command with something like 'setx path "%PATH%;%LOCALAPPDATA%\Microsoft\WinGet\Packages\Gyan.FFmpeg_Microsoft.Winget.Source_8wekyb3d8bbwe\ffmpeg-6.0-full_build\bin' but this is risky without knowing everything going on with your system's path, probably best to just do it from the GUI if winget didn't do it for you.
and finally for faster_whisper:
> download 'cuDNN v8.8.0 (February 7th, 2023), for CUDA 11.x' and extract contents of libs, include, and bin to the corresponding folders in '%PROGRAMFILES%\NVIDIA GPU Computing Toolkit\CUDA\v11.8'
> download 'Zlib 1.2.3' and extract zlibwapi.dll to '%PROGRAMFILES%\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin'
after that you should be good to go starting with item 2 in the rentry

>> No.45149571

>>45144862 (me)
I think faster-whisper for stream-translator doesn't use history but I don't see a good way to do it so I'm not going to mess with internals idk about
>>45147384
thanks will add

>> No.45152823

adding faster-whisper VOD soon but rebuilding some stuff

>> No.45154856

>>45152823
updated! plus some extra debugging techniques like cudnn recognition

>> No.45156094

>>45154856
gonna sign off. again, include "rentrylivetl" in a post complaining about the instruction document.
maybe will think about mass translation repositories and live stream coverage etc. as discussed with anons at a later date

>> No.45157263
File: 260 KB, 1424x1768, 1670683285713112.jpg

>> No.45160054
File: 276 KB, 1305x2048, 1657230282199987.jpg

Good thread don't die

>> No.45161104

>>45160054
do you have any content you want to see?

>> No.45163301

>>45160054
this, finally some good shit on the catalog

>> No.45165353

Bump

>> No.45167331

Do people still use ytarchive to download livestreams or is yt-dlp now better?

>> No.45167610

>>45167331
yt-dlp is finicky if you don't initiate at the beginning of the stream but want to download it live. --live-from-start did some weird stuff as of some months ago, haven't updated and tried the newest stuff since I don't need to archive live stuff halfway through often
In all other respects though, yt-dlp is superior.

>> No.45167959

>>45156094
hey anon, I've been playing with the source to try to fix the hanging problem I saw.
to recap the issue is the script hanging indefinitely when the stream ends or a single KeyboardInterrupt is received. have to spam interrupts to cause a hard exit in order to quit. not a huge deal really but will be an issue probably if there is ever an effort to crowdsource translations with daemons.
beware this could be a brainlet take, so if I'm dumb feel free to tell me or ignore
I think the KeyboardInterrupt or stream ending kills the streamlink process, which closes but leaves the ffmpeg process awaiting input; then the read for in_bytes in the main loop blocks waiting for bytes. everything sits until we spam enough interrupts to kill the ffmpeg process, then the model finally shuts down and it quits. might be a windows-specific problem since some of the process signaling stuff is missing there.
to fix, I edited your writer function to make sure the pipe gets closed down when either process dies. then everything works as expected and it shuts down within a couple seconds instead of hanging indefinitely. also added an except block so you don't get a traceback when quitting via keyboard interrupt. git diff -p:
> diff --git a/translator.py b/translator.py
> index 95247e6..d291b0b 100644
> --- a/translator.py
> +++ b/translator.py
> @@ -85,6 +85,8 @@ def open_stream(stream, direct_url, preferred_quality):
> ffmpeg_proc.stdin.write(chunk)
> except (BrokenPipeError, OSError):
> pass
> + if not ffmpeg_proc.stdin.closed:
> + ffmpeg_proc.stdin.close()
>
> cmd = ['streamlink', stream, option, "-O"]
> streamlink_process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
> @@ -182,6 +184,8 @@ def main(url, model="small", language=None, interval=5, history_buffer_size=0, p
> print(f'{datetime.now().strftime("%H:%M:%S")} {decoded_language} {decoded_text}')
>
> print("Stream ended")
> + except KeyboardInterrupt:
> + print("Quitting from interrupt")
> finally:
> ffmpeg_process.kill()
> if streamlink_process:
I can PR it if you want but it's super tiny, you might have your own way you want to do this, or I might just be crazy and the only one seeing this behavior.
thanks again for the fantastic script. def post if you wanna get a crowdsourced liveTL system going. if you're sticking with python, might be cool to handle the backend with fastapi + starlette websockets for vods / live.

>> No.45168691

>>45167959
it's not my code. I figured that it was something to do with the streamlink though just based on how it behaved. You can definitely pull request the dude if you want. I think there are other poor choices in the code but I'm not a dev so not really sure.

>> No.45171409

>>45168691
Yeah this thing needs some serious tweaking to really be viable

>> No.45173209

oi not yet

>> No.45176705

Bump

>> No.45179478

>>45061611
>>And if you don't want to download the VOD and apply it manually you could use this extension or something like it to apply the subs directly to the youtube VOD:
thanks

>> No.45179512

bump

>> No.45180169

Oh. I've been doing this a while with downloaded streams. The easiest ones to translate are ASMR, which can also be funny imo. I'll have to look at the live stuff, but I'm happy with VODs usually. Hope you get it working well!

>> No.45183307

>>45180169
it works OK. I don't have specs to run the large non-efficientized version though so what other anons might say is more reliable.
also, do you have any wrappers/useful utils or are you just directly applying whisper's native functions?

>> No.45184827

>>45183307
I'm not doing anything special, just a basic script to automate the process of running things.
