
/jp/ - Otaku Culture



File: 82 KB, 500x500, R-6359152-1417293528-2126.jpeg.jpg
No.20745649

Before it got 404'd I regularly dumped its albums using a Python script. I missed out on C95 however, as I was busy at that time.

This isn't a database dump; I just abused their weird "sharebox" feature.

https://my.mixtape.moe/qssxyp.json

>> No.20745676

Also, I understand that some links may have rotted by now, and that most of these albums are already available elsewhere, but I posted it anyway.

>> No.20746781
File: 171 KB, 1280x1024, 1493285623315.jpg

>>20745649
What happened to them anyway? I only recently found out the site was down.

>> No.20746888

>>20745649
>>20745676
Thanks so much bro. But ironically your link is 404 as well.

>>20746781
This is probably just like half the story or whatever but
-guy who ran the site had his wordpress account baleeted, not sure why but maybe because some people made copyright claims and wordpress saw that it was just a piracy site
-folks were mad that album links now redirected to those interstitial ads that generate ad revenue and try to give you cool viruses, especially since by then the bulk of the links came from other websites. Before that, dojin would ask for donations and sometimes lock the site to unregistered users for a couple of weeks around each Comiket.
-A couple of months ago someone registered & logged onto the site and had an autistic meltdown and started deleting dozens of links while uploading like soundcloud rap and mainstream pop music albums

Anyways apparently someone also archived the links onto a discord server, and there are also the other doujin music sites

>> No.20749156

>>20746888
Trips checked.

Also, apologies; it was returning 200 right after I uploaded it.

Reupload: https://files.catbox.moe/mrfc9f.json

>> No.20749358

Thanks dude.
Even if it was still up, this is easier to search through than the actual site.

>> No.20749731

>>20749358
>easier to search through than the actual site.
top kek

Let me tell you how badly designed the site is, while I'm here

It's essentially WordPress hosted on wp-engine, with a custom theme (single-music) for its custom post type: "music".

There is one (1) API endpoint (exploreLoad) for fetching all album IDs; it isn't paginated and returns everything at once as one big JSON object.

If they really wanted to implement the point purchasing system much like a respectable topsite (3:1 upload ratio), then they would have to rework the entire site (and maybe move away from WordPress), because by default, posts, even custom types like the albums you saw, are open through the WordPress API to anyone with an account.

I was too lazy to set up an account for a token, so I literally got the list of albums, split it into batches to stay under HTTP URL length limits, and just parsed the sharebox response.
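A minimal sketch of the batching step, assuming a hypothetical sharebox URL that takes album IDs as a comma-separated `ids` query parameter (the real endpoint, parameter name, and length limit are not given in the thread). IDs are packed greedily so that each request URL stays under a chosen maximum length.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name; the real sharebox request format is not documented here.
BASE_URL = "https://example.com/sharebox"
MAX_URL_LEN = 2000  # conservative assumption; many servers and proxies reject much longer URLs


def build_url(base_url: str, ids: list[int]) -> str:
    """Build one request URL carrying a comma-separated list of album IDs (commas become %2C)."""
    return base_url + "?" + urlencode({"ids": ",".join(str(i) for i in ids)})


def batch_ids(base_url: str, ids: list[int], max_len: int = MAX_URL_LEN) -> list[list[int]]:
    """Greedily pack IDs into batches whose request URL stays within max_len characters."""
    batches: list[list[int]] = []
    current: list[int] = []
    for album_id in ids:
        candidate = current + [album_id]
        if current and len(build_url(base_url, candidate)) > max_len:
            # Adding this ID would overflow the limit: close the batch and start a new one.
            batches.append(current)
            current = [album_id]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Usage would look like `for batch in batch_ids(BASE_URL, all_ids): fetch(build_url(BASE_URL, batch))`, where `all_ids` comes from the exploreLoad response and `fetch` is whatever HTTP call you use.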

Some fields (e.g. artists like cosMo@Bousoup) were incorrectly identified by Cloudflare as email addresses and obfuscated, so I had to borrow somebody else's code to decode them.
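Cloudflare's email obfuscation is commonly described as a hex string in a `data-cfemail` attribute whose first byte is an XOR key for all the remaining bytes. A minimal sketch of decoding that format (this is not the borrowed code mentioned above); `encode_cfemail` is a hypothetical helper included only to produce test data.

```python
import re

# The obfuscated markup carries the encoded string in a data-cfemail attribute.
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def decode_cfemail(hex_str: str) -> str:
    """First byte is the XOR key; every following byte is XORed with it."""
    key = int(hex_str[:2], 16)
    raw = bytes(int(hex_str[i:i + 2], 16) ^ key for i in range(2, len(hex_str), 2))
    return raw.decode("utf-8")


def encode_cfemail(text: str, key: int = 0x5A) -> str:
    """Hypothetical inverse, used only to build test inputs."""
    return f"{key:02x}" + "".join(f"{b ^ key:02x}" for b in text.encode("utf-8"))


def find_cfemails(page_html: str) -> list[str]:
    """Decode every data-cfemail value found in a chunk of HTML."""
    return [decode_cfemail(m.group(1)) for m in CFEMAIL_RE.finditer(page_html)]
```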

There was one album I had to skip entirely because one of the fields broke my HTML parser due to the way the uploader linked it. They can't even validate for XSS properly.

>> No.20749736

>>20749731
It was fun to reverse-engineer the entire thing anyways. If you have a similar site I could scrape, that'd be fun.

>> No.20749742
File: 44 KB, 500x375, 1400700391139.jpg

>>20749731

>> No.20749744

>>20749731
>It's essentially WordPress hosted on wp-engine, with a custom theme (single-music) for its custom post type: "music".
Oh, forgot to mention: it wasn't a "normal" WordPress theme, where the posts are rendered server-side. It was literally a static page with client-side JS rendering, fetching from a custom API (admin-ajax.php, the same endpoint used for exploreLoad lmao), with no caching or anything.

If you were wondering why the site was slow when you browsed it, that's why.
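A sketch of building such a request with the standard library, assuming the site follows the usual WordPress convention of dispatching admin-ajax.php calls on an `action` field and that the action is literally named `exploreLoad`. The site URL and User-Agent are placeholders, and the shape of the JSON response is not described in the thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder UA string (assumption), not the one the original script used.
DEFAULT_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)"


def build_admin_ajax_request(site: str, action: str, user_agent: str = DEFAULT_USER_AGENT) -> Request:
    """Build a POST to WordPress's admin-ajax.php, which routes on the `action` field."""
    url = site.rstrip("/") + "/wp-admin/admin-ajax.php"
    body = urlencode({"action": action}).encode("ascii")
    return Request(
        url,
        data=body,
        headers={
            "User-Agent": user_agent,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_json(req: Request, timeout: float = 60.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Since the whole ID list comes back in one unpaginated response, a single `fetch_json(build_admin_ajax_request(site, "exploreLoad"))` call would be enough to get it, under these assumptions.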

>> No.20749926

>>20746888
>-folks were mad that album links now redirected to those interstitial ads that generate ad revenue and try to give you cool viruses,
was it ouo.io? I landed on that site a lot of times when I was downloading my albums before the 404.
Good thing I have a Linux distro.

>> No.20750086

>>20749926
Seems to be. I personally never used the site when all this went down, but it's what 9Tensu uses.

>> No.20752118

Fuck
What's the alternative now?

>> No.20752909

>>20750086
Well, what I can say at least is that the only thing I did was solve the captcha, and that's it.
I used uBlock too, so there weren't any ads; I eventually got the MEGA link, and only twice did I download directly from ouo.io.

>>20752118
Bumping for this too.
Or I guess torrents will do the job.

>> No.20753767

>>20752118
>>20752909
unironically discord servers
dojinco subreddit has some in posts and comments

>> No.20753777

>>20753767
The absolute state of this site.

>> No.20753793

>>20753777
it really do be like that, im afraid

>> No.20753815

>>20753767
Well shit, I thought /jp/ would be free from the discord cancer but I guess not
What an absolutely depressing state of affairs, discord is destroying so many sites and its a chinese spy tool on top, if you use it I hope you die a violent death.

>>20753793
I'd sooner just go without the goods then. Seriously, fuck discord, why does everything keep getting worse and worse?

>> No.20754050

>>20749736
If you could scrape 9Tensu with the actual links you'd be a god, but I'm not sure how hard that'd be.

>>20752118
vk/doujinmusic isn't too bad.

There's also torrents, google, and /jp/ if vk doesn't work out.

>> No.20754422

>>20754050
>scrape 9Tensu
That's easy: parse the HTML content in the Blogger site's Atom feed.
>actual links
How do you break interstitial links? No clue; apologies. But I can sure as hell scrape for those and try to figure something out.

>> No.20756194

>>20754050
that russian seems to be salty about other sites
Why all this retarded drama who cares if people takes links from others? Its all about sharing, and not having to use chinkscord

>> No.20756257

>>20746781
>>20746888
Long story short, the owner kept reposting the links to albums that other people ripped and posted on discord. It'd be fine if he didn't use the rips generously uploaded by others to earn money through ads on his site without giving them any credit as well.

The guys over at discord naturally were angry, so they put warnings against dojin.co in their archives. Since the retard owner didn't even check them, people who downloaded the stuff through that site found out about his being a jew.

Then he hired a mod for $5/month to look after the site... and refused to pay him thereafter. So the mod hacked the site with a simple SQL injection and became its admin for a short while, deleting its contents and replacing the index page with a warning against the site and a link to the original uploading place. It lasted less than a day, but a few thousand people saw it nevertheless, and hundreds of them became members of that discord server.

With general awareness of his misbehavior increased and the site's content down, the owner tried to squeeze a few more bucks out of it with ads and a please-read-this-personal-appeal-for-donations banner before probably giving up and shutting it down.

Moral of the story? Don't be dishonest.

>> No.20756282

>>20756257
But he was offering a service for people who have a brain and abhor discord
>naturally were angry
There is nothing natural about that, why be angry that stuff that needs to be shared is getting shared? They are fucking cunts
The moral is that we're down a great site because some cunts don't want to share shit without me having to use a Chinese spybot

>> No.20756316

>>20756282
Let's say you are living in a shared house. You raise some lemons, harvest them, and make delicious lemonade. You put the lemonade bottles in the fridge and lie down for a nap. When you open the fridge a few hours later, the lemonade isn't there. Puzzled, you look out the window and see a guy you're sharing the house with selling the lemonade you made. When you approach him and say, "You know, I made that stuff for us all. I can always make more, sure, just share the revenue if you're going to sell it," he tells you to fuck off.

You'd suck it up and silently fuck off, wouldn't you? Your very own fault for putting the lemonade in the fridge that everyone can access. Right?

>> No.20756333

>>20756316
All I know is that I used to have a great and very convenient place to get stuff and now its gone

Also, I wouldn't get mad in that scenario if I never had monetary intentions to begin with, so who cares if someone else does? Good for him, I'm doing it to share the goods.
They are cunts period and also retards for using shitscord

>> No.20756458 [DELETED] 
File: 114 KB, 1322x688, wow.jpg

>>20756333
Dem trips... So this is the power of the big cuck..! You are too powerful, I can't fight you.

>> No.20756461

>>20756458
You already lost to the chinese government cuckmeister

>> No.20758294

>>20746781
Comiket police busted their place.

>> No.20758456

>>20749926
I got an ouo.io link when I downloaded something right before the website went down
Someone here bullied me for being "one of those paranoid noscript people" and I feel validated now

>> No.20762450

>>20758456
Ah.
Shit.

>> No.20762456
File: 2.56 MB, 2000x910, 1548785537506.png

>>20762450
And no, I'm not the one who bullied you or anything, I don't use /g/ that much.
But I guess I will be fine since I always used ublock, I think.
Need to do some research on this.

Pic unrelated but funny.

>> No.20762808

>>20754422
https://github.com/adsbypasser/adsbypasser
Looking at this might help you figure it out.

>>20754050
To add to this, #comiket on rizon is also a good source for the latest stuff as long as you know your way around irc.

>> No.20762951

Pythanon here.

I scraped 9tensu, but I didn't parse the post content yet; it's too irregular to parse for download links (earlier posts don't match up with later posts).

https://files.catbox.moe/nfc3p7.json

If anyone can manage to parse for actual interstitial links, and even better, break them, that would be greatly appreciated.

If you wish to scrape 9tensu like I did: I fetched its feed (Blogger API v1) with max-results set to 150. The results include OpenSearch elements; the fields you should be interested in are openSearch:startIndex and openSearch:totalResults. Once the former exceeds the latter, stop scraping.

Each response should also have a link tag with rel="next", which is the next URL you should fetch and then parse. Put that in a loop until the stop condition is met.
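The pagination loop described in this post can be sketched with `xml.etree.ElementTree`. This assumes the standard Atom namespace plus the OpenSearch namespace Blogger feeds declare (`http://a9.com/-/spec/opensearchrss/1.0/`); `fetch` is a caller-supplied function (hypothetical here) that returns the feed XML for a URL, which keeps the parsing testable offline.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "openSearch": "http://a9.com/-/spec/opensearchrss/1.0/",
}


def parse_page(xml_text: str) -> tuple[list[dict], int, int, Optional[str]]:
    """Return (entries, startIndex, totalResults, rel="next" href or None) for one feed page."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("openSearch:totalResults", default="0", namespaces=NS))
    start = int(root.findtext("openSearch:startIndex", default="1", namespaces=NS))
    next_url = None
    for link in root.findall("atom:link", NS):
        if link.get("rel") == "next":
            next_url = link.get("href")
    entries = [
        {
            "title": e.findtext("atom:title", default="", namespaces=NS),
            # Blogger ships the post body as escaped HTML; findtext returns it unescaped.
            "content": e.findtext("atom:content", default="", namespaces=NS),
        }
        for e in root.findall("atom:entry", NS)
    ]
    return entries, start, total, next_url


def scrape(first_url: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Follow rel="next" links until startIndex exceeds totalResults or no next link remains."""
    url: Optional[str] = first_url
    while url:
        entries, start, total, next_url = parse_page(fetch(url))
        if start > total:
            break
        yield from entries
        url = next_url
```

A real run would start from something like `https://example.blogspot.com/feeds/posts/default?max-results=150` (placeholder address) with `fetch` wrapping an HTTP GET.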

>> No.20765602

bump

>> No.20769185

>trying to make money from pirated content
>please do not reupload!
why is doujin community filled with utter faggots and jews?

>> No.20771667

>>20769185
Because there has been some retards that actually pay for pirated content.
Just like the guys that pay that hoe that gets like 150k a year or a million, I don't remember the number nor her.

>> No.20773816
File: 34 KB, 140x240, suika.gif

Bumperoni.

>> No.20774446

>>20762951
I've extracted the links from the HTML with https://stedolan.github.io/jq/ and a simple regular expression. Not a best practice, but it worked. I leave removing irrelevant links and breaking the interstitials up to other anons. My script is in the archive.
https://my.mixtape.moe/jkxyxr.zip
https://files.catbox.moe/zp2ke4.zip (mirror)
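A rough Python equivalent of the jq-plus-regex approach, assuming the post bodies are the HTML strings from the feed's `content` field. Like the original, it uses a regular expression rather than a real HTML parser, so unusual markup may slip through; it keeps only absolute http(s) `href` values and drops duplicates while preserving order.

```python
import html
import re

# Absolute http(s) URLs inside href="..." or href='...'; deliberately simple, not a full HTML parser.
HREF_RE = re.compile(r"""href\s*=\s*["'](https?://[^"'\s]+)["']""", re.IGNORECASE)


def extract_links(post_html: str) -> list[str]:
    """Return unique absolute links from one post body, in first-seen order."""
    links = (html.unescape(m.group(1)) for m in HREF_RE.finditer(post_html))
    return list(dict.fromkeys(links))
```

Filtering out irrelevant links (blog-internal pages, image hosts, and so on) is left as a separate step, as in the original post.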

>> No.20774676

OP (Pythanon) here again.

It seems **some** of those links (mostly adf.ly) carry the destination inside the URL after shortening, but you only get there after some autistic redirection that simple shit like cURL can't easily handle.
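For links that do carry their destination, a sketch of pulling it out, assuming the URL simply contains a second `http://` or `https://` URL (possibly percent-encoded) after the shortener's own prefix. The URL layouts in the test are hypothetical, and this does nothing about the redirection chain itself.

```python
import re
from typing import Optional
from urllib.parse import unquote

SCHEME_RE = re.compile(r"https?://", re.IGNORECASE)


def embedded_destination(short_url: str) -> Optional[str]:
    """Return the text from the second http(s):// onward, or None if there isn't one."""
    decoded = unquote(short_url)  # handles destinations stored as https%3A%2F%2F...
    matches = list(SCHEME_RE.finditer(decoded))
    if len(matches) < 2:
        return None
    return decoded[matches[1].start():]
```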

Also, for posterity's sake:

dojin.py: https://ghostbin.com/paste/hduyq (inb4 Mac User-Agent)

kyuutensu.py: https://ghostbin.com/paste/afocr People like >>20774446 should be interested in the latter script; no parsing the JSON dump. :^)

Also, feel free to remove the print calls; I was a bit paranoid about progress getting stuck.

>> No.20781657

>>20762808
>To add to this, #comiket on rizon is also a good source for the latest stuff as long as you know your way around irc.
Do they share stuff from m3, reitaisai, etc. too?

>> No.20784856

>>20781657
There might be stuff here and there but it's mostly focused on comiket.

>> No.20786238

recommend me some songs with vocals before the archive dies.

>> No.20788193

I thought I had moved on, but now I miss old doujinstyle again.

>> No.20791945

>>20786238
Sandoman - Enkaya
