
/vt/ - Virtual Youtubers


>> No.65752924
File: 311 KB, 1024x1536, 07600--3039932256-lora fuwamoco4.jpg

>>65751526
Paragraph+tags is the best method, but it would only be required if I were finetuning the original SD 1.5. Since I'll be finetuning NAI instead, which is already trained on tags, and then extracting a LyCORIS from the difference, I can use just natural-language captions; when somebody adds the LyCORIS to their mix, it should be able to interpret both natural language and tags (in theory).
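For reference, extracting a low-rank adapter from the finetune difference is usually done by taking an SVD of each layer's weight delta and keeping the top singular directions. A minimal numpy sketch of that idea (toy shapes, no conv layers or alpha scaling; real extraction scripts handle those):

```python
import numpy as np

def extract_lora(w_base: np.ndarray, w_ft: np.ndarray, rank: int = 8):
    """Approximate the finetune delta with a rank-`rank` factorization.

    Returns (up, down) such that up @ down ~= w_ft - w_base,
    mirroring the LoRA decomposition delta_W = B @ A.
    """
    delta = w_ft - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the top-`rank` singular directions.
    up = u[:, :rank] * s[:rank]   # shape (out_dim, rank)
    down = vt[:rank, :]           # shape (rank, in_dim)
    return up, down

# Toy check: a delta that is exactly rank 2 is recovered near-losslessly.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
low_rank_delta = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
w_ft = w_base + low_rank_delta
up, down = extract_lora(w_base, w_ft, rank=2)
err = np.abs(up @ down - low_rank_delta).max()
```

When the true delta has higher rank than you keep, the SVD truncation gives the best rank-r approximation in the least-squares sense, which is why extracted LyCORIS files lose a bit of the finetune.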
I'm pretty sure the tagging model can be finetuned for a specific task like tagging anime images, but just a few samples wouldn't be enough; more like 50k.
It already describes the coom parts (wrongly, but it does); it just veers off and interprets things that don't exist or aren't happening. The tags are definitely needed, since it rarely captures character details and focuses more on the action and composition. Like this: https://files.catbox.moe/i3wt0q.png
That RLHF (DPO) process would be helpful here for determining which captions are close and which are far off so they can be redone, but I won't find anyone willing to go through 100k images to do that; I wouldn't personally even go through 5k. So maybe employing PickScore would be a helpful metric. Anyway, enough rambling. 3k images left to tag, and then finetuning starts. Hopefully we get some results.
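Using a preference model as an automatic caption filter could be as simple as thresholding its score and queuing the low scorers for re-captioning. A sketch, where `score_fn` is a stand-in for whatever learned scorer (e.g. PickScore) you'd actually plug in:

```python
def triage_captions(items, score_fn, threshold=0.7):
    """Split (image, caption) pairs into keep/redo piles by model score.

    `score_fn(image, caption)` stands in for a learned image-text scorer;
    anything below `threshold` gets queued for re-captioning.
    """
    keep, redo = [], []
    for image, caption in items:
        pile = keep if score_fn(image, caption) >= threshold else redo
        pile.append((image, caption))
    return keep, redo

# Toy scorer for illustration only: pretend longer captions align better.
fake_score = lambda img, cap: min(1.0, len(cap) / 40)
items = [
    ("a.png", "1girl, wolf ears, fangs, pink hair, blue hair, smiling"),
    ("b.png", "two girls"),
]
keep, redo = triage_captions(items, fake_score, threshold=0.7)
```

The win over manual review is that only the `redo` pile (hopefully a small fraction of the 100k) ever needs human or re-run attention.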

>>65752270
I think it works fine as long as it's not an anime model/LoRA trained on tags; it just lacks logic and counting.
In any case, by using the models as diffusers pipelines instead of safetensors checkpoints, an LLM and an interpreter workflow can be integrated with the model and trained together. The problem is that it will have high VRAM costs, and to reach the speed of DALL-E or whatever, you'd need to spend 10k on a GPU.
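The rough shape of such a workflow, with stubs standing in for both the LLM and the diffusion pipeline (both hypothetical; a real version would load actual models, which is where the VRAM cost comes from):

```python
def llm_expand(prompt: str) -> str:
    """Stub for an LLM interpreter that rewrites a terse user prompt
    into the tag-heavy form the diffusion model was trained on."""
    return prompt + ", masterpiece, best quality, detailed background"

def diffuse(prompt: str, steps: int = 20) -> dict:
    """Stub for a diffusers pipeline call; returns metadata, not pixels."""
    return {"prompt": prompt, "steps": steps}

def generate(user_prompt: str) -> dict:
    # The interpreter step runs first, so the diffusion model only ever
    # sees the expanded prompt. Training them jointly means both models
    # must sit in VRAM at once.
    return diffuse(llm_expand(user_prompt))

result = generate("fuwamoco eating breakfast")
```

The point of the stubs is the control flow: chaining the two models is trivial in code; the expense is purely in holding and backpropagating through both at once.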
