
/vt/ - Virtual Youtubers

>> No.67496273
File: 71 KB, 1254x414, Screenshot 140154.png

>>67486613
You can see the settings in the lora metadata; there's nothing special.
If you are using sd-scripts, the only difference from small lora training is that instead of using repeat folders like 1_name, 2_name .. you drop everything into one folder and generate metadata with 'python sd-scripts/finetune/merge_captions_to_metadata.py $image_folder $image_folder"/metadata_cap.json" --caption_extension ".txt" --recursive --full_path'
When launching the training script you add '--in_json "$image_folder/metadata_cap.json"'
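If you're curious what that script actually produces, metadata_cap.json is just a JSON map from image path to caption. A minimal sketch of the same idea (my own illustration, not sd-scripts' exact code):

```python
import json
import sys
from pathlib import Path

def build_caption_metadata(image_folder: str) -> dict:
    """Map each image's absolute path to its sidecar .txt caption,
    roughly mirroring merge_captions_to_metadata.py's output shape."""
    metadata = {}
    for img in sorted(Path(image_folder).rglob("*")):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        caption_file = img.with_suffix(".txt")
        if caption_file.exists():
            metadata[str(img.resolve())] = {
                "caption": caption_file.read_text().strip()
            }
    return metadata

if __name__ == "__main__":
    folder = sys.argv[1]
    out = Path(folder) / "metadata_cap.json"
    out.write_text(json.dumps(build_caption_metadata(folder), indent=2))
```

Use the real script though; it also handles merging into existing metadata.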

I run it like this - https://litter.catbox.moe/zjdk8c.png
You'll obviously use different settings for SDXL. You should have '--resolution="1024,1024" --min_bucket_reso=512 --max_bucket_reso=2048', no --bucket_no_upscale, and v-pred/zsnr.
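For intuition on what those bucket flags do: aspect-ratio bucketing snaps each image to a resolution with roughly its aspect ratio, area near resolution², sides clamped to [min_bucket_reso, max_bucket_reso] and rounded to a step (64 here). A rough sketch of the idea, not sd-scripts' exact rounding:

```python
import math

def pick_bucket(w: int, h: int, target: int = 1024,
                min_reso: int = 512, max_reso: int = 2048,
                step: int = 64) -> tuple[int, int]:
    """Pick a (bucket_w, bucket_h) close to the image's aspect ratio
    with area near target*target. Illustrative approximation only."""
    ar = w / h
    # width that would give area == target^2 at this aspect ratio
    bw = math.sqrt(target * target * ar)
    bw = max(min_reso, min(max_reso, round(bw / step) * step))
    bh = max(min_reso, min(max_reso, round(target * target / bw / step) * step))
    return int(bw), int(bh)
```

So a 2048x1024 source lands in a wide ~2:1 bucket instead of being cropped square.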

48gb is fine.
For a start, try grabbing 10 or 20 artists and training a regular high dim lora for 10 epochs using prodigy.
If it works - it works. If not - you'll have to experiment with settings and with tagging / tag ordering.

ponyXL was trained on a mix of tags and natural language, so maybe you should add natural-language captions too. They used LLaVA, but you can try better models like https://huggingface.co/01-ai/Yi-VL-6B#why-yi-vl instead.
Qwen-VL and CogAgent (pic related) are good too, but CogAgent is so slow it's unusable.
