
/vt/ - Virtual Youtubers


>> No.73940392
File: 18 KB, 801x325, L3_8B L2_70B.png

Btw Llama3 got released. We only have 8B and 70B for now, both with 8k context, though they say they'll release longer-context models and multimodal variants later.
These models were trained on 15T tokens. Llama2 was trained on 2T tokens; Mistral possibly on 8T. I assume that shows in how the L3_8B (instruct) model outperforms the L2_70B (instruct) model on all these intelligence tests.
Plus, Meta claims that L3_70B has a ~58% win rate against Claude Sonnet.

How useful is this to Vedal? Not much currently, I think.
-If he can get the 70B to generate text quickly, perhaps the lack of middle-sized models wouldn't be an issue. (AFAIK it's the TTS that has the bigger latency anyway)
-The 8B can be quite smart if tuned effectively - so it could be an upgrade even over a 33B
-The 8k context might be too little for Neuro's memory? Apparently that's too little for RAG purposes. I don't know how other memories like vector databases work.
-If we do get a multimodal speech model, then that would be what Vedal said he was waiting for. But we don't have that yet, and it's unclear if a speech model will even be released.
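For what it's worth, vector-database memory is basically a workaround for the small context window: you store old chat as embedding vectors, then at each turn retrieve only the few most similar memories and stuff those into the prompt. A minimal sketch below - the toy word-count "embedding" and the class/function names are my own for illustration; real setups use neural embeddings (e.g. sentence-transformers) and a proper vector store.

```python
# Toy sketch of vector-database chat memory (retrieval step of RAG).
# Assumption: real systems use learned embeddings; a bag-of-words
# Counter keeps this example self-contained and runnable.
import math
from collections import Counter

def embed(text):
    # Toy embedding: lowercase word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    def __init__(self):
        self.entries = []  # list of (embedding, original text) pairs

    def store(self, text):
        self.entries.append((embed(text), text))

    def recall(self, query, k=2):
        # Return the k stored memories most similar to the query.
        q = embed(query)
        ranked = sorted(self.entries,
                        key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

memory = VectorMemory()
memory.store("chat donated 50 subs during the karaoke stream")
memory.store("Vedal promised a model upgrade next month")
memory.store("Neuro sang a song yesterday")

# Only the top-k hits go into the 8k-token prompt, so total stored
# memory can grow far beyond the model's context window.
print(memory.recall("what did Vedal say about the upgrade?", k=1))
```

The point is that the 8k limit only bounds how much retrieved memory fits per turn, not how much can be stored overall - which is why people say small contexts hurt RAG but don't kill it.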
