/vt/ - Virtual Youtubers

>> No.75470812
File: 212 KB, 1170x546, paperfig.png

>>75470529
a paper came out a little while ago showing that the later layers of models (except the very last ones) tend to be more redundant and don't do much. they found you can delete a lot of these layers, re-tune the model on a relatively small amount of data, and the model will be almost as good as it was before
https://arxiv.org/abs/2403.17887

seems to work here, loss is roughly where a normal llama-8b would be on this data
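
rough sketch of the layer-picking step, in case it helps. the linked paper scores each block of n consecutive layers by the angular distance between the hidden state going into the block and the one coming out, then drops the block where that distance is smallest. everything below is a toy under assumptions: `hidden_states` is a made-up list of 2-d vectors standing in for per-layer activations, the function names are hypothetical, and the real procedure averages over many samples and is followed by a fine-tuning pass that isn't shown here.

```python
import math


def angular_distance(u, v):
    """Angular distance in [0, 1]: arccos(cosine similarity) / pi."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp for float safety
    return math.acos(cos) / math.pi


def pick_block_to_drop(hidden_states, n):
    """Return (start, distance) for the n-layer block that changes its input least.

    hidden_states[l] is the vector entering layer l; the last entry is the
    output of the final layer, so there are len(hidden_states) - 1 layers.
    """
    num_layers = len(hidden_states) - 1
    best_start, best_dist = None, float("inf")
    for start in range(num_layers - n + 1):
        dist = angular_distance(hidden_states[start], hidden_states[start + n])
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


def drop_layers(layers, start, n):
    """Remove layers start .. start + n - 1 and keep everything else in order."""
    return layers[:start] + layers[start + n:]


# Toy data (assumption): 6 layers whose hidden states rotate by these angles.
# Layers 3 and 4 barely move the state, so they are the redundant block.
angles_deg = [0, 40, 80, 120, 125, 130, 170]
hidden_states = [
    (math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles_deg
]
layers = ["layer0", "layer1", "layer2", "layer3", "layer4", "layer5"]

start, dist = pick_block_to_drop(hidden_states, n=2)
pruned = drop_layers(layers, start, 2)
print(start, pruned)
```

in this toy the chosen block sits late in the stack but leaves the final layer alone, which lines up with the "except the very last ones" part. on a real model you would then re-tune the pruned network on a small dataset to recover the loss.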
