
/sci/ - Science & Math


>> No.12730331
File: 118 KB, 1664x760, LayerNorm.png

I can't seem to figure out what I'm doing wrong with this calculation. I'm trying to work out how the derivative passes back through a normalization layer, and my result says it doesn't pass through at all. That seems impossible, since Transformer neural nets rely on gradients flowing back through it to function. I'm pretty sure I have the formulation of the original Layer Normalization function correct (from here: https://arxiv.org/abs/1607.06450), so there must be something wrong with my math. Please tell me why I'm a moron and what I'm doing wrong; I've been stuck on this all day. Pardon my poor LaTeX skills.
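For reference, here is the formulation I'm working from and the Jacobian I get when I differentiate the normalization step by hand (notation follows the paper: H is the number of hidden units, a the vector of summed inputs; I'm leaving the gain g and bias b out since they only rescale the rows):

\[
\mu = \frac{1}{H}\sum_{i=1}^{H} a_i, \qquad
\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H} (a_i - \mu)^2}, \qquad
\hat{a}_i = \frac{a_i - \mu}{\sigma}
\]

\[
\frac{\partial \hat{a}_i}{\partial a_j}
= \frac{1}{\sigma}\left(\delta_{ij} - \frac{1}{H} - \frac{\hat{a}_i \hat{a}_j}{H}\right)
\]

One thing worth noting: each row of this Jacobian sums to zero over j (the \delta_{ij} and 1/H terms cancel, and \sum_j \hat{a}_j = 0), so collapsing it to a single number by summing over the inputs gives exactly 0 even though the full matrix is not.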
