
/sci/ - Science & Math


>> No.12730331
File: 118 KB, 1664x760, LayerNorm.png

I can't seem to figure out what I'm doing wrong with this calculation. I'm trying to work out how the derivative passes back through a normalization layer, and my result says it doesn't pass through at all. That seems impossible, since Transformer neural nets rely on gradients flowing back through it to function. I'm pretty sure I have the formulation of the original Layer Normalization function correct (from here: https://arxiv.org/abs/1607.06450), so there must be something wrong with my math. Please tell me why I'm a moron and what I'm doing wrong; I've been stuck on this all day. Pardon my poor LaTeX skills.
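For reference, here is the formulation I'm working from and the Jacobian I get when I differentiate the normalization step by hand (notation follows the paper: H is the number of hidden units, a the vector of summed inputs; I'm leaving the gain g and bias b out since they only rescale the rows):

\[
\mu = \frac{1}{H}\sum_{i=1}^{H} a_i, \qquad
\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H} (a_i - \mu)^2}, \qquad
\hat{a}_i = \frac{a_i - \mu}{\sigma}
\]

\[
\frac{\partial \hat{a}_i}{\partial a_j}
= \frac{1}{\sigma}\left(\delta_{ij} - \frac{1}{H} - \frac{\hat{a}_i \hat{a}_j}{H}\right)
\]

One thing worth noting: each row of this Jacobian sums to zero over j (the \delta_{ij} and 1/H terms cancel, and \sum_j \hat{a}_j = 0), so collapsing it to a single number by summing over the inputs gives exactly 0 even though the full matrix is not.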
