
/sci/ - Science & Math



File: 137 KB, 600x453, 1433884702502.jpg
No.8491581

Okay /sci/, I'm giving each of you the opportunity to make a big advancement in the field of machine learning. I started this project as a short curiosity, and three days later it is taking over my life. I feel extremely close to solving this, and the possibilities are tantalizing, but success has felt right around the corner for a couple of days now. The only way out I can see is to dump this project on someone else, in the hope that they can either finish it or give some helpful feedback.

There's this thing in machine learning called "The Vanishing Gradient Problem." I would explain what it is, but if you don't already know, you probably won't be able to help here. I'm trying to get around vanishing gradients by using logical error signals instead of gradient-based ones. In other words, I start at the bottom of the network, compare the output to the ideal output, and determine whether each output neuron should have produced a "bigger" or "smaller" value. Instead of propagating the gradient, I propagate this "bigger" or "smaller" signal up the network.
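
Stripped to its core, the update looks something like this (a simplified two-layer sketch with made-up names, NOT my actual code; that's in the gist below):

import numpy as np

# Simplified sketch of the "bigger"/"smaller" idea: the output error is
# reduced to a sign, and that sign is passed back up through the weights
# in place of a gradient. Illustrative only; the real code is in the gist.

def sign_step(x):
    return np.where(x > 0, 1.0, -1.0)

def train(X, T, hidden=4, lr=0.05, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):
            h = sign_step(x @ W1)
            y = sign_step(h @ W2)
            d_out = np.sign(t - y)         # +1 = "bigger", -1 = "smaller", 0 = already correct
            d_hid = np.sign(d_out @ W2.T)  # pass only the sign up the network
            W2 += lr * np.outer(h, d_out)  # nudge weights in the requested direction
            W1 += lr * np.outer(x, d_hid)
    return W1, W2

# Example: AND on inputs/targets in {-1, +1}.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
T = np.array([[-1], [-1], [-1], [1]], dtype=float)
W1, W2 = train(X, T)
print(sign_step(sign_step(X @ W1) @ W2).ravel())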

Not only can I quickly train arbitrarily deep networks this way, the computation is also orders of magnitude cheaper. Learning AND and OR gates works great. The trouble is that I can't get it to learn anything that isn't linearly separable. I'm using a modified step function (f(x) = -1 for x <= 0, and f(x) = 1 for x > 0), so I *ought* to be able to learn XOR since I have a nonlinear activation function, but for some reason it still doesn't work.
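
For what it's worth, hand-picked weights show that a 2-2-1 network with this exact activation *can* represent XOR (toy weights I set by hand, not learned by my code), so I suspect the problem is in how the bigger/smaller signal reaches the hidden layer rather than in the architecture:

import numpy as np

# Hand-set weights (illustrative only) showing that a 2-2-1 net with
# f(x) = -1 for x <= 0, +1 for x > 0 represents XOR on inputs in {-1, +1}.
# Hidden unit 1 fires for (+1, -1), hidden unit 2 for (-1, +1), and the
# output fires if either hidden unit does.

def step(x):
    return np.where(x > 0, 1.0, -1.0)

W1 = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])
b1 = np.array([0.0, 0.0])
W2 = np.array([1.0, 1.0])
b2 = 1.0

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    h = step(np.array([x1, x2]) @ W1 + b1)
    y = step(h @ W2 + b2)
    print((x1, x2), "->", y)   # +1 exactly when x1 != x2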

Could someone please try to figure this out? You can take all the credit.

Here's the code:
https://gist.github.com/anonymous/e02d15e82975f9aa5b18831a7dff5a56

>> No.8491583

There's more stuff to talk about, but it's too much to type out in one sitting, so please ask questions.

>> No.8491646

Bump for interest. I'm doing a research project on learning the orientation of objects using convnets, and I'm having serious vanishing gradient issues.

>> No.8492255

Bump

>> No.8492278

>>8491581
>"The Vanishing Gradient Problem."
Already solved by batch normalization.

Bye bye.

>> No.8492289

>>8492278
But that's wrong.

>> No.8492327
File: 494 KB, 387x305, 1478673937250.gif

>>8491581
This already exists. You are basically training using the sign of the gradient instead of the gradient itself. It's called Resilient Propagation (Rprop); see the sketch at the end of this post.

Good insight to come up with the idea on your own though.

>>8492278
No... just no... You aren't even wrong, you are just completely confused.
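
For the curious, the core of the Rprop update (roughly the iRprop- variant, with the textbook parameter values; a sketch only, not a drop-in for OP's code) is just a per-weight step size that grows while the gradient sign stays the same and shrinks when it flips:

import numpy as np

# Rprop-style update: only the sign of the gradient is used. Each weight
# keeps its own step size, increased while the gradient sign is stable
# and decreased when it flips (in which case the update is skipped).

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    same_sign = grad * prev_grad > 0
    flipped   = grad * prev_grad < 0
    step = np.where(same_sign, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flipped,   np.maximum(step * eta_minus, step_min), step)
    grad = np.where(flipped, 0.0, grad)      # skip the update after a sign change
    w = w - np.sign(grad) * step
    return w, grad, step

# Usage (sketch): keep prev_grad and step arrays between iterations and call
#   w, prev_grad, step = rprop_update(w, grad, prev_grad, step)
# once per batch with the freshly computed gradient.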

>> No.8492329
File: 24 KB, 380x379, pepeMAGA.jpg

>>8491646
Use ReLU
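
ReLU is just f(x) = max(0, x); its derivative is 1 for any positive input, so stacking layers doesn't keep multiplying the error by numbers smaller than 1 the way sigmoid/tanh do. In numpy terms (sketch):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)       # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)    # gradient is 1 for x > 0, 0 otherwise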

>> No.8492358

>>8492327
So then why isn't everyone using this if it's both much more efficient and doesn't suffer from vanishing gradients?

>> No.8492524

>>8492358
It's not better because you lose so much precision in your backprop.
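
e.g. once you take the sign, a barely-wrong weight and a badly-wrong weight get exactly the same step:

import numpy as np

grads = np.array([1e-4, 0.5, 10.0])
print(np.sign(grads))   # [1. 1. 1.] -- the magnitudes are gone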

>> No.8492535

Why do machine learning people never bother to study statistics, analysis and numerical methods in depth like they should?

>> No.8492552

>>8492329
Can I use self-referencing ReLUs in place of LSTMs in a recurrent neural network?

>> No.8494376

bamp