
/sci/ - Science & Math



File: 33 KB, 729x708, 61919148-8CED-43AC-889A-29F2C2821612.png
No.11383868

What are some books for understanding neural networks that have a decent amount of math in them? I’ve watched videos, so I have a fair idea of the basics; I wanted a book for self-learning.

Thanks in advance

>> No.11383894

>>11383868
bumperino

>> No.11383897
File: 860 KB, 1500x844, 77953291_p0.png

>>11383868
imagine needing a book to understand neural networks

>> No.11383942

>>11383897
Based. But I would be happy if your answer my question

>> No.11384144

>>11383942
*answered
Also, there's a period missing at the end.

>> No.11384160
File: 10 KB, 344x214, fcharul1.png

>>11383868

You don't need any. This is all you need to know.

>> No.11384529

>>11384144
I was speaking in the present tense, meaning I would be happy if he tells me right now

>> No.11384530

>>11384160
Based but I thought csfags would’ve helped me.

>> No.11384535

>>11383897
This.
>>11383868
Deeplearningbook.
There's no real math in plain NNs. The math comes after the actual thing, not before. In that case there are basically no books except Bubeck's, and you should read papers. I'm sick and tired of you making these shit threads, by the way. How about you actually read something instead of spending all your time on /sci/ making the same thread over and over again?

>> No.11384539

>>11384535
This is my first thread related to neural nets. I don’t know wtf you’re talking about.
Anyways, thanks for the answer.

>> No.11384541

>>11384529
You used "would", indicating subjunctive mood, which is proper since you don't know if Anon would answer. You also forgot a period in this post.

>> No.11384562

>>11384541
I was incognizant whether you were trolling or not. Thanks for ameliorating.

>> No.11384567

>>11384541
I mostly ignore periods when writing informally.

>> No.11384656

Recurrent Neural Networks for Prediction Learning Algorithms, Architectures and Stability (2001)

>> No.11384732

>>11384656
Thank you!

>> No.11384736

>>11384732
do you want to do a project with me? Predicting the cryptocurrency market using ML and technical predictors, sentiment analysis, time series forecasting, stuff like that. Are you in??

>> No.11384744

https://www.goodreads.com/book/show/27982264-convex-optimization

https://www.goodreads.com/book/show/24072897-deep-learning

https://www.goodreads.com/book/show/739791.Reinforcement_Learning

https://www.goodreads.com/book/show/55881.Pattern_Recognition_and_Machine_Learning

>> No.11384953

>>11384736
I don’t want to sound like a liar by not forewarning you that I’m a novice when it comes to machine learning and neural nets. If you want an expert, then I’m a no-show. But if you’re trying to learn it yourself I would be happy, as it would be a great learning experience for both of us.

>> No.11384967

>>11384953
I thought so. Do you use discord?

>> No.11384972

>>11384967
no

>> No.11385207

>>11384967
I use discord add me hmmmmm#1409

I know a decent amount about deep learning and cryptocurrencies. Have never tried to do real-time trading etc. though.

>> No.11385217

>>11383868
https://vitrifyher.com/neural-network-tutorials-explaining-code/

>> No.11385379

>>11385207
i sent you an invite, i am shadowwolf

>> No.11385384

>>11385217
why are you nude?

>> No.11385429

>>11384535
From what I remember it's just basic calculus with some linear algebra

>> No.11385439

>>11384529
"I would" is present tense now?

>> No.11385461

>>11385429
Yes (plus baby's first stats and probability), except when it comes to the theory that is spun from it (where it is heavily stats, game theory, stochastic processes, optimization theory, random matrix theory, dynamical systems, etc.). But ultimately most of the theory is not very interesting. First, besides Nesterov's momentum and a few other results (which, as in the case of Nesterov's proof, are often still not properly understood), they lag very far behind anything useful. Second, they are heavily constrained compared to reality: think how almost every result stems from convex optimization when the loss landscape is known to be highly non-convex, how even the least-constrained results require local convexity and bounded gradients and values to get anywhere, and how the results are not interesting if the bound is very high.
Thankfully it's improving, which is a side-effect of ML hitting a wall (more time to catch up).
Ultimately I hope we move on from gradient-driven learning on convex landscapes some day.

>> No.11385520

deeplearningbook.org

>> No.11385530

>>11385461
There are lots of people in statistical physics working on theory, I don’t know if anything useful will ever come out of it though

>> No.11385548

>>11385530
People with a stat phys background are not a particularly big group out of those who are working on ml theory, but they are represented for sure. So far they haven't come up with much. They're usually the ones who go with the dynamical systems angle, sometimes random matrix theory (see the glass spin <-> ml analogy).

>> No.11385766

>>11385548
>glass spin <-> ml analogy
What exactly is the analogy? I've heard that Hopfield networks are an implementation of some stat phys system; is it something like that?

>> No.11385822

>>11385766
https://arxiv.org/pdf/1003.1129.pdf
https://arxiv.org/pdf/1412.0233.pdf
https://arxiv.org/pdf/1712.09913.pdf
https://arxiv.org/pdf/1810.01075.pdf

This blog might have a more digestible compilation of these observations:
https://calculatedcontent.com/2015/03/25/why-does-deep-learning-work/

>> No.11386784

>>11385822
Whoa, that's comprehensive. Thanks, Anon!

>> No.11386812

>>11383868
not exactly a book but a good resource: neuralnetworksanddeeplearning.com

>> No.11386866

>>11385822
>https://arxiv.org/pdf/1003.1129.pdf
This doesn't seem kindred to neural nets

>> No.11386867

>>11385461
>But ultimately most of the theory is not very interesting
That's a massive hill to climb!!

>> No.11386868
File: 40 KB, 500x535, IMG_987_9876543.jpg

>>11384541
grammar nazi

>> No.11387244

>>11386868
Nah, only an enthusiast. I also think proper writing conveys respect and thoughtfulness: to other Anons' intellect as well as to one's own.

>> No.11387257
File: 2.13 MB, 1172x832, Eternal Universe.png

>>11383868

>> No.11387285

>>11386866
It's a necessary "link" in the "chain" that leads to spin glass <-> neural networks. The idea is that you can analyse neural networks in terms of RMT, analyse spin glasses in terms of RMT, and see that the results are basically the same. You can also do this analysis more directly of course.

>> No.11387289

>>11386867
By interesting, to be clear, I specifically mean "tells us something useful" or "tells us something we didn't know". Most of it just devolves to "it works better than it should and we don't really know why" repeated ad infinitum through different lenses.

>> No.11387334

>>11387285
thanks for enlightening.

>> No.11387341

>>11387289
Are you claiming that scholars with years of experience still don't understand what happens on the inside? Is it just a black box? My ascertained knowledge was that neural nets are deterministic and non-trivial, meaning you just had to brute-force your way through to obtain 'hidden' insights.

>> No.11387348

>>11387341
*Isn't

>> No.11387353

>>11386868
At least I learnt something.

>> No.11387369

>>11387341
>Are you claiming that scholars with years of experience, still don't understand what happens on the inside?
In the sense that we know what we're using to optimize the network and we know how the connections are laid out, obviously no. In the sense that nobody understands why it's working faster than all the known theoretical bounds developed that fit the assumptions we should be restrained by, yes.
So I don't know about calling it a black box (that's more the purview of FUDtards, futurologists and other >>>/x/ retards), but that doesn't mean it makes any (known) sense that it's working at all.

>meaning you just had to brute force your way through to obtain 'hidden' insights.
That's a completely different topic: "how to understand what the network's solution is". There are many ways to go about it, but it's fairly meaningless. Networks almost never converge to optimal solutions, and the solutions they converge to are usually very convoluted and pointless to analyse for the most part.
The most useful insight you can get is "there exists a good solution that only relies on X, Y, Z parts of the input", but even then that's mostly "the entire input in some non-clear way". Because multiple layers interact in complex ways, anything more complicated than logistic regression ends up not giving you clear interactions between variables unless you use things like attention or structured layers like convolutions.
And none of that explains "how the optimization works" or how come it's faster than search methods or monte carlo algorithms, or WHERE it converges to, etc.

>> No.11387470

>>11387369
>Networks almost never converge to optimal solutions
Given some known data and a specific network, with (for simplicity purposes) known initial weights, are you saying that no optimal solution exists?

A slightly off-topic question: is this problem NP-complete?

>> No.11387478

>>11383868
Neural networks are fucking trash.

>> No.11387505

>>11387470
>are you saying that no optimal solution exists?
No. That's the point: even when an optimal solution exists, the model in practice almost never reaches it, rather settling for a "good enough" solution.
In fact I have constructed several such example problem-networks when trying to elucidate the difficulty of tasks in peptide identification from mass spectrometry data. Haven't gotten anything insightful from such analysis though. Another instance of this is https://arxiv.org/pdf/1612.00775.pdf
where you can feed a very easy problem to the network, but unless you initialize the weights to the range of classes (i.e. in pseudocode sm_mult = [0, 1, 2, 3]; out = sm(out) * sm_mult; for a 4-class problem) or very close values, the network will never converge to even a remotely viable solution.

>is this solution np-complete?
Non-convex optimization in general is NP-hard, yes (strictly speaking NP-hard rather than NP-complete, since an optimization problem isn't a decision problem).
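The inline pseudocode above can be made concrete with a short numpy sketch; the logits here are arbitrary made-up numbers, and `sm_mult` is the class-value vector from the post:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# the initialization trick described above for a 4-class problem: scale the
# softmax output by the class values so the scalar prediction starts in range
sm_mult = np.array([0.0, 1.0, 2.0, 3.0])
logits = np.array([0.1, -0.2, 0.3, 0.0])   # arbitrary pre-activations
out = softmax(logits) @ sm_mult            # scalar prediction in [0, 3]
```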

>> No.11387508

>>11387478
True, but they're also the only thing that "works".

>> No.11387563

>>11387505
making this more difficult is the fact that training loss is not even what you really want to optimize

>> No.11387567

>>11387563
Depends on the problem. For ordinal classification, binary crossentropy is in fact consistent. For most cases (including crossentropy for classification), it's true.
Either way, it means the positions of local minima and "pits" in the landscape probably don't align with those of the actual metric you care about.
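A toy numeric illustration of that misalignment, with made-up labels and predicted probabilities: the model with the lower log-loss is not the one with the higher accuracy.

```python
import numpy as np

y = np.array([1, 1, 0, 0])                  # binary labels
p_a = np.array([0.55, 0.55, 0.45, 0.45])    # model A: all correct, but timid
p_b = np.array([0.99, 0.99, 0.01, 0.85])    # model B: confident, one wrong

def logloss(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

acc_a = np.mean((p_a > 0.5) == y)   # model A: perfect accuracy
acc_b = np.mean((p_b > 0.5) == y)   # model B: one mistake
# yet model B achieves the LOWER log-loss of the two
```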

>> No.11387583

>>11385520
i'm currently reading that

>> No.11387586

>>11387583
after that just read papers, field is moving too fast to actually have up to date and relevant books

>> No.11387592

>>11387586
how do i know that a paper is not bullshit, and doesn't, for example, present a forecasting method that works only on test data?

>> No.11387606

>>11387592
You don't. ML is the worst field regarding reproducibility.

>> No.11387608

>>11387586
No, field got stuck in 2015 and has yet to recover.
Theory is too far behind for comprehensive books to be worthwhile yet.
>>11387592
Start with the book before you try to think about things you clearly don't even have an inkling of an idea about. Stringing random words together does not make you appear smart.

>> No.11387614

>>11387592
try it out. papers in good conferences and/or by well known authors are generally good but still there are some issues with reproducibility

>> No.11387621

>>11387608
not understanding the theory doesn’t prevent the models from being practically useful

>> No.11387638

>>11387614
Rules of thumb:
- Avoid anything from oceania
- Avoid anything with chinese authors
- Be careful about anything published by japanese authors
- Avoid anything published in a journal (regardless of journal)
- Avoid post-2016 NIPS and post-2017 ICLR
- Take anything from spanish-speakers with a grain of salt
- Avoid anything with affiliation to stanford
- Take everything from an industry group with a grain of salt

>> No.11387642

>>11387621
When I say it got stuck in 2015, I mean practice. Everything since 2015 besides neural ODEs has been old ideas "b-but look now it runs on a bigger gpu" rebranding, shameless ripoffs, and tricking reviewers into accepting old papers, because there was such a large volume of work before that nobody can tell an idea isn't new without actual work, and reviewers are overburdened and can't put in that work.
The theory isn't stuck, it's advancing very nicely actually, but it's just slow and not getting anything good yet.

>> No.11387684

>>11387642
what about stuff like BERT?

>> No.11387689
File: 30 KB, 800x450, pepelaugh.jpg [View same] [iqdb] [saucenao] [google]
11387689

>>11387478
says the one who'll be thrown in prison by the AI overlords.

>> No.11387694
File: 106 KB, 631x624, pepe.png [View same] [iqdb] [saucenao] [google]
11387694

>>11387638
>Avoid anything with affiliation to stanford
redpill?

>> No.11387724

>>11387684
A perfect example of classic nuML rebranding. Literally nothing new. People were doing the same shit for 2 decades prior and nobody dared call it new or revolutionary. "We trained on a bigger GPU"-tier garbage.


>> No.11387733

>>11387724
Google Duplex isn't revolutionary, is it? I've always held the opinion that there's nothing inherently new or innovative about it, since it uses the same old algorithms invented years ago, which are now basically being milked by almost everyone in the industry.

>> No.11387735

>>11387694
They have a tendency to falsify or fudge results and never come up with actually interesting things. Andrew Ng was their only serious prof but he basically left them hanging with no backup.

>> No.11387740

>>11387727
>could we in theory just let it run *forever and therefore find the optimal solution.
No, because it will just get stuck in a suboptimal "pit" in the loss landscape and never be able to leave it.
However you could use a random sampling method and let it run forever and it will eventually reach an optimal solution. The problem is you don't expect that to happen before the heat death of the universe, assuming modern hardware and then some.
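As a sketch of the random-sampling point, on a made-up 1-D loss with a local pit at x = -2 and the global optimum at x = 3, pure random search eventually lands near the optimum:

```python
import random

def loss(x):  # hypothetical non-convex landscape: pit at -2, optimum at 3
    return min((x - 3) ** 2, (x + 2) ** 2 + 0.5)

random.seed(0)
best_x, best_l = 0.0, loss(0.0)
for _ in range(100_000):            # blind uniform sampling of the domain
    x = random.uniform(-10, 10)
    l = loss(x)
    if l < best_l:
        best_x, best_l = x, l
# best_x ends up close to the global minimum at x = 3
```

On a real multi-million-dimensional landscape the same scheme would need astronomically many samples, which is the point made above.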

>If we know the optimal solution, then why not reverse directions and create an algorithm that will give an optimal solution.
Because then we don't need an algorithm in the first place. The point is precisely that we don't know what the optimal solution is (but we can tell whether a position is good or bad, and sometimes we can tell whether a position is optimal or not).

>Do you have a PhD in deep learning?
No, only a master's. I'm a PhD student.

>> No.11387742

>>11387505
>the model in practice almost never reaches it
This may seem foolish, but I've noticed that the error generally approaches an asymptote; could we in theory just let it run *forever and thereby find the optimal solution?

If we know the optimal solution, then why not reverse directions and create an algorithm that will give an optimal solution. Can this 'algorithm' ever be generalized? At this point I assume that no such generalized algorithms can be created to solve the plethora of problems since neural nets are the current buzzwords and these hypothetical algorithms might just be too specific.

Changing the subject here a bit, Do you have a PhD in deep learning?


* - longer than usual, e.g. a year instead of, say, a couple of months.
That also means that I use an activation function that reduces the error asymptotically. I've noticed some oddities in the error graph with tanh instead of sigmoid on a simple 2-layer network.

>> No.11387745

>>11387733
No, it's yet another "we took things people developed 30 years ago and we did a thing that doesn't work with it now give us money" business trick. Even going far below revolutionary, there's nothing even remotely interesting about it.

>> No.11387749

>>11387745
Lack of public awareness

>> No.11387750

>>11387724
>People were doing the same shit for 2 decades prior
do you happen to be a fan of you again shmidhoobuh?

>> No.11387757
File: 43 KB, 705x701, 1567550808837.jpg [View same] [iqdb] [saucenao] [google]
11387757

How does this thread have 70 replies?
Neural networks are literally just using gradient descent to find the input-output mapping most optimal for your dataset
There's nothing interesting or complex about it

>> No.11387760

>>11387750
I find him to be hilarious, but no. I do respect his ability to guide his students to break new ground though; he's even better than Yoshua at it if you compare the proportion of students who have published significant advances in ML.

>> No.11387764

>>11387740
>suboptimal "pit" in the loss landscape
I imagine this is because the neural net 'sees' in 2D. Could we design a new kind of neural net that 'sees' in 3D, metaphorically speaking? That way it would do a better job of optimizing the loss toward the true/global minimum.

>> No.11387774

>>11387757
says the one who has dipped only his toes but never dived deep enough to actually asphyxiate.

>> No.11387806

>>11387764
No, it's because you are using a convex optimization method in a highly nonconvex landscape. Perhaps the following will be a useful visualization:
https://www.youtube.com/watch?v=K_gNLpPKdsk
This is a rather simple landscape (also represented in just 3D when actually it's several-million-Ds at minimum) and the optimum is the big hole in the middle. But if you're initialized near the border and your learning rate is small enough, you'll just fall in one of the holes on the left and never be able to exit it because every direction you look, there's a big cliff going up. You'd rather stay comfy right where you are at the bottom of the shithole than climb up.
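The same trap can be sketched numerically on a made-up 1-D landscape with a local pit at x = -2 and the global optimum at x = 3: initialized on the wrong side, gradient descent settles in the pit and never leaves.

```python
def loss(x):  # hypothetical landscape: local pit at -2, global optimum at 3
    return min((x - 3) ** 2, (x + 2) ** 2 + 0.5)

def grad(x, eps=1e-5):               # numerical gradient
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

x = -4.0                             # initialized near the left-hand holes
for _ in range(1000):
    x -= 0.1 * grad(x)               # plain gradient descent
# x converges to the local pit at -2 (loss 0.5), not the optimum at 3
```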

>In this way it makes a better decision of optimizing the loss in the true/global minima.
If you could evaluate (or at least efficiently and accurately estimate) the real loss landscape, you again wouldn't need to optimize; you could just directly jump to the minimum. Below that level, methods such as ASGD, Adam, Ada*, etc. effectively act like rough estimates of 2nd-order information, which helps with getting out of minima, but it's not enough. Other more direct methods like L-BFGS and full-blown Newton can use 2nd-order information directly, but they're much too slow in practice (especially full Newton), not to mention that SGD + momentum (let alone Adam) works almost the same in most problems in practice.
3rd-order methods should converge even faster and better, but again the amount of compute time (let alone memory) needed makes them not worth it in almost all practical cases.

>> No.11387869

>>11387760
three prisoners were sentenced to death...

>> No.11387892

>>11387869
t. belgian guy

>> No.11388062

>>11387806
>several-million-Ds at minimum
I thought it was only 2Ds. Thanks for elucidating.

>> No.11388068

>>11388062
In the loss landscape, each free parameter is one dimension. Of course that's hardly something you can see, hence why it's usually PCA'd for rendering purposes.
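For illustration, here is one common way such a reduction is done: take the optimization trajectory in parameter space (a made-up random walk below stands in for a real training run), center it, and project onto the top two principal components via SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in "training trajectory": 50 steps in a 10_000-parameter space
traj = np.cumsum(rng.normal(size=(50, 10_000)), axis=0)

centered = traj - traj.mean(axis=0)          # center per parameter
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords2d = centered @ vt[:2].T               # project onto the top-2 PCs
```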

>> No.11388092

>>11388068
PCA = Principal Component Analysis, if I don't have dementia already?

>> No.11388100
File: 246 KB, 1126x430, loss landscape.png

>>11388068
i've never really looked at low dimensional visualizations of loss functions. kinda cool you can actually see differences like this

>> No.11388145

>>11388092
Yes
>>11388100
It's cooler in theory than practice. I believe the low-dim reduction is hiding pretty important stuff because in practice skip connections don't work as well as they should if they were really making the loss landscape THAT smooth. Maybe this partial insight will lead somewhere interesting though.

>> No.11388241

>>11387764
unless P=NP, no

>> No.11388746

>>11387642
Based af. Though I have to add that old =/= bad. Spiking neural networks (old) definitely deserve more research compared to the current tensor multiplication shiet.

>> No.11388990

>>11388746
I strongly agree with this view. In the first place, many of the ideas in ML that are now bread and butter are old ideas made possible in light of modern hardware and knowledge, and I believe there's a fairly high chance that some breakthrough method is hiding in older, now abandoned, ideas. A good research direction/filter to try to identify these ideas is probably to look for anything that was thought to not scale, especially anything which was thought to work really well on toy problems but not on real-world problems.

>> No.11389035

>>11383868

IN MY HUMBLE OPINION...

The best way to learn anything is by doing it by hand. you learn calculus by doing calculations, getting a "feel" for the functions, the derivatives: step-by-step

same goes for neural networks. again, this is just my humble opinion, but after having watched tons of videos on them, I found watching videos and "conceptual" explanations to be a lot of mental masturbation and almost useless when it came to actually implementing one and really "getting it."

so, with all that being said, this is a nice little tutorial I've found that walks you through the actual calculations step-by-step for an extremely simple neural network, which you can then just scale yourself to larger ones. this is how i personally prefer to learn since I am also "self-taught"

what is interesting about neural networks is that once you learn how to calculate one by hand you realize they are a lot like kalman filters and what's interesting about kalman filters is that they are really just moving averages.

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
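In the same step-by-step spirit, here is a sketch of hand-computed backprop for a single sigmoid neuron on one made-up training example; every gradient line is just the chain rule written out.

```python
import math

w, b = 0.5, 0.1          # made-up initial weight and bias
x, target = 1.5, 0.0     # one made-up training example
lr = 0.5                 # learning rate

for _ in range(200):
    z = w * x + b                     # forward: pre-activation
    y = 1 / (1 + math.exp(-z))        # forward: sigmoid activation
    loss = 0.5 * (y - target) ** 2    # squared error
    dy = y - target                   # dL/dy
    dz = dy * y * (1 - y)             # chain rule through the sigmoid
    w -= lr * dz * x                  # dL/dw = dz * x
    b -= lr * dz                      # dL/db = dz
```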

>> No.11389050

>>11387638
>- Avoid anything with affiliation to stanford

HOLY FUCKING BASED

>> No.11389164

>>11388241
elucidate please?

>> No.11389167

>>11389035
>The best way to learn anything is by doing it by hand. you learn calculus by doing calculations, getting a "feel" for the functions, the derivatives: step-by-step
this

But anon, neural nets just become too complicated and convoluted. Calculating final weights is much harder than say finding the integral of a function

>> No.11389860

>>11389167
You should calculate a few NNs by hand with very small sizes (say 2-3 layers with 5 neurons each or some such). You could also do something like compute the derivative of a spike-and-slab RBM and see how that devolves to everyone's favorite form.
After that just forget about ever manually doing it and use pytorch of course.

>> No.11389861

>>11389164
Non-convex optimization is NP-hard in general, thus there would be no tractable algorithm able to do what is suggested unless P = NP.

>> No.11389940

If you don't know what
>Perceptions
>Classification and classification boundaries
>LDA
>Linear,multilinear and nonlinear regression
>Kernel, kernel ridge regression
>Dimensionality reduction, PCA, NMF
Are, don't even bother starting with multi layer perceptions, or even worse deep learning and reinforcement learning

>> No.11389944

>>11389940
Perceptrons*

>> No.11389959

>>11389940
Except for classification and classification boundaries, and arguably PCA/dimensionality reduction, completely disagree. It's definitely a set of things you should eventually familiarize yourself with, but in no way required before going into dl. By that logic you should add ant colony and particle swarm optimization, interior point methods, MILPs, POMDPs, multi-armed bandits, GAs, projected gradient, Newton, etc.

>> No.11389965

>>11389959
>Perceptrons
>Not relevant for deep learning
Imagine trying to learn something without trying to understand what it consists of lmao
Saying pca and classification boundaries are relevant but not kernels is also retarded

>> No.11389972

>>11389965
Kernels really aren't relevant.
Perceptron should have been in the list but on second thought it wasn't wrong to exclude it because it has several properties that are opposite of modern DL, so in fact it's not THAT relevant even though it's a clear precursor.

>> No.11389986

>>11389972
Don't say this to my Prof Klaus or you're gonna make him reee lmao

>> No.11390003

so i've seen several people (including a professor) make it seem like it was a massive problem that perceptrons couldn't do XOR, and so everyone abandoned the field until people found out that you could use backprop to train MLPs. but this story sounds ridiculous because computing gradients in an MLP is not hard..
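To illustrate the point, a sketch of a tiny 2-layer MLP trained on XOR with hand-derived backprop; layer sizes, seed and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sig = lambda z: 1 / (1 + np.exp(-z))

losses = []
for _ in range(5000):
    h = sig(X @ W1 + b1)                  # forward: hidden layer
    y = sig(h @ W2 + b2)                  # forward: output
    losses.append(float(((y - t) ** 2).mean()))
    d2 = (y - t) * y * (1 - y)            # backward: output delta
    d1 = (d2 @ W2.T) * h * (1 - h)        # backward: hidden delta
    W2 -= h.T @ d2
    b2 -= d2.sum(axis=0)
    W1 -= X.T @ d1
    b1 -= d1.sum(axis=0)
# the squared error drops as the MLP learns the XOR mapping
```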

>> No.11390004

>>11389986
Klaus-Robert Müller?
He works mostly with SVMs instead of DL which aren't really in the same sphere. In that case kernels are about the MOST relevant thing ever to him, that goes without saying, but more or less the point of deep learning is that, conceptually, you learn the optimal kernel jointly with the rest of the function, thus kernels become completely irrelevant. That's why DL works so well: you remove one more source of human error (kernel selection and tuning).

>> No.11390005

Weird question, but it’s been frequently reinforced that neural nets can map any function. Is it possible to get back the original function given a neural net, its weights and other relevant information?

>> No.11390007

>>11390003
(Classical) perceptrons have a gradient of 0 everywhere because the loss is 0-1. It's common for good ideas to be abandoned because there's a small thing preventing them from working well in practice, and we have other tools that are Good Enough (tm) and no evidence the new toy is going to overtake these methods 100 times over. A modern example of this is GANs, which were introduced in 2014 or so, but only became useful many years later, because they were too unstable in their original form.
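For reference, the historical workaround to that zero gradient was the mistake-driven perceptron update, sketched here on a made-up linearly separable toy set:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])           # labels in {-1, +1}, separable

w, b = np.zeros(2), 0.0
for _ in range(100):                   # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:     # misclassified (or on the boundary)
            w += yi * xi               # Rosenblatt's mistake-driven update
            b += yi
# by the perceptron convergence theorem this terminates with every
# training point on the correct side of the hyperplane
```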

>> No.11390017

>>11390007
would it not seem obvious to someone to just approximate the 0-1 with a sigmoid? even just during training and then threshold at 0.5 later or something? i assume logistic regression was around at the time...

>> No.11390018

>>11390004
Nah, does quite a bit of deep learning as well, I'm literally taking his course on it

>> No.11390027

>>11390005
Arbitrary NNs cannot model arbitrary functions. You need an NN with unbounded depth and a few neurons per layer, or unbounded width and at least 2 layers (these are the universal approximation theorems).
The function expressed by a neural network is always (e.g.) sigm(W2 sigm(W1x + b1) + b2)...
What you're asking is basically, "if I give you the sequence 1, 2, 3 can you tell me that the correct function that generated them is fibonacci?" and the answer is "no because many functions fit these numbers correctly, including identity".
Alternatively, you can ask the question: can we extract a function from the trained neural network that has X form? And the answer is "yes, it's quite easy. Just input arbitrary data into the NN (doesn't need to have anything to do with real data) and get the output, then fit that to your X form'd function (say a polynomial or a decision tree)".
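A sketch of that probing idea, with a made-up `black_box` standing in for a trained network and a polynomial as the chosen surrogate form:

```python
import numpy as np

def black_box(x):
    # stand-in for a trained network: any opaque function of the input
    return np.tanh(2 * x) + 0.1 * x

x = np.linspace(-3, 3, 200)           # arbitrary probe inputs, not real data
y = black_box(x)
coeffs = np.polyfit(x, y, deg=5)      # fit a degree-5 polynomial surrogate
surrogate = np.poly1d(coeffs)
# the surrogate now approximates the black box over the probed range
```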

>> No.11390033

>>11390018
I don't really know anything about him beside the fact he was a physicist and used SVMs to do a bunch of shit. What's he even got to say about kernels within the realm of DL?

>> No.11390049

>>11390033
He does chemistry and biology stuff with nns now
All of his kernel and svm shit is like 2 decades back

>> No.11390055

>>11390017
The original perceptron was modeled on the neuronal structure of the brain, which communicates via spiking, not an analog signal. To get from the original perceptron to the modern perceptron, multiple things needed to happen:
- The realization that spike rate in the brain matters more than spiking itself for a time-agnostic simulated system (basically that you can emulate fairly accurately a spiking neuron system over a fixed period of time by the spiking rate over that fixed period of time)
- The realization that spike rate can be emulated by sigmoidal activation (this requires quantifying spike rate and getting the idea of measuring it on a conceptual fixed timescale as the fraction of time where there was spiking)
And arguably (for scalability)
- The realization that SGD is an unbiased estimate of GD.

The problem wasn't that there wasn't a continuous approximation to a binary signal that could be optimized by gradient descent, it's that gradient descent for iterative learning was expensive, and more importantly, that it didn't seem to readily connect to the way animal brains work. That's why things like hopfield nets were more popular for a time for those who were interested in neural networks.

>> No.11390080

>>11390027
Thank you

>> No.11390086

>>11390055
i see... so basically it was focused on being a model of biological neurons rather than just trying to make a good predictive model

>> No.11390091

>>11390086
This is why vapnik and many mathematicians seethe so hard when anyone mentions deep learning
You don't know shit about what's going on inside them, literally "lol as long as I get good results"

>> No.11390107

>>11390091
the virgin SVM vs the chad DNN

>> No.11390112

>>11390086
Yes, exactly. The NN research at the time was really more in line with philosophical questions of consciousness, strong AI, the meaning of learning, etc. It was believed that taking inspiration from how the brain behaves was going to help unlock secrets that would allow us to train a model as you can train a child (a few years after these paradigms became popular, there was a lot of work around curriculum learning for similar reasons, yet in practice nobody uses that today). This was in contrast with other forms of machine learning (matrix inversion, PCA, operational research methods, EM, etc.) that used mathematical tools to extract 'factors of variation' in the data, or 'underlying manifolds', or optima, etc.

>> No.11390181

>>11390091
https://www.youtube.com/watch?v=STFcvzoxVw4&t=1381s
the cope

>> No.11390212

>>11390181
Lmao knew what interview that was going to be. BTW there's a part 2 now. However
You can call it cope as much as you want, doesn't make him less wrong. Deep learning papers are just not scientific enough and make too many assumptions, and bend the interpretations of their results in a way where they're always seen in a positive light. Until we understand how NNs actually work and can axiomize or at least describe properly what's going on, it's just a scientific deadend which is going to collapse in on itself when some dude comes along and BTFOs the entire field by showing that it's in actuality inefficient as fuck

>> No.11390230

>>11390212
Nobody thinks that deep learning is perfect, but everyone knows it's the best thing that works right now. There are endless theoretical research efforts into how they work, but nothing convincing so far. That's math, though, not science. Most DL papers suck because most of everything in every field sucks, not because it's DL. The good papers are in fact good, and unlike in every other field ever, it's actually very rare that DL papers twist results.
Describing what's going on is easy and was done decades ago; that's pointless. The question is not what, which is understood by even children, it's WHY.

>> No.11390233

>>11390212
I mean, sure, you can criticize the theory, but to just say that everything deep learning has accomplished "must not have been hard problems" is ridiculous

>> No.11390239

>>11390230
>The question is not what, which is understood by even children, it's WHY
That's literally what I said
>>11390230
>Most DL papers suck because most of everything in every field sucks, not because it's DL
Wrong, they suck because they're written by a few csfags who don't know shit about math and think they can just tweak some shit and arrive at a new conclusion. I've seen enough papers like that which should never have been published
>>11390233
>everything that deep learning has accomplished "must not have been hard problems"
Where did I say that

>> No.11390240

>>11390239
vapnik said it

>> No.11390244

>>11390239
>That's literally what I said
No, you said
>describe properly what's going on

>Wrong, they suck because they are written by a few csfags who don't know shit about math and think they can just tweak some shit and arrive at a new conclusion.
It's your own problem if you go out of your way to ignore the good papers and only read the shitty ones, which are probably on arXiv and not published.

>> No.11390263

>>11390244
>It's your own problem if you go out of your way to ignore the good papers and only read the shitty ones that are probably on arxiv
Stop misinterpreting my claims, retard. I never said there are no good papers, just that there are more shitty ones than there should be

>> No.11390270

>>11390263
That's not what you said before, stop moving the goalposts. Also if you really think that, you've never read papers in any other domain, period.

>> No.11391134

>>11390244
He's right that there are disproportionately many bad DL papers from people simply tweaking minor things or rebranding old models. Then there's also the reproducibility crisis, which hit ML papers in general hard.

>> No.11391211

>>11391134
Wrong on both points. If you want a field strewn with bad reproducibility, see biology, physics or psychology. If you want one where disproportionately bad papers get published even at top venues like Nature and Science, see biology, statistics, or physics. Other crap-quality fields include natural language and computer vision papers (i.e. those keeping to classical methods instead of ML). If anything, ML papers have historically had an outstandingly high reproduction rate, with a ridiculously strong culture against unpublished code and data. Papers that use proprietary data to publish results virtually automatically get rejected, and those using public data with no public code have an extremely high rejection rate. The exact opposite of every other field.
Tweaking minor things or rebranding old models is becoming increasingly common (and almost no work past 2015 was more original than that), but that's not disproportionate compared to other fields; in fact it's still far ahead of the aforementioned ones.
Since you don't know the first thing about the topic and have never read a single paper, how about you shut the fuck up until you've at least done the most modest of homework?

>> No.11391417

>>11391211
Lmao

>> No.11392179

>>11391211
Seething hard. The truth is literally one Google search away.
>ML papers have had an outstandingly high reproduction rate historically, with a ridiculously strong culture against unpublished code and data
Absolute lie. Are you trolling or really this delusional?
Sure, many big papers have GitHub repos, but the vast majority don't.

>Since you don't know the first thing about the topic
Hahaha, the seethe. But yeah, these blind accusations already show you're retarded. I'm one and a half years into my PhD and have published two peer-reviewed papers. You are literally delusional.

>> No.11392186

>>11391211
https://mc.ai/is-deep-learning-facing-a-reproducibility-crisis/

>> No.11392239
File: 46 KB, 520x544, knr5ht5Nc217T8AF2-B3tMb-mjVB6yn5msiN0L59Zy8.jpg
11392239

>>11390230

>everyone knows it's the best thing that works right now.

Is this what they told you after you finished your first DL MOOC, lmao? You've clearly never been in any Kaggle contest.

>Most DL papers suck because most of everything in every field sucks

Imagine being so mad when people point out that the only thing you know a bit about is a meme that you have to cope with the belief that everything must be a meme.

>> No.11392362

>>11392239
That's why csfags who claim you don't need to know much statistics for ml make me seethe so hard. Imagine not knowing what your entire supposed field is based on

>> No.11392856

>>11392179
>>11392239
Yum yum yum delicious butthurt tears.
>>11392186
>recsys
>information retrieval
You just proved me right; that's exactly what I said in my post: fields that don't use ML don't have reproducible results (and then blame ML for it, even though the ML parts are the only parts that are reproducible lmao).
Your very own blogpost, even though it claims "all of ML has a reprod crisis reee", then goes on to discuss at length that... non-ML methods are the ones that can't be reproduced, and doesn't give a single example of an actual ML system with that problem. They also go out of their way to mention this is all about pre-DL stuff, and that's before the points I already made in my previous post.
Now that you've been thoroughly BTFO, you can head back to >>>/x/ whence you came. You're literally too retarded to even read a blogpost written by retards, let alone read papers and make a judgement call. At least you're not as retarded as the butthurt posters above, yet you clearly don't belong on /sci/ (while the others don't even belong on 4chan, as they're clearly underageb&).

>> No.11393044

>>11392856

lol, you're just caught out of your depth. Since you believe DL is the best thing that works (nobody with real experience in ML actually thinks along those lines), you'll at best be some incompetent parrot who can only solve exercises from MOOCs. Try real-world ML competitions. As for the mudslinging, lmao, only you are butthurt and retarded; at least the other guys have a coherent argument, with corroboration from others including Vapnik himself. It's okay, keep doing what you're doing, it's actually beneficial for us.

>> No.11393062

>>11393044
Congrats, you're clinically retarded!

>> No.11393130
File: 65 KB, 1200x514, 35hp79.jpg
11393130

>>11393062

>your brain on MOOC Deep Learning

Even your insults are just parroting what someone else wrote before, lol

>> No.11393135

>>11393130
Imagine being so deeply lobotomized you think Kaggle and MOOCs are a measure of reproducibility and paper quality in a research-oriented field. Imagine unironically believing that principles and technologies that have unlocked never-before-seen advances don't work, while technologies and principles that haven't produced anything useful in the past 40 years, through no lack of funding or effort, are somehow "the future". Wew to that, laddies.

>> No.11393155

>>11393135

Well, you probably said so many retarded things that multiple people disagreed with you about different ones, and now your thinking is so muddled you're confusing them with each other lol

Nobody believes you're doing "research", MOOC parrot.

>> No.11393158

>>11393155
Oof, seethe and cope tranny.
Go dilate somewhere else.

>> No.11393352

>>11393158
>DLfag
>Calls others trannies
If anything, your field is gonna have the most trannies, faggot

>> No.11393368

>>11393352
>coping intensifies

>> No.11393401

>>11393368
>Coping
I've got a comfy data analyst career in research with an h-index of 10, what have you got

>> No.11393439

>>11393401
An h-index an order of magnitude higher than yours, and methods that actually work, instead of one-off throwaway proprietary scripts evaluated only on proprietary data that can't be reproduced even on that same data, because "I always sort my variables independently since it gives me a better correlation" counts as a valid contribution in your """field""".
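The "sort my variables independently" line refers to a genuinely broken analysis pattern, and it's easy to see why: sorting two unrelated columns independently destroys the row pairing and manufactures a near-perfect correlation out of pure noise. A minimal sketch in Python/NumPy (the data is synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two genuinely unrelated variables, 1000 paired observations.
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Correct: correlate the variables with their pairing intact.
r_paired = np.corrcoef(x, y)[0, 1]  # close to 0: no real relationship

# Broken: sort each column independently, discarding which y goes with which x.
# The values are now matched by rank, which fakes a near-perfect correlation.
r_sorted = np.corrcoef(np.sort(x), np.sort(y))[0, 1]  # close to 1

print("paired:", r_paired)
print("sorted independently:", r_sorted)
```

Running this shows the paired correlation hovering near zero while the independently-sorted one is nearly 1, despite the data containing no relationship at all.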

>> No.11393448

>>11393439
discarded, thanks for the TED talk

>> No.11393522

>>11392856
kek, you're a special kind of retarded. I pondered for a while whether it was worth responding to an autist like you.
You've never tried to reproduce the results of a paper, yet you talk out of your ass as if you were a published author. Pathetic.

>> No.11393620

>>11393522
The irony in this post is so thick, even a knife couldn't cut through it.

>> No.11393650

>>11393620
lmao