
/sci/ - Science & Math



File: 33 KB, 729x708, 61919148-8CED-43AC-889A-29F2C2821612.png
No.11383868

What are some books for understanding neural networks that have a decent amount of math in them? I’ve watched videos, so I have a fair idea of the basics; I wanted a book for self-learning.

Thanks in advance

>> No.11383894

>>11383868
bumperino

>> No.11383897
File: 860 KB, 1500x844, 77953291_p0.png

>>11383868
imagine needing a book to understand neural networks

>> No.11383942

>>11383897
Based. But I would be happy if your answer my question

>> No.11384144

>>11383942
*answered
Also, there's a period missing at the end.

>> No.11384160
File: 10 KB, 344x214, fcharul1.png

>>11383868

You don't need any. This is all you need to know.

>> No.11384529

>>11384144
I was speaking in the present tense, meaning I would be happy if he tells me right now

>> No.11384530

>>11384160
Based but I thought csfags would’ve helped me.

>> No.11384535

>>11383897
This.
>>11383868
Deeplearningbook.
There's no real math in plain NNs. The math comes after the actual thing, not before. In that case there are basically no books except Bubeck's, and you should read papers. I'm sick and tired of you making these shit threads, by the way. How about you actually read something instead of spending all your time on /sci/ making the same thread over and over again?

>> No.11384539

>>11384535
This is my first thread related to neural nets. I don’t know wtf you’re talking about.
Anyways, thanks for the answer.

>> No.11384541

>>11384529
You used "would", indicating subjunctive mood, which is proper since you don't know if Anon would answer. You also forgot a period in this post.

>> No.11384562

>>11384541
I was incognizant whether you were trolling or not. Thanks for ameliorating.

>> No.11384567

>>11384541
I mostly ignore periods when writing informally.

>> No.11384656

Recurrent Neural Networks for Prediction Learning Algorithms, Architectures and Stability (2001)

>> No.11384732

>>11384656
Thank you!

>> No.11384736

>>11384732
do you want to do a project with me? Predicting the cryptocurrency market using ML and technical predictors, sentiment analysis, time series forecasting, stuff like that. Are you in??

>> No.11384744

https://www.goodreads.com/book/show/27982264-convex-optimization

https://www.goodreads.com/book/show/24072897-deep-learning

https://www.goodreads.com/book/show/739791.Reinforcement_Learning

https://www.goodreads.com/book/show/55881.Pattern_Recognition_and_Machine_Learning

>> No.11384953

>>11384736
I don’t want to sound like a liar by not forewarning you that I’m a novice when it comes to machine learning and neural nets. If you want an expert, then I’m a no-show. But if you’re trying to learn it yourself I would be happy, as it would be a great learning experience for both of us.

>> No.11384967

>>11384953
I thought so. Do you use discord?

>> No.11384972

>>11384967
no

>> No.11385207

>>11384967
I use discord add me hmmmmm#1409

I know a decent amount about deep learning and cryptocurrencies. Have never tried to do real-time trading etc. though.

>> No.11385217

>>11383868
https://vitrifyher.com/neural-network-tutorials-explaining-code/

>> No.11385379

>>11385207
i sent you an invite, i am shadowwolf

>> No.11385384

>>11385217
why are you nude?

>> No.11385429

>>11384535
From what I remember it's just basic calculus with some linear algebra

>> No.11385439

>>11384529
"I would" is present tense now?

>> No.11385461

>>11385429
Yes (plus baby's first stats and probability), except when it comes to the theory that is spun from it (where it is heavily stats, game theory, stochastic processes, optimization theory, random matrix theory, dynamical systems, etc.). But ultimately most of the theory is not very interesting. First, besides Nesterov's momentum and a few other results (which, as in the case of Nesterov's proof, are often still not properly understood), they lag very far behind anything useful. Second, they are heavily constrained compared to reality: think how almost every result stems from convex optimization when the loss landscape is known to be highly non-convex, how even the least-constrained results require local convexity and bounded gradients and values to get anywhere, and how the results are not interesting if the bound is very high.
Thankfully it's improving, which is a side-effect of ML hitting a wall (more time to catch up).
Ultimately I hope we move on from gradient-driven learning on convex landscapes some day.

>> No.11385520

deeplearningbook.org

>> No.11385530

>>11385461
There are lots of people in statistical physics working on theory, I don’t know if anything useful will ever come out of it though

>> No.11385548

>>11385530
People with a stat phys background are not a particularly big group out of those who are working on ml theory, but they are represented for sure. So far they haven't come up with much. They're usually the ones who go with the dynamical systems angle, sometimes random matrix theory (see the glass spin <-> ml analogy).

>> No.11385766

>>11385548
>glass spin <-> ml analogy
What exactly is the analogy? I've heard that Hopfield networks are an implementation of some stat phys system; is it something like that?

>> No.11385822

>>11385766
https://arxiv.org/pdf/1003.1129.pdf
https://arxiv.org/pdf/1412.0233.pdf
https://arxiv.org/pdf/1712.09913.pdf
https://arxiv.org/pdf/1810.01075.pdf

This blog might have a more digestible compilation of these observations:
https://calculatedcontent.com/2015/03/25/why-does-deep-learning-work/

>> No.11386784

>>11385822
Whoa, that's comprehensive. Thanks, Anon!

>> No.11386812

>>11383868
not exactly a book but a good resource: neuralnetworksanddeeplearning.com

>> No.11386866

>>11385822
>https://arxiv.org/pdf/1003.1129.pdf
This doesn't seem kindred to neural nets

>> No.11386867

>>11385461
>But ultimately most of the theory is not very interesting
That's a massive hill to climb!!

>> No.11386868
File: 40 KB, 500x535, IMG_987_9876543.jpg

>>11384541
grammar nazi

>> No.11387244

>>11386868
Nah, only an enthusiast. I also think proper writing conveys respect and thoughtfulness: to other Anons' intellect as well as to one's own.

>> No.11387257
File: 2.13 MB, 1172x832, Eternal Universe.png

>>11383868

>> No.11387285

>>11386866
It's a necessary "link" in the "chain" that leads to spin glass <-> neural networks. The idea is that you can analyse neural networks in terms of RMT, analyse spin glasses in terms of RMT, and see that the results are basically the same. You can also do this analysis more directly of course.

>> No.11387289

>>11386867
By interesting, to be clear, I specifically mean "tells us something useful" or "tells us something we didn't know". Most of it just devolves to "it works better than it should and we don't really know why" repeated ad infinitum through different lenses.

>> No.11387334

>>11387285
thanks for enlightening.

>> No.11387341

>>11387289
Are you claiming that scholars with years of experience still don't understand what happens on the inside? Is it just a black box? My ascertained knowledge was that neural nets are deterministic and non-trivial, meaning you just had to brute-force your way through to obtain 'hidden' insights.

>> No.11387348

>>11387341
*Isn't

>> No.11387353

>>11386868
At least I learnt something.

>> No.11387369

>>11387341
>Are you claiming that scholars with years of experience, still don't understand what happens on the inside?
In the sense that we know what we're using to optimize the network and we know how the connections are laid out, obviously no. In the sense that nobody understands why it's working faster than all the known theoretical bounds developed that fit the assumptions we should be restrained by, yes.
So I don't know about calling it a black box (that's more the purview of FUDtards, futurologists and other >>>/x/ retards), but that doesn't mean it makes any (known) sense that it's working at all.

>meaning you just had to brute force your way through to obtain 'hidden' insights.
That's a completely different topic: "how to understand what the network's solution is". There are many ways to go about it, but it's fairly meaningless. Networks almost never converge to optimal solutions, and the solutions they converge to are usually very convoluted and pointless to analyse for the most part.
The most useful insight you can get is "there exists a good solution that only relies on X, Y, Z parts of the input", but even then that's mostly "the entire input in some non-clear way". Because multiple layers interact in complex ways, anything more complicated than logistic regression ends up not giving you clear interactions between variables unless you use things like attention or structured layers like convolutions.
And none of that explains "how the optimization works" or how come it's faster than search methods or monte carlo algorithms, or WHERE it converges to, etc.

>> No.11387470

>>11387369
>Networks almost never converge to optimal solutions
Given some known data and a specific network, with (for simplicity purposes) known initial weights, are you saying that no optimal solution exists?

A slightly off-topic question: is this problem NP-complete?

>> No.11387478

>>11383868
Neural networks are fucking trash.

>> No.11387505

>>11387470
>are you saying that no optimal solution exists?
No. That's the point: even when an optimal solution exists, the model in practice almost never reaches it, rather settling for a "good enough" solution.
In fact I have constructed several such example problem-networks when trying to elucidate the difficulty of tasks in peptide identification from mass spectrometry data. Haven't gotten anything insightful from such analysis though. Another instance of this is https://arxiv.org/pdf/1612.00775.pdf
where you can feed a very easy problem to the network, but unless you initialize the weights to the range of classes (i.e. in pseudocode sm_mult = [0, 1, 2, 3]; out = sm(out) * sm_mult; for a 4-class problem) or very close values, the network will never converge to even a remotely viable solution.

>is this solution np-complete?
Non-convex optimization in general is NP-hard, yes (strictly speaking NP-hard rather than NP-complete, since an optimization problem isn't a decision problem).
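The inline pseudocode above can be made concrete with a short numpy sketch; the logits here are arbitrary made-up numbers, and `sm_mult` is the class-value vector from the post:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# the initialization trick described above for a 4-class problem: scale the
# softmax output by the class values so the scalar prediction starts in range
sm_mult = np.array([0.0, 1.0, 2.0, 3.0])
logits = np.array([0.1, -0.2, 0.3, 0.0])   # arbitrary pre-activations
out = softmax(logits) @ sm_mult            # scalar prediction in [0, 3]
```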

>> No.11387508

>>11387478
True, but they're also the only thing that "works".

>> No.11387563

>>11387505
making this more difficult is the fact that training loss is not even what you really want to optimize

>> No.11387567

>>11387563
Depends on the problem. For ordinal classification, binary crossentropy is in fact consistent. For most cases (including crossentropy for classification), it's true.
Either way, it means the positions of local minima and "pits" in the landscape probably don't align with those of the actual metric you care about.
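A toy numeric illustration of that misalignment, with made-up labels and predicted probabilities: the model with the lower log-loss is not the one with the higher accuracy.

```python
import numpy as np

y = np.array([1, 1, 0, 0])                  # binary labels
p_a = np.array([0.55, 0.55, 0.45, 0.45])    # model A: all correct, but timid
p_b = np.array([0.99, 0.99, 0.01, 0.85])    # model B: confident, one wrong

def logloss(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

acc_a = np.mean((p_a > 0.5) == y)   # model A: perfect accuracy
acc_b = np.mean((p_b > 0.5) == y)   # model B: one mistake
# yet model B achieves the LOWER log-loss of the two
```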

>> No.11387583

>>11385520
i'm currently reading that

>> No.11387586

>>11387583
after that just read papers, field is moving too fast to actually have up to date and relevant books

>> No.11387592

>>11387586
how do i know that a paper is not bullshit, and doesn't, for example, present a forecasting method that works only on test data?

>> No.11387606

>>11387592
You don't. ML is the worst field regarding reproducibility.

>> No.11387608

>>11387586
No, field got stuck in 2015 and has yet to recover.
Theory is too far behind for comprehensive books to be worthwhile yet.
>>11387592
Start with the book before you try to think about things you clearly don't even have an inkling of an idea about. Stringing random words together does not make you appear smart.

>> No.11387614

>>11387592
try it out. papers in good conferences and/or by well known authors are generally good but still there are some issues with reproducibility

>> No.11387621

>>11387608
not understanding the theory doesn’t prevent the models from being practically useful

>> No.11387638

>>11387614
Rules of thumb:
- Avoid anything from oceania
- Avoid anything with chinese authors
- Be careful about anything published by japanese authors
- Avoid anything published in a journal (regardless of journal)
- Avoid post-2016 NIPS and post-2017 ICLR
- Take anything from spanish-speakers with a grain of salt
- Avoid anything with affiliation to stanford
- Take everything from an industry group with a grain of salt

>> No.11387642

>>11387621
When I say it got stuck in 2015, I mean practice. Everything since 2015 besides neural ODEs has been old ideas "b-but look now it runs on a bigger gpu" rebranding, shameless ripoffs, and tricking reviewers into accepting old papers, because there was such a large volume of work before that nobody can tell an idea isn't new without actual work, and reviewers are overburdened and can't put in that work.
The theory isn't stuck, it's advancing very nicely actually, but it's just slow and not getting anything good yet.

>> No.11387684

>>11387642
what about stuff like BERT?

>> No.11387689
File: 30 KB, 800x450, pepelaugh.jpg [View same] [iqdb] [saucenao] [google]
11387689

>>11387478
says the one who'll be thrown in prison by the AI overlords.

>> No.11387694
File: 106 KB, 631x624, pepe.png [View same] [iqdb] [saucenao] [google]
11387694

>>11387638
>Avoid anything with affiliation to stanford
redpill?

>> No.11387724

>>11387684
A perfect example of classic nuML rebranding. Literally nothing new. People were doing the same shit for 2 decades prior and nobody dared call it new or revolutionary. "We trained on a bigger GPU"-tier garbage.


>> No.11387733

>>11387724
Google Duplex isn't revolutionary, is it? I've always held the opinion that there's nothing inherently new or innovative about it, since it uses the same old algorithms invented years ago, which are now basically being milked by almost everyone in the industry.

>> No.11387735

>>11387694
They have a tendency to falsify or fudge results and never come up with actually interesting things. Andrew Ng was their only serious prof but he basically left them hanging with no backup.

>> No.11387740

>>11387727
>could we in theory just let it run *forever and therefore find the optimal solution.
No, because it will just get stuck in a suboptimal "pit" in the loss landscape and never be able to leave it.
However you could use a random sampling method and let it run forever and it will eventually reach an optimal solution. The problem is you don't expect that to happen before the heat death of the universe, assuming modern hardware and then some.
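As a sketch of the random-sampling point, on a made-up 1-D loss with a local pit at x = -2 and the global optimum at x = 3, pure random search eventually lands near the optimum:

```python
import random

def loss(x):  # hypothetical non-convex landscape: pit at -2, optimum at 3
    return min((x - 3) ** 2, (x + 2) ** 2 + 0.5)

random.seed(0)
best_x, best_l = 0.0, loss(0.0)
for _ in range(100_000):            # blind uniform sampling of the domain
    x = random.uniform(-10, 10)
    l = loss(x)
    if l < best_l:
        best_x, best_l = x, l
# best_x ends up close to the global minimum at x = 3
```

On a real multi-million-dimensional landscape the same scheme would need astronomically many samples, which is the point made above.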

>If we know the optimal solution, then why not reverse directions and create an algorithm that will give an optimal solution.
Because then we don't need an algorithm in the first place. The point is precisely that we don't know what the optimal solution is (but we can tell whether a position is good or bad, and sometimes we can tell whether a position is optimal or not).

>Do you have a PhD in deep learning?
No, only a master's. I'm a PhD student.

>> No.11387742

>>11387505
>the model in practice almost never reaches it
This may seem foolish, but I've noticed that the error generally approaches an asymptote; could we in theory just let it run *forever and thereby find the optimal solution?

If we know the optimal solution, then why not reverse directions and create an algorithm that will give an optimal solution. Can this 'algorithm' ever be generalized? At this point I assume that no such generalized algorithms can be created to solve the plethora of problems since neural nets are the current buzzwords and these hypothetical algorithms might just be too specific.

Changing the subject here a bit, Do you have a PhD in deep learning?


* - longer than usual, e.g. a year instead of, say, a couple of months.
That also means that I use an activation function that reduces the error asymptotically. I've noticed some oddities in the error graph with tanh instead of sigmoid on a simple 2-layer network.

>> No.11387745

>>11387733
No, it's yet another "we took things people developed 30 years ago and we did a thing that doesn't work with it now give us money" business trick. Even going far below revolutionary, there's nothing even remotely interesting about it.

>> No.11387749

>>11387745
Lack of public awareness

>> No.11387750

>>11387724
>People were doing the same shit for 2 decades prior
do you happen to be a fan of you again shmidhoobuh?

>> No.11387757
File: 43 KB, 705x701, 1567550808837.jpg [View same] [iqdb] [saucenao] [google]
11387757

How does this thread have 70 replies?
Neural networks are literally just using gradient descent to find the input-output mapping most optimal for your dataset
There's nothing interesting or complex about it

>> No.11387760

>>11387750
I find him to be hilarious, but no. I do respect his ability to guide his students to break new ground though; he's even better than Yoshua at it if you compare the proportion of students who have published significant advances in ML.

>> No.11387764

>>11387740
>suboptimal "pit" in the loss landscape
I imagine this is because the neural net 'sees' in 2D. Could we design a new kind of neural net that 'sees' in 3D, metaphorically speaking? That way it would do a better job of optimizing the loss toward the true/global minimum.

>> No.11387774

>>11387757
says the one who has dipped only his toes but never dived deep enough to actually asphyxiate.

>> No.11387806

>>11387764
No, it's because you are using a convex optimization method in a highly nonconvex landscape. Perhaps the following will be a useful visualization:
https://www.youtube.com/watch?v=K_gNLpPKdsk
This is a rather simple landscape (also represented in just 3D when actually it's several-million-Ds at minimum) and the optimum is the big hole in the middle. But if you're initialized near the border and your learning rate is small enough, you'll just fall in one of the holes on the left and never be able to exit it because every direction you look, there's a big cliff going up. You'd rather stay comfy right where you are at the bottom of the shithole than climb up.
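The same trap can be sketched numerically on a made-up 1-D landscape with a local pit at x = -2 and the global optimum at x = 3: initialized on the wrong side, gradient descent settles in the pit and never leaves.

```python
def loss(x):  # hypothetical landscape: local pit at -2, global optimum at 3
    return min((x - 3) ** 2, (x + 2) ** 2 + 0.5)

def grad(x, eps=1e-5):               # numerical gradient
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

x = -4.0                             # initialized near the left-hand holes
for _ in range(1000):
    x -= 0.1 * grad(x)               # plain gradient descent
# x converges to the local pit at -2 (loss 0.5), not the optimum at 3
```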

>In this way it makes a better decision of optimizing the loss in the true/global minima.
If you could evaluate (or at least efficiently and accurately estimate) the real loss landscape, you again wouldn't need to optimize; you could just directly jump to the minimum. Below that level, methods such as ASGD, Adam, Ada*, etc. effectively act like rough estimates of 2nd-order information, which helps with getting out of minima, but it's not enough. Other more direct methods like L-BFGS and full-blown Newton can use 2nd-order information directly, but they're much too slow in practice (especially full Newton), not to mention that SGD + momentum (let alone Adam) works almost the same in most problems in practice.
3rd-order methods should converge even faster and better, but again the amount of compute time (let alone memory) needed makes them not worth it in almost all practical cases.

>> No.11387869

>>11387760
three prisoners were sentenced to death...

>> No.11387892

>>11387869
t. belgian guy

>> No.11388062

>>11387806
>several-million-Ds at minimum
I thought it was only 2Ds. Thanks for elucidating.

>> No.11388068

>>11388062
In the loss landscape, each free parameter is one dimension. Of course that's hardly something you can see, hence why it's usually PCA'd for rendering purposes.
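For illustration, here is one common way such a reduction is done: take the optimization trajectory in parameter space (a made-up random walk below stands in for a real training run), center it, and project onto the top two principal components via SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in "training trajectory": 50 steps in a 10_000-parameter space
traj = np.cumsum(rng.normal(size=(50, 10_000)), axis=0)

centered = traj - traj.mean(axis=0)          # center per parameter
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords2d = centered @ vt[:2].T               # project onto the top-2 PCs
```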

>> No.11388092

>>11388068
PCA = Principal Component Analysis, if I don't have dementia already?

>> No.11388100
File: 246 KB, 1126x430, loss landscape.png

>>11388068
i've never really looked at low dimensional visualizations of loss functions. kinda cool you can actually see differences like this

>> No.11388145

>>11388092
Yes
>>11388100
It's cooler in theory than practice. I believe the low-dim reduction is hiding pretty important stuff because in practice skip connections don't work as well as they should if they were really making the loss landscape THAT smooth. Maybe this partial insight will lead somewhere interesting though.

>> No.11388241

>>11387764
unless P=NP, no

>> No.11388746

>>11387642
Based af. Though I have to add that old =/= bad. Spiking neural networks (old) definitely deserve more research compared to the current tensor multiplication shiet.

>> No.11388990

>>11388746
I strongly agree with this view. In the first place, many of the ideas in ML that are now bread and butter are old ideas made possible in light of modern hardware and knowledge, and I believe there's a fairly high chance that some breakthrough method is hiding in older, now abandoned, ideas. A good research direction/filter to try to identify these ideas is probably to look for anything that was thought to not scale, especially anything which was thought to work really well on toy problems but not on real-world problems.

>> No.11389035

>>11383868

IN MY HUMBLE OPINION...

The best way to learn anything is by doing it by hand. you learn calculus by doing calculations, getting a "feel" for the functions, the derivatives: step-by-step

same goes for neural networks. again, this is just my humble opinion, but after having watched tons of videos on them, I found watching videos and "conceptual" explanations to be a lot of mental masturbation and almost useless when it came to actually implementing one and really "getting it."

so, with all that being said, this is a nice little tutorial I've found that walks you through the actual calculations step-by-step for an extremely simple neural network, which you can then just scale yourself to larger ones. this is how i personally prefer to learn since I am also "self-taught"

what is interesting about neural networks is that once you learn how to calculate one by hand you realize they are a lot like kalman filters and what's interesting about kalman filters is that they are really just moving averages.

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
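In the same step-by-step spirit, here is a sketch of hand-computed backprop for a single sigmoid neuron on one made-up training example; every gradient line is just the chain rule written out.

```python
import math

w, b = 0.5, 0.1          # made-up initial weight and bias
x, target = 1.5, 0.0     # one made-up training example
lr = 0.5                 # learning rate

for _ in range(200):
    z = w * x + b                     # forward: pre-activation
    y = 1 / (1 + math.exp(-z))        # forward: sigmoid activation
    loss = 0.5 * (y - target) ** 2    # squared error
    dy = y - target                   # dL/dy
    dz = dy * y * (1 - y)             # chain rule through the sigmoid
    w -= lr * dz * x                  # dL/dw = dz * x
    b -= lr * dz                      # dL/db = dz
```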

>> No.11389050

>>11387638
>- Avoid anything with affiliation to stanford

HOLY FUCKING BASED

>> No.11389164

>>11388241
elucidate please?

>> No.11389167

>>11389035
>The best way to learn anything is by doing it by hand. you learn calculus by doing calculations, getting a "feel" for the functions, the derivatives: step-by-step
this

But anon, neural nets just become too complicated and convoluted. Calculating final weights is much harder than say finding the integral of a function

>> No.11389860

>>11389167
You should calculate a few NNs by hand with very small sizes (say 2-3 layers with 5 neurons each or some such). You could also do something like compute the derivative of a spike-and-slab RBM and see how that devolves to everyone's favorite form.
After that just forget about ever manually doing it and use pytorch of course.

>> No.11389861

>>11389164
Non-convex optimization is NP-hard in general, thus there would be no tractable algorithm able to do what is suggested unless P = NP.

>> No.11389940

If you don't know what
>Perceptions
>Classification and classification boundaries
>LDA
>Linear,multilinear and nonlinear regression
>Kernel, kernel ridge regression
>Dimensionality reduction, PCA, NMF
Are, don't even bother starting with multi layer perceptions, or even worse deep learning and reinforcement learning

>> No.11389944

>>11389940
Perceptrons*

>> No.11389959

>>11389940
Except for classification and classification boundaries, and arguably PCA/dimensionality reduction, completely disagree. It's definitely a set of things you should eventually familiarize yourself with, but in no way required before going into dl. By that logic you should add ant colony and particle swarm optimization, interior point methods, MILPs, POMDPs, multi-armed bandits, GAs, projected gradient, Newton, etc.

>> No.11389965

>>11389959
>Perceptrons
>Not relevant for deep learning
Imagine trying to learn something without trying to understand what it consists of lmao
Saying pca and classification boundaries are relevant but not kernels is also retarded

>> No.11389972

>>11389965
Kernels really aren't relevant.
Perceptron should have been in the list but on second thought it wasn't wrong to exclude it because it has several properties that are opposite of modern DL, so in fact it's not THAT relevant even though it's a clear precursor.

>> No.11389986

>>11389972
Don't say this to my Prof Klaus or you're gonna make him reee lmao

>> No.11390003

so i've seen several people (including a professor) make it seem like it was a massive problem that perceptrons couldn't do XOR, and so everyone abandoned the field until people found out that you could use backprop to train MLPs. but this story sounds ridiculous because computing gradients in an MLP is not hard..
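To illustrate the point, a sketch of a tiny 2-layer MLP trained on XOR with hand-derived backprop; layer sizes, seed and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sig = lambda z: 1 / (1 + np.exp(-z))

losses = []
for _ in range(5000):
    h = sig(X @ W1 + b1)                  # forward: hidden layer
    y = sig(h @ W2 + b2)                  # forward: output
    losses.append(float(((y - t) ** 2).mean()))
    d2 = (y - t) * y * (1 - y)            # backward: output delta
    d1 = (d2 @ W2.T) * h * (1 - h)        # backward: hidden delta
    W2 -= h.T @ d2
    b2 -= d2.sum(axis=0)
    W1 -= X.T @ d1
    b1 -= d1.sum(axis=0)
# the squared error drops as the MLP learns the XOR mapping
```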

>> No.11390004

>>11389986
Klaus-Robert Müller?
He works mostly with SVMs instead of DL which aren't really in the same sphere. In that case kernels are about the MOST relevant thing ever to him, that goes without saying, but more or less the point of deep learning is that, conceptually, you learn the optimal kernel jointly with the rest of the function, thus kernels become completely irrelevant. That's why DL works so well: you remove one more source of human error (kernel selection and tuning).

>> No.11390005

Weird question, but it’s been frequently reinforced that neural nets can map any function. Is it possible to get back the original function given a neural net, its weights and other relevant information?

>> No.11390007

>>11390003
(Classical) perceptrons have a gradient of 0 everywhere because the loss is 0-1. It's common for good ideas to be abandoned because there's a small thing preventing them from working well in practice, and we have other tools that are Good Enough (tm) and no evidence the new toy is going to overtake these methods 100 times over. A modern example of this is GANs, which were introduced in 2014 or so, but only became useful many years later, because they were too unstable in their original form.
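For reference, the historical workaround to that zero gradient was the mistake-driven perceptron update, sketched here on a made-up linearly separable toy set:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])           # labels in {-1, +1}, separable

w, b = np.zeros(2), 0.0
for _ in range(100):                   # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:     # misclassified (or on the boundary)
            w += yi * xi               # Rosenblatt's mistake-driven update
            b += yi
# by the perceptron convergence theorem this terminates with every
# training point on the correct side of the hyperplane
```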

>> No.11390017

>>11390007
would it not seem obvious to someone to just approximate the 0-1 with a sigmoid? even just during training and then threshold at 0.5 later or something? i assume logistic regression was around at the time...

>> No.11390018

>>11390004
Nah, does quite a bit of deep learning as well, I'm literally taking his course on it

>> No.11390027

>>11390005
Arbitrary NNs cannot model arbitrary functions. You need an NN with unbounded depth and a few neurons per layer, or unbounded width and at least 2 layers (these are the universal approximation theorems).
The function expressed by a neural network is always (e.g.) sigm(W2 sigm(W1x + b1) + b2)...
What you're asking is basically, "if I give you the sequence 1, 2, 3 can you tell me that the correct function that generated them is fibonacci?" and the answer is "no because many functions fit these numbers correctly, including identity".
Alternatively, you can ask the question: can we extract a function from the trained neural network that has X form? And the answer is "yes, it's quite easy. Just input arbitrary data into the NN (doesn't need to have anything to do with real data) and get the output, then fit that to your X form'd function (say a polynomial or a decision tree)".
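A sketch of that probing idea, with a made-up `black_box` standing in for a trained network and a polynomial as the chosen surrogate form:

```python
import numpy as np

def black_box(x):
    # stand-in for a trained network: any opaque function of the input
    return np.tanh(2 * x) + 0.1 * x

x = np.linspace(-3, 3, 200)           # arbitrary probe inputs, not real data
y = black_box(x)
coeffs = np.polyfit(x, y, deg=5)      # fit a degree-5 polynomial surrogate
surrogate = np.poly1d(coeffs)
# the surrogate now approximates the black box over the probed range
```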

>> No.11390033

>>11390018
I don't really know anything about him beside the fact he was a physicist and used SVMs to do a bunch of shit. What's he even got to say about kernels within the realm of DL?

>> No.11390049

>>11390033
He does chemistry and biology stuff with nns now
All of his kernel and svm shit is like 2 decades back

>> No.11390055

>>11390017
The original perceptron was modeled on the neuronal structure of the brain, which communicates via spiking, not an analog signal. To get from the original perceptron to the modern perceptron, multiple things needed to happen:
- The realization that spike rate in the brain matters more than spiking itself for a time-agnostic simulated system (basically that you can emulate fairly accurately a spiking neuron system over a fixed period of time by the spiking rate over that fixed period of time)
- The realization that spike rate can be emulated by sigmoidal activation (this requires quantifying spike rate and getting the idea of measuring it on a conceptual fixed timescale as the fraction of time where there was spiking)
And arguably (for scalability)
- The realization that SGD is an unbiased estimate of GD.

The problem wasn't that there wasn't a continuous approximation to a binary signal that could be optimized by gradient descent, it's that gradient descent for iterative learning was expensive, and more importantly, that it didn't seem to readily connect to the way animal brains work. That's why things like hopfield nets were more popular for a time for those who were interested in neural networks.

>> No.11390080

>>11390027
Thank you

>> No.11390086

>>11390055
i see... so basically it was focused on being a model of biological neurons rather than just trying to make a good predictive model

>> No.11390091

>>11390086
This is why vapnik and many mathematicians seethe so hard when anyone mentions deep learning
You don't know shit about what's going on inside them, literally "lol as long as I get good results"

>> No.11390107

>>11390091
the virgin SVM vs the chad DNN

>> No.11390112

>>11390086
Yes, exactly. The NN research at the time was really more in line with philosophical questions of consciousness, strong AI, the meaning of learning, etc. It was believed that taking inspiration from how the brain behaves was going to help unlock secrets that would allow us to train a model as you can train a child (a few years after these paradigms became popular, there was a lot of work around curriculum learning for similar reasons, yet in practice nobody uses that today). This was in contrast with other forms of machine learning (matrix inversion, PCA, operational research methods, EM, etc.) that used mathematical tools to extract 'factors of variation' in the data, or 'underlying manifolds', or optima, etc.

>> No.11390181

>>11390091
https://www.youtube.com/watch?v=STFcvzoxVw4&t=1381s
the cope

>> No.11390212

>>11390181
Lmao knew what interview that was going to be. BTW there's a part 2 now. However
You can call it cope as much as you want, doesn't make him less wrong. Deep learning papers are just not scientific enough and make too many assumptions, and bend the interpretations of their results in a way where they're always seen in a positive light. Until we understand how NNs actually work and can axiomize or at least describe properly what's going on, it's just a scientific deadend which is going to collapse in on itself when some dude comes along and BTFOs the entire field by showing that it's in actuality inefficient as fuck

>> No.11390230

>>11390212
Nobody thinks that deep learning is perfect, but everyone knows it's the best thing that works right now. There are endless theoretical research efforts into how they work, but nothing convincing so far. That's math, though, not science. Most DL papers suck because most of everything in every field sucks, not because it's DL. The good papers are in fact good, and unlike in every other field ever, it's actually very rare that DL papers twist results.
Describing what's going on is easy and was done decades ago; that's pointless. The question is not what, which is understood by even children, it's WHY.

>> No.11390233

>>11390212
I mean, sure, you can criticize the theory, but to just say that everything deep learning has accomplished "must not have been hard problems" is ridiculous

>> No.11390239

>>11390230
>The question is not what, which is understood by even children, it's WHY
That's literally what I said
>>11390230
>Most DL papers suck because most of everything in every field sucks, not because it's DL
Wrong, they suck because they're written by a few csfags who don't know shit about math and think they can just tweak some shit and arrive at a new conclusion. I've seen enough papers like that which should never have been published
>>11390233
>everything that deep learning has accomplished "must not have been hard problems"
Where did I say that

>> No.11390240

>>11390239
vapnik said it

>> No.11390244

>>11390239
>That's literally what I said
No, you said
>describe properly what's going on

>Wrong, they suck because they are written by a few csfags who don't know shit about math and think they can just tweak some shit and arrive at a new conclusion.
It's your own problem if you go out of your way to ignore the good papers and only read the shitty ones, which are probably on arXiv and not published.

>> No.11390263

>>11390244
>It's your own problem if you go out of your way to ignore the good papers and only read the shitty ones that are probably on arxiv
Stop misinterpreting my claims, retard. I never said there are no good papers, just that there are more shitty ones than there should be

>> No.11390270

>>11390263
That's not what you said before, stop moving the goalposts. Also if you really think that, you've never read papers in any other domain, period.

>> No.11391134

>>11390244
He's right that there are disproportionately many bad DL papers from people simply tweaking minor things or rebranding old models. Then there's also the reproducibility crisis, which hit ML papers in general hard.

>> No.11391211

>>11391134
Wrong on both points. If you want a field strewn with bad reproducibility, see biology, physics or psychology. If you want one where disproportionately bad papers get published even at top venues like Nature and Science, see biology, statistics, or physics. Other crap-quality fields include natural language and computer vision papers (i.e. those keeping to classical methods instead of ML). If anything, ML papers have historically had an outstandingly high reproduction rate, with a ridiculously strong culture against unpublished code and data. Papers that use proprietary data to publish results virtually automatically get rejected, and those using public data with no public code have an extremely high rejection rate. The exact opposite of every other field.
Tweaking minor things or rebranding old models is becoming increasingly common (and almost no work past 2015 was more original than that), but that's not disproportionate compared to other fields; in fact it's still far ahead of the aforementioned ones.
Since you don't know the first thing about the topic and have never read a single paper, how about you shut the fuck up until you've at least done the most modest of homework?

>> No.11391417

>>11391211
Lmao

>> No.11392179

>>11391211
Seething hard. The truth is literally one Google search away.
>ML papers have had an outstandingly high reproduction rate historically, with a ridiculously strong culture against unpublished code and data
Absolute lie. Are you trolling or really this delusional?
Sure, many big papers have GitHub repos, but the vast majority don't.

>Since you don't know the first thing about the topic
Hahaha, the seethe. But yeah, these blind accusations already show you're retarded. I'm one and a half years into my PhD and have published two peer-reviewed papers. You are literally delusional.

>> No.11392186

>>11391211
https://mc.ai/is-deep-learning-facing-a-reproducibility-crisis/

>> No.11392239
File: 46 KB, 520x544, knr5ht5Nc217T8AF2-B3tMb-mjVB6yn5msiN0L59Zy8.jpg
11392239

>>11390230

>everyone knows it's the best thing that works right now.

Is this what they told you after you finished your first DL MOOC, lmao? You've clearly never been in any Kaggle contest.

>Most DL papers suck because most of everything in every field sucks

Imagine being so mad when people point out that the only thing you know a bit about is a meme that you have to cope with the belief that everything must be a meme.

>> No.11392362

>>11392239
That's why csfags who claim you don't need to know much statistics for ml make me seethe so hard. Imagine not knowing what your entire supposed field is based on

>> No.11392856

>>11392179
>>11392239
Yum yum yum delicious butthurt tears.
>>11392186
>recsys
>information retrieval
You just proved me right; that's exactly what I said in my post: fields that don't use ML don't have reproducible results (and then blame ML for it, even though the ML parts are the only parts that are reproducible lmao).
Your very own blogpost, even though it claims "all of ML has a reprod crisis reee", then goes on to discuss at length that... non-ML methods are the ones that can't be reproduced, and doesn't give a single example of an actual ML system with that problem. They also go out of their way to mention this is all about pre-DL stuff, and that's before the points I already made in my previous post.
Now that you've been thoroughly BTFO, you can head back to >>>/x/ whence you came. You're literally too retarded to even read a blogpost written by retards, let alone read papers and make a judgement call. At least you're not as retarded as the butthurt posters above, yet you clearly don't belong on /sci/ (while the others don't even belong on 4chan, as they're clearly underageb&).

>> No.11393044

>>11392856

lol, you're just caught out of your depth. Since you believe DL is the best thing that works (nobody with real experience in ML actually thinks along those lines), you'll at best be some incompetent parrot who can only solve exercises from MOOCs. Try real-world ML competitions. As for the mudslinging, lmao, only you are butthurt and retarded; at least the other guys have a coherent argument, with corroboration from others including Vapnik himself. It's okay, keep doing what you're doing, it's actually beneficial for us.

>> No.11393062

>>11393044
Congrats, you're clinically retarded!

>> No.11393130
File: 65 KB, 1200x514, 35hp79.jpg
11393130

>>11393062

>your brain on MOOC Deep Learning

Even your insults are just parroting what someone else wrote before, lol

>> No.11393135

>>11393130
Imagine being so deeply lobotomized you think Kaggle and MOOCs are a measure of reproducibility and paper quality in a research-oriented field. Imagine unironically believing that principles and technologies that have unlocked never-before-seen advances don't work, while technologies and principles that haven't produced anything useful in the past 40 years, through no lack of funding or effort, are somehow "the future". Wew to that, laddies.

>> No.11393155

>>11393135

Well, you probably said so many retarded things that multiple people disagreed with you about different ones, and now your thinking is so muddled you're confusing them with each other lol

Nobody believes you're doing "research", MOOC parrot.

>> No.11393158

>>11393155
Oof, seethe and cope tranny.
Go dilate somewhere else.

>> No.11393352

>>11393158
>DLfag
>Calls others trannies
If anything, your field is gonna have the most trannies, faggot

>> No.11393368

>>11393352
>coping intensifies

>> No.11393401

>>11393368
>Coping
I've got a comfy data analyst career in research with an h-index of 10, what have you got

>> No.11393439

>>11393401
An h-index an order of magnitude higher than yours, and methods that actually work, instead of one-off throwaway proprietary scripts evaluated only on proprietary data that can't be reproduced even on that same data, because "I always sort my variables independently since it gives me a better correlation" counts as a valid contribution in your """field""".
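The "sort my variables independently" line refers to a genuinely broken analysis pattern, and it's easy to see why: sorting two unrelated columns independently destroys the row pairing and manufactures a near-perfect correlation out of pure noise. A minimal sketch in Python/NumPy (the data is synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two genuinely unrelated variables, 1000 paired observations.
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Correct: correlate the variables with their pairing intact.
r_paired = np.corrcoef(x, y)[0, 1]  # close to 0: no real relationship

# Broken: sort each column independently, discarding which y goes with which x.
# The values are now matched by rank, which fakes a near-perfect correlation.
r_sorted = np.corrcoef(np.sort(x), np.sort(y))[0, 1]  # close to 1

print("paired:", r_paired)
print("sorted independently:", r_sorted)
```

Running this shows the paired correlation hovering near zero while the independently-sorted one is nearly 1, despite the data containing no relationship at all.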

>> No.11393448

>>11393439
discarded, thanks for the TED talk

>> No.11393522

>>11392856
kek, you're a special kind of retarded. I pondered for a while whether it was worth responding to an autist like you.
You've never tried to reproduce the results of a paper, yet you talk out of your ass as if you were a published author. Pathetic.

>> No.11393620

>>11393522
The irony in this post is so thick, even a knife couldn't cut through it.

>> No.11393650

>>11393620
lmao