
/sci/ - Science & Math



File: 400 KB, 529x423, 1519707668939.png
No.10647840

>>213258794
>Complex machine learning algorithms, which allow for complexities such as high-order interactions, require an enormous amount of data unless the signal:noise ratio is high. Regression models which capitalize on additivity assumptions (when they are true, and this is approximately true much of the time) can yield accurate probability models without having massive datasets.
>Users of machine classifiers know that a highly imbalanced sample with regard to a binary outcome variable Y results in a strange classifier. For this reason the odd practice of subsampling the controls is used in an attempt to balance the frequencies and get some variation that will lead to sensible looking classifiers (users of regression models would never exclude good data to get an answer). Then they have to, in some ill-defined way, construct the classifier to make up for biasing the sample. It is simply the case that a classifier trained to a 1/2 prevalence situation will not be applicable to a population with a 1/1000 prevalence. The classifier would have to be re-trained on the new sample, and the patterns detected may change greatly. Logistic regression on the other hand elegantly handles this situation by either (1) having as predictors the variables that made the prevalence so low, or (2) recalibrating the intercept (only) for another dataset with much higher prevalence. Classifiers’ extreme dependence on prevalence may be enough to make some researchers always use probability estimators like logistic regression instead. One could go so far as to say that classifiers should not be used at all when there is little variation in the outcome variable, and that only tendencies (probabilities) should be modeled.
http://www.fharrell.com/post/classification/
Can someone expand on regression using additivity assumptions, as well as the section about regression handling it elegantly?
Should I be using logistic regression then and not neural nets?
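
To make sure I'm reading it right, here's my rough sketch of what I think he means (Python/statsmodels, made-up data and prevalence numbers, not from the blog post). "Additive" just means each predictor gets its own term on the logit scale with no products of predictors, and "recalibrating the intercept (only)" can be done with the usual log-odds offset trick:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
# Additive on the logit scale: separate term per predictor, no X1*X2-style interactions.
eta = -0.5 + 1.0 * X[:, 0] + 0.7 * X[:, 1] - 0.3 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# Intercept-only recalibration for a population with a different prevalence:
# shift the intercept by the difference in log-odds of the two prevalences,
# leaving the slopes untouched.
train_prev = y.mean()
target_prev = 0.001
shift = np.log(target_prev / (1 - target_prev)) - np.log(train_prev / (1 - train_prev))
new_intercept = fit.params[0] + shift

The other way to read "recalibrating the intercept" is to re-fit just the intercept on the new dataset with the old linear predictor held fixed as an offset; either way the slopes stay put.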

>> No.10648822

Read more books

>> No.10648831

>>10647840
neural networks solve different problems than regression

>> No.10648956

>>10648831
What is the difference between logistic regression and a NN with a softmax output layer, other than logistic regression having the smoothest decision boundary?
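
For reference, here's what I mean by the comparison, as a sketch (PyTorch, sizes made up):

import torch.nn as nn

# No hidden layer: a linear map followed by softmax, fit by cross-entropy.
# That is exactly multinomial logistic regression (linear decision boundaries).
logreg = nn.Linear(20, 3)        # 20 features, 3 classes (placeholder sizes)

# Same softmax output, but with a hidden layer in front: the decision
# boundary is now a non-linear function of the inputs.
mlp = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

loss_fn = nn.CrossEntropyLoss()  # applies the softmax internally for both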

>> No.10649052

>>10647840
It really depends on what you're doing, but Frank Harrell does biostats so he's quite biased towards interpretable models like logistic regression

>> No.10649080

>>10647840

Neural nets can be used as a non-linear regression model. They can also be used for other tasks such as classification / segmentation / etc.

Subsampling is also not the only way to deal with unbalanced datasets. Also, if you believe you can do better regression using simpler models than neural nets, why aren't you doing it? There are companies out there that'll pay serious money for your methods.
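
For example (toy sklearn sketch, numbers made up): keep every observation and weight the rare class up in the fitting criterion instead of throwing away controls.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy dataset with roughly 1% positives.
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.99], random_state=0)

# class_weight="balanced" upweights the rare class instead of subsampling the majority.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
probs = clf.predict_proba(X)[:, 1]

Caveat: reweighting (like subsampling) distorts the predicted probabilities, so you'd recalibrate them afterwards if you want honest probabilities rather than just a ranking.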

I'm not sure what your question is OP.

>> No.10649146

>>10649080
He was saying you should NOT do sub-sampling and that if you're using logistic regression you can handle imbalanced samples elegantly.
How would you handle it for NN?
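
The closest analogue I can think of is weighting the loss instead of dropping data, something like this (PyTorch sketch, counts and sizes made up):

import torch
import torch.nn as nn

# Keep the imbalanced data and upweight the positives in the loss
# instead of subsampling the controls.
n_pos, n_neg = 200, 199800                       # ~1/1000 prevalence, made-up counts
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(64, 10)                          # dummy batch
target = torch.randint(0, 2, (64, 1)).float()
loss = loss_fn(model(x), target)
loss.backward()

But as with reweighting in general, the outputs stop being calibrated probabilities, so you'd still have to recalibrate if you care about the probabilities themselves.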