
/sci/ - Science & Math



File: 4 KB, 360x228, 5y620.gif
No.10122995

/ST/ Statistics thread.
Everything relating to statistics welcome. Now enjoy the picture of a bell curve.

>> No.10123009

Just barely related but I'm doing a simple regression analysis right now on the correlation between how popular a major is, how smart its students are (measured by standardized test scores), and how much money it makes at the bachelor's level (in the first year at least). I'm also doing another regression project for an economics class on the relation between immigrant population and education, and the output of certain industries. Both are quite fun, even if the latter is limited by me only having five independent variables right now.

>> No.10123010

>>10122995
can someone give a brainlet-tier explanation of "signal significance" and how it relates to likelihood?

>> No.10123021

>>10123009
Cool. In biology I've learned to calculate standard deviation, and I'm currently doing t-tests, where we're checking whether there is a significant difference in heart rate before and after exercise. It shows that there is. Do you have any resources to learn statistics? I don't know why but I want to learn about it.

>> No.10123073

>>10123021
Sorry, but you're barking up the wrong tree there. The only straight stats classes I've taken were two brainlet tier lower divs required for my major. I just have the bare minimum knowledge needed to get through upper division econ classes.

>> No.10123102
File: 4 KB, 399x266, halppls.png

can you guys help me describe these two trendlines?

I'm trying to find a way to quantify the blue line's dip as an erratic change, compared to the steadily dropping red line.

>> No.10123129

>>10123102
The red line is decreasing; you can calculate a rate from how much the value decreases over time. The blue line starts off on a plateau, then dips, then rises back up and plateaus again.

>> No.10123477

>>10123102
An important thing to note is that if you are concerned with an actual time series, then the first can be modelled with a stationary ARMA model + a linear trend, i.e.

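X(t) = a + bt + Z(t), where Z(t) is a stationary ARMA process that is appropriate in the scenario you have. Differencing removes the linear trend, since X(t) - X(t-1) = b + Z(t) - Z(t-1), so you can estimate b from the mean of the differences and then fit Z(t) to the detrended series.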

For the second one, it might be the result of a seasonal trend that is obscured by the lack of other data, or it may be a kind of one-off event, like a market crash. I'm not a time series expert (I only took a basic course during my stats major), but I suppose you could model that as an additional component C(t) to obtain

X(t) = a + bt + Z(t) + C(t), where C(t) has a heavy-tailed distribution centered around 0 (e.g. Cauchy, or something with Pareto-type tails). That way you do not practically exclude the possibility of massive bumps in the trend, but most of the time C(t) stays close to 0. I suggest reading a proper advanced time series text to know for sure.
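
A minimal sketch of the differencing idea for the first model, in plain Python. Everything here is an assumption made for illustration: the AR(1) choice for Z(t), the parameter values, and the function names `simulate_trend_ar1` and `slope_from_differences` are not from the thread.

```python
import random

def simulate_trend_ar1(n, a, b, phi, sigma, seed=0):
    """Simulate X(t) = a + b*t + Z(t), where Z(t) is an AR(1) process."""
    rng = random.Random(seed)
    z = 0.0  # Z starts at 0 rather than at its stationary distribution
    xs = []
    for t in range(n):
        z = phi * z + rng.gauss(0.0, sigma)  # Z(t) = phi*Z(t-1) + noise
        xs.append(a + b * t + z)
    return xs

def slope_from_differences(xs):
    """Mean of X(t) - X(t-1), which equals b + (Z(n-1) - Z(0)) / (n-1)."""
    diffs = [xs[t] - xs[t - 1] for t in range(1, len(xs))]
    return sum(diffs) / len(diffs)

# Hypothetical parameters: steadily dropping series with slope b = -0.3
xs = simulate_trend_ar1(n=2000, a=5.0, b=-0.3, phi=0.6, sigma=1.0)
b_hat = slope_from_differences(xs)
print(round(b_hat, 3))
```

The estimate only gets close to b for long series, because the leftover noise term shrinks like 1/(n-1). With 17 yearly points it would be much rougher.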

>> No.10123514

>>10123021
OpenIntro statistics book, Passion Driven Stats curriculum

>> No.10123650

>>10123021
>Do you have any resources to learn statistics?
most graduate level stats courses (even stats 101-type stuff) are typically taught personally via the instructor. they usually offer some PDF-type shit, or a personal book/pamphlet to buy.
that being said: there is an open stats book you can use via

http://4chan-science.wikia.com/wiki/Mathematics#Probability_.28Multivariable_Calculus_based.29

>> No.10124238

>>10123477
hm I should've been clearer
my data series has values across my timespan and one of the lines behaves as a wave while the other doesn't.

>> No.10125711

>>10123009
Did this, found no statistically significant correlation between major difficulty and how popular it is. Found a negative relationship between standardized test writing scores and income, probably because a lot of lower-earning humanities majors had good writing scores. I'm trying to find a way to control for that, because despite the low p-value, the negative coefficient makes no logical sense: it basically says someone who got good GRE QR and VR scores and a bad AW score is better off than someone who got good scores in all three areas.

>> No.10127090

bump

>> No.10127162
File: 57 KB, 712x792, DrP4LwGWkAAYdoK.jpg-large.jpg

>tfw every thread about statistics is always about stats 101

>> No.10127171

>>10127162
Maybe because stats is an absolute trash subject.

>> No.10127228

>>10127171
>t. someone that only has read stats 101

>> No.10127359

>>10127171
What how do you make sense of da numbers n shieet?

>> No.10127366

>>10123073
>economist who thinks they’re a scientist doesn’t know probability and statistics, proofs, analysis or mechanics
lol

>> No.10127396

>>10127359
Intuition.
>>10127228
>applying ARIMA is really fun isn't it

>> No.10127440

>>10124238
If by wave you mean it oscillates you can simply put a sinusoidal trend to obtain,

X(t) = a + bt + c * sin(nt) + Z(t), where Z(t) is the fitted stationary process and sin(nt) gives it some oscillation. The period is 2pi / n, so it is determined by how big or small n is, e.g. if your time is in months and the oscillation is yearly then you should have sin(2pi * t / 12).
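
A rough sketch of fitting such a model by ordinary least squares when the period is assumed known, in plain Python. The cosine term is an addition not in the formula above (it lets the phase float), and the data, the period of 12, and the helper names `solve_linear` and `fit_trend_plus_sine` are made up for illustration.

```python
import math
import random

def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_trend_plus_sine(ts, xs, period):
    """Least-squares fit of a + b*t + c*sin(w*t) + d*cos(w*t), w = 2*pi/period."""
    w = 2 * math.pi / period
    rows = [[1.0, t, math.sin(w * t), math.cos(w * t)] for t in ts]
    p = 4
    # Normal equations: (R^T R) beta = R^T x
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    y = [sum(r[i] * x for r, x in zip(rows, xs)) for i in range(p)]
    return solve_linear(A, y)

# Made-up monthly series: linear decline plus a yearly wave plus noise
rng = random.Random(1)
ts = list(range(204))
xs = [100 - 0.2 * t + 15 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 3)
      for t in ts]
a, b, c, d = fit_trend_plus_sine(ts, xs, period=12)
amplitude = math.hypot(c, d)  # size of the oscillation, whatever its phase
print(round(b, 3), round(amplitude, 2))
```

The fitted amplitude is one way to put a number on "wave-like": a series with no oscillation at that period should give an amplitude close to 0 relative to the residual noise.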

>> No.10127457
File: 183 KB, 572x692, 1541341772731.jpg

>>10123477
>a
>stochastic component
>between 0 and 1
>up
>to
>a
>high
>potence

>> No.10127487

>>10125711
Nonsensical results are sometimes a result of noisy data, confounding variables, or interaction effects.
If you have data on the type of major, you can do one of the following things to check your hypothesis of high-scoring, low-earning humanities majors:

1. Stratify the data: run separate regressions on only humanities students and only science students, and compare the models built.

2. Check model assumptions. Are there some outlier geniuses that score very low for some odd reason? Default (least-squares) regression is not robust, and even a few outliers can greatly influence the trend. Have a look at "Cook's distance" plots (you can use the plot function in R with your model, i.e. plot(lm(income ~ score + major))). If your assumptions are violated for the linear regression, then you shouldn't take the p-values too seriously anyway; in this case, you can try a robust (L1) form of regression. Alternatively, you can use a GLM if income, given the predictors, doesn't look normally distributed.

3. Put interaction effects into your model, i.e. lm(income ~ score + major + score*major), which R treats the same as lm(income ~ score * major).

The most important and easiest thing to do, though, is number 2: check your assumptions. Although your model can "work" without the assumptions being satisfied, the p-values you obtain are then meaningless.
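
To make option 1 concrete, here is a small sketch in plain Python (not the R used above) of how a pooled regression can show a negative slope even though the slope within each stratum is positive. The numbers are invented purely for illustration, and `ols_slope` is a hypothetical helper, not part of any library.

```python
def ols_slope(xs, ys):
    """Slope of the simple least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Invented data: (writing score, income in $1000s)
humanities = [(4.5, 38), (5.0, 40), (5.5, 42), (6.0, 44)]
science = [(3.0, 60), (3.5, 62), (4.0, 64), (4.5, 66)]

def slope_of(pairs):
    xs, ys = zip(*pairs)
    return ols_slope(xs, ys)

pooled = slope_of(humanities + science)   # ignores major entirely
within_hum = slope_of(humanities)         # humanities students only
within_sci = slope_of(science)            # science students only
print(pooled, within_hum, within_sci)
```

In this toy setup, writing score helps within each group, but the group with higher writing scores earns less overall, so the pooled coefficient flips sign. Whether the real data behave this way is exactly what the stratified fits would check.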

>> No.10127492

>>10127396
>doing time series
>ever
I'll stick to categorical data since economics is a meme

>> No.10127823

>>10127487
This is actually really useful, thanks.

>> No.10127835 [DELETED] 
File: 153 KB, 1024x683, TRINITY___Forever.jpg

If I BTFO detractors in a thread it is statistically more likely to get deleted than if I just use the thread to explain to them the wages of their infidelities again.

>> No.10128097

>>10127171
then why u here senpai?

>> No.10128113

>>10123009
how are you measuring popularity?
first year retention? graduation rate?

>> No.10128300

>>10128113
The number of people who graduate with that degree. It's listed in the ACS PUMD sample, info mirrored in various articles.

>> No.10128492

Just how is the job market for stats? I'm thinking of switching from math to stats, is there anything I should definitely know? I'm in Canada btw, not sure if it matters.

>> No.10128511
File: 18 KB, 864x121, asylum requests.png

>>10127440
that feels contrived desu

Can I really just say that "well one fits a sinusoidal better than the other" and be done with it?

To be clear I am statistically evaluating countries that send asylum seekers to Austria over the last 17 years, and as the image below shows, some countries exhibit wave-like behavior while others offer a steady stream.

>> No.10128517
File: 152 KB, 512x512, 1541236844138.gif

>>10128492
Dunno how it's in Canada but here in Sweden everyone gets high paying jobs if they study what we call mathematical statistics i.e. math major with focus on mathematical stats, probability and stochastic processes. Most work for insurance companies followed by universities, pharma and banks

>> No.10129150

>>10128517
Aren't those jobs soul crushingly boring? Being an actuary especially sounds dull as hell.

t. Considering a masters in mathematical statistics

>> No.10129175
File: 54 KB, 396x600, s4CKqxi6yWIC.jpg

I love bell curves. They prove the inferiority of niggers

>> No.10129245

>>10129150
I dunno, I have just talked to one and he seemed to like it. I might read a masters in insurance math and find out, seems fun to do survival analysis and modelling insurance shit, at least in the intro course to insurance math

>> No.10129247

>>10129245
>my friend is an autist
>>10129150
swedes are autistic subhuman automata, yes its extremely boring, thankless and tedious

>> No.10129285

>majoring in ecology
>decide one night that I should minor in stats

>> No.10129286

What's a good book on survival models? I'm taking it next semester and it apparently is brutal with the professor I have. I'm looking for a rigorous, comprehensive text, since the only experience I have with the subject is actuary related.

>> No.10129305

>>10129175
back to >>>/pol/, cletus

>> No.10129333

>>10129247
That's what I figured, CS or applied math seem to be the only things that may lead to interesting jobs.
Or academia I suppose but that ship already sailed.

>> No.10129355

>>10129150
are there any jobs that aren't soul crushingly boring?

>> No.10129887

>>10129355
from what I've read, if you manage to turn your passion into your occupation, you can't even call it a job anymore, since you love doing it.

Personally I'm conflicted on the topic, I believe that at some point you'll get sick of it and you won't love your passion anymore. For what anecdotes are worth: my mother worked in a hotel kitchen, and I can tell that there are leftovers from that. I love cooking and eating food, but I won't ever work as a chef because I don't want to ruin that for myself as my mom did for herself.

Be careful with it.

>> No.10129965

>>10123010
it's not even real. it's arbitrary. if someone's using it just know that they are saying something as meaningless as "highly likely"

>> No.10129973

>>10127440
for fucks sake he just means it has a bump instead of being straight.

>> No.10130071
File: 71 KB, 2580x1387, maq.png

I'm constructing a time series forecast for inflation in the US

Time series methods are underrated imo, much more fun than non-regression stats

>> No.10130136

do stat people have high iq? trying to teach myself some statistics but i feel like an idiot.

>> No.10130179

>I assume X is true, and as a consequence the data has Y distribution.
>If P(data) under Y is less than 5%, I will conclude X is false
>Run the experiment, observe the data, and see whether you won the bet you made with yourself beforehand
Is this the entire point of significance testing? Why is it full of weird terminology like p-value and power analysis?
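
A minimal sketch of that procedure for a coin-flip experiment, in plain Python. One detail worth spelling out: the p-value is not P(observed data) but the probability, under the assumed model, of data at least as extreme as what was observed. Power is the probability the test rejects when the assumption really is false. The coin example and the function names are assumptions for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k_obs, n, p0=0.5):
    """Total probability, under H0: p = p0, of outcomes no more likely than k_obs."""
    p_obs = binom_pmf(k_obs, n, p0)
    return sum(binom_pmf(k, n, p0) for k in range(n + 1)
               if binom_pmf(k, n, p0) <= p_obs * (1 + 1e-9))

def power(n, p_true, alpha=0.05, p0=0.5):
    """Probability of rejecting H0 at level alpha when the true probability is p_true."""
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1)
               if two_sided_p_value(k, n, p0) < alpha)

# Assume a fair coin (X), so heads in 100 flips is Binomial(100, 0.5) (Y)
p = two_sided_p_value(60, 100)   # observed 60 heads
pw = power(100, p_true=0.6)      # chance of detecting a 60% coin with 100 flips
print(round(p, 4), round(pw, 3))
```

So the terminology maps onto the bet: the p-value is how surprising the data are if X holds, the 5% threshold is the stake fixed in advance, and power analysis asks how often the experiment would catch X being false.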

>> No.10130220

>>10127457
I don't get it

>> No.10130228

>>10130136
a lot of it assumes a certain intuition to be sure.

>> No.10130712

>>10130136
If you know the math it all makes sense, study probability theory until you have done multivariable probability, asymptotic shit, convergence and transforms. Then everything is easy and makes sense

>> No.10130714

>>10130136
Stats is literally the easiest part of my degree, I have many more problems with other maths
t. econ

>> No.10130732

>>10130071
Enjoy your inaccurate forecast

>> No.10130836

>>10130714
That's because the stats we use are pleb tier stuff and Econometrics.
>>10130071
If you get the short term completely wrong that's normal, and if they start averaging out in a year or two that's also normal.

>> No.10130846

>>10130220
The complete lazy bum way of making the line do huge erratic jumps outta nowhere before falling back into the pattern.

>> No.10130917

>>10122995
So are bioinformatics any good?

>> No.10130964
File: 199 KB, 1790x828, 1536209356748.jpg

>>10130714
I'm in both econ and stats, and >>10130836 is spot on. The stat courses that statistics majors take are some next-level shit.

>have more problems with other maths

I hear ya there. Everyone's getting #rek'd in my microeconomic theory course (not the babby's first micro class). The calculations aren't bad when you get down to it, but the questions are super vaguely worded; even the TA is like, wtf is this shit, here and there.

>> No.10130978

>>10130964
Demand curves slope downward* and supply curves slope upward [math]^{*2}[/math]
*unless it's a giffen or veblen good
[math]^{*2}[/math] in the short term

Classical microeconomics is an absolutely top tier shitpost. Y'all using Varian?

>> No.10131016

>>10130732
Inflation is one of the most accurately forecasted economic variables, if not the most. Most professional forecasters are usually able to predict it within a tenth of a percent

>> No.10131024

>have small sample
>too small
>randomly resample from existing sample distribution until sample is big enough and has appropriate properties for inference

Am I missing something here? How do resampling methods avoid sampling bias from the original sample?
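
For reference, a bare-bones percentile bootstrap in plain Python. Two things it makes visible: each resample has the same size as the original sample (it does not make the sample "bigger"), and the procedure only estimates how much a statistic would vary from sample to sample. It treats the original sample as a stand-in for the population, so it cannot remove sampling bias that is already in that sample. The data and the function name `bootstrap_ci` are made up for illustration.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(
        stat(rng.choices(sample, k=n))  # resample WITH replacement, same size n
        for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # made-up numbers
lo, hi = bootstrap_ci(sample)
print(lo, hi)
```

If the original ten values were collected in a biased way, every resample inherits that bias, and the interval will be confidently centered on the wrong value.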

>> No.10131052

>>10129887
I don't really have a passion anyway, only that I'm interested in a couple of things, like math. Though I might find something to be passionate about with time.

>> No.10131061

>>10128517
Do you know which programs I should consider studying as a fellow swede? I've looked at eng. physics and industrial engineering at KTH but I don't know which one to choose.

>> No.10131240

>>10131061
Dunno, I am just studying for a bachelor in mathematical statistics and I really don't know which engineering program is better. What I do know is that everyone I have met from KTH who decided to become an actuary has been an ind. ek. guy

>> No.10131337

>>10131240
Ah okay, cool, might go for ind.ek. then.

>> No.10131779
File: 228 KB, 1897x822, reg.jpg

Anyone else doing econometrics?

>> No.10131813

>>10131779
>the size of that excel sheet
Seems about right. Weird number of columns tho, normally it's longer downwards and thinner sideways.

>> No.10131837

>>10131813
It goes down for several hundred more cells (677 per column).

>> No.10131841

>>10123021
noob

>> No.10131847

>>10130978
Oh yeah, Varian was great.

>> No.10131848
File: 8 KB, 250x202, IMG_2488.png

>>10122995
>not even standardized

>> No.10131868

>>10130978
Nah we're using Nicholson and Snyder, it's a pretty decent book in my opinion, despite the fact that microeconomics isn't my favorite part of the field.

>> No.10131880

>>10131847
Other than the cringe jokes, yeah.
>>10131868
>that math introduction
Surreal.

>> No.10131904

>>10131779
I'm starting my econometrics courses next semester, definitely looking forward to it.

>> No.10131912

>>10131779
Good old Census data. Gives me flashbacks.

>> No.10131933

>>10131868
Micro was very popular at my school

Most of the electives offered focused on micro, too

>> No.10131995

>>10131933
The program at my university is more econometrics centered. I mean I don't dislike micro, it's just that this particular course is a weeder course.

>> No.10132019

>>10131779
ye
>tfw course is full of brainlet managers
i swear to god i am pretty much the only student there who has a semblance of an idea what is going on
got any interesting data to do a semestral project about lads?

>> No.10132024

>>10131779
I'm not actually in an econometrics class right now, I'm in another economics class that requires you to do a significant econometrics project without actually telling you that the econometrics class (or even the watered-down data collection and analysis class for non-quantitative specialized majors) is basically a prerequisite. Fuckers. At least the actual econometrics class might be slightly easier now that I've learned the bare basics and done a project. Though tests might still be a problem.

>> No.10133005

>>10129175
redpilled

>> No.10133011

I donate to a charity in which it on average cost $4500 to avert a death. How much do I need to donate to get a three-sigma confidence that my contributions have in fact averted an actual death?

>> No.10133097

>10133011
Go away

>> No.10133114

>10133097
Go away

>> No.10133136

>>10127162
Literally all of /sci/ is undergrad 101 topics and pop-sci topics with no actual discussion, because everyone here is in undergrad.

>> No.10133559

>>10131779
>statistics thread
>using Excel

>> No.10134062
File: 34 KB, 600x600, 1300044776986.jpg

>>10133559
>not gathering project data from data.gov
>not checking to make sure the format isn't dogshit before importing into RStudio

I'm not saying to look through thousands of rows, but at least checking that the columns aren't completely fucked, seeing how many sheets are in the file, and generally getting an idea of what you're dealing with can make life a bit easier.

>> No.10134087

>>10134062
>"i'm just importing it bro"
>it's color coded

>> No.10134099
File: 51 KB, 540x545, 1539025473354.jpg

>>10122995
could you guys please take a look at my question here: >>10133861

i asked it in terms of general mathematics, but i am interested in applications of statistics in particular.

how do i into statistics?

>> No.10134242
File: 200 KB, 890x1280, 1541876386898.jpg

>>10134099
Undergrad:
Learn up to multivariable probability and convergence theorems, then pick up a book on statistical inference like Applied Statistical Inference: Likelihood and Bayes, a book that treats general linear models in depth plus generalised linear models (don't have a recommendation since I used material in Swedish), and one on stochastic processes (Introduction to Probability Models by Sheldon Ross). I highly recommend going through Categorical Data Analysis by Agresti as well.

Master level:
After that, if you want to go deeper, learn measure-theoretic probability theory (Kai Lai Chung's A Course in Probability Theory is good), and use Mathematical Statistics by Shao to relearn everything with measure theory. Learn more Bayesian modelling with Bayesian Methods for Data Analysis by Carlin & Louis. Find a book that covers statistical modelling and inference on the exponential family (the one I'm using is in Swedish). For biostats and epi, Biostatistical Methods by Lachin is pretty good. If you want survival analysis, use "Survival and Event History Analysis: A Process Point of View" by Aalen, Borgan and Gjessing. Don't have a recommendation for grad-level stochastic processes yet.

You should probably throw in baby rudin and some more analysis as well somewhere
Now I really gotta sleep

>> No.10134977

Is advanced statistics a meme?

>> No.10135543

>>10122995
Cute bell curve

>> No.10135921

>>10134062
>>10134087
Stanza is correct, I do everything in R-studio when possible (occasionally get forced into Stata or SAS) and it makes life a billion times easier to do any actual organizing/viewing in excel.

Excel - good for viewing, arranging spreadsheet
R-studio - god tier for regression analysis, graphics, everything else

>> No.10136117

non parametric tests all seem like a major ass pull. i trust them not one bit but my data are all with small n and no normal distribution so i have to use them.

life sucks.

>> No.10136155
File: 9 KB, 263x191, 1813457822.jpg

>>10131779
>tfw math major enrolled in econometrics for elective thinking I was going to learn a little bit of some more applied modeling
>turns out "econometrics" at my university actually means "Regression For Retards"
at least it's an easy A but fuck man this has to be either the #1 or #2 most disappointing class I've ever enrolled in

>> No.10136231
File: 84 KB, 417x416, 1539311980231.gif

>>10136155
There are places it doesn't mean that?

>> No.10136271

>>10136231
I dunno dude.
This is a 300-level econometrics course and they cover multiple linear regression in stats 100 here (in like 2 weeks).
I expected junior economics students would have taken basic statistics at some point. Apparently this is not so.

>> No.10136284

>>10136271
>basic statistics
That's :
>the whole population, sample, etc, glossary
>sampling methods
>basic probability
>essential continuous and discrete distributions
>hypothesis tests
If your uni manages to add linear regression on top of that then you guys are the mega autists.

>> No.10136301

>>10136271
econometrics is considered rigorous for brainlet social science faggots, most people who specialize in that field have done almost no mathematics since highschool. Its just so much more intensive than the other social sciences that the relative distance emboldens them enough to feel like they're actually scientists. You should not have been surprised at all.

>> No.10136326

>>10136271
Where I am basic stats is required before econometrics, and even basic stats has pre reqs before that

Maybe the econ program where you are just isn't that good

>> No.10136433

>>10133136
this

>> No.10136460

>>10131779
I like Excel but damn it can be tedious.

>> No.10136463

>>10135543
Just like you.
Inb4 t.faggot

>> No.10136585

>>10136155
This is really encouraging
>t. econ major and brainlet

>> No.10136614

>stat minor
>only taken 2 stat classes, barely know shit about it
I'm taking Sampling Methods next semester. What am I in for?

>> No.10136617

>>10136614
Extremely basic, boring shit.
>if you like divide a population into blocks, you can like, sample within the blocks
>woah

>> No.10136618

>>10136617
Damn I thought since it was 400-level upper div it would be more exciting

>> No.10137362

>>10136271
>multiple linear regression
>in stats 100
>in 2 weeks
I guess everything is really dumbed down