/sci/ - Science & Math


File: 541 KB, 1450x1100, r.png
No.6629892

R thread?
R thread.
>what do you use R for?
>how long have you been using it?
>current projects?

>> No.6629943

>statistical analysis of sports for betting purposes.. NHL and NBA mostly
>about 10 months
>NHL moneyline. up to 58% correct prediction on out-of-bag testing set

>> No.6629998

>>6629943
>58% correct
I don't know anything about sports betting but that seems really good. Why aren't you rich yet?

>> No.6630009

>>6629998
It depends on the degree of parity in the league. If every team in a league was of equal ability, even 52-53% accuracy over the long term would be quite impressive.

In the NHL, one of the more balanced North American leagues, the threshold for profitability is around 59-60% while in the NBA, which has more powerhouses and basement-dwellers, you'd need to hit about 67% to make a profit.
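
A rough sketch of the arithmetic behind thresholds like these, assuming American moneyline odds. The prices used here (-150, -200, +130) are made up for illustration and are not taken from any real market; `breakeven_prob` and `expected_value` are hypothetical helper names.

```r
# Break-even win probability implied by an American moneyline price.
# Negative odds (favorite): risk |odds| to win 100.
# Positive odds (underdog): risk 100 to win `odds`.
breakeven_prob <- function(odds) {
  ifelse(odds < 0, -odds / (-odds + 100), 100 / (odds + 100))
}

# Expected profit per unit staked, given a win probability p.
expected_value <- function(p, odds) {
  payout <- ifelse(odds < 0, 100 / -odds, odds / 100)  # profit per unit on a win
  p * payout - (1 - p)                                 # lose the stake otherwise
}

breakeven_prob(c(-150, -200, 130))  # break-even rates for three example prices
expected_value(0.58, -150)          # 58% accuracy at a -150 price
```

If the typical price on the picks were near -150, the break-even rate would be 60%, and near -200 it would be about 67%. That is one way figures like the ones above could arise, though the actual prices behind them aren't given here.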

I did a lot of work developing an algorithm in Excel for the NHL that has an expected value of +8% for each bet placed, but so far my expertise in R isn't all that great just yet. There's of course always hesitation when putting an algorithm into real-world action with money at stake. stock market anons will understand that, i'm sure

>> No.6630016

>>6629892
>R

>>>/biz/ is that way

>> No.6630077

Statistics undergraduate here, have been using it for a couple of years in labs along with SAS and Matlab. Current project is my bachelor thesis, where I'm building models trying to predict the rate of success among first year mathematics students, using data from last year.

>> No.6630079

>>6630016
R is used for plenty of things, not just biz my nizz.

>> No.6630080

>using R
>being a statsfag

>> No.6630082

I was thinking of learning R, but I looked for 10 minutes and couldn't figure out how to read in a binary file, so I just built a small statistics library in C
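
For anyone with the same question: base R reads binary files with `readBin()`. A minimal sketch, assuming the file holds little-endian 8-byte doubles. The file here is a temp file written by the example itself, so the layout is an assumption, not a real format.

```r
# Write a small binary file so the example is self-contained
path <- tempfile(fileext = ".bin")
x <- c(1.5, 2.25, -3)
con <- file(path, "wb")
writeBin(x, con, size = 8, endian = "little")
close(con)

# Read it back: `what` is the value type, `n` the maximum number of values to read
con <- file(path, "rb")
y <- readBin(con, what = "double", n = length(x), size = 8, endian = "little")
close(con)

y
```

For a file with a header or mixed types, the usual pattern is several `readBin()` calls on the same open connection, one per field.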

>> No.6630202

>>6630077
that's really cool anon. really cool. what are your measures of success? pass/fail? gpa?

>> No.6630423

>>6630080
>not knowing statistics
>in a science board
back to philosophy with you

>> No.6630477
File: 18 KB, 210x251, watermelon monkey.jpg

>>6630423
hear, hear chap!

>> No.6630568
File: 124 KB, 339x323, t1388460173623.png

bumping 4 curiosity
>>6630202
>>6630077
>>6630009

>> No.6630604

>>6630202
Yeah I have their GPA, math grades from high school, equivalent of SAT score (Sweden), gender, age etc. as well as their results on the course exams (Algebra/Analysis). I'm gonna make models using linear and logistic regression as well as classification trees and compare their predictive power with ROC-curves. I'll also be looking at older and younger students separately.
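
A minimal sketch of the logistic-regression half of that plan, in base R. The data are simulated stand-ins (the variable names `gpa`, `entrance_score`, `passed` and all coefficients are made up), so this only shows the mechanics of fitting on one part of the data and scoring a holdout with an ROC curve.

```r
set.seed(1)
n <- 500
# Simulated predictors and outcome (hypothetical, not real student data)
gpa <- rnorm(n, mean = 3, sd = 0.5)
entrance_score <- rnorm(n)
passed <- rbinom(n, 1, plogis(-6 + 1.8 * gpa + 0.7 * entrance_score))
students <- data.frame(gpa, entrance_score, passed)

# Hold out part of the data so the ROC curve reflects out-of-sample performance
train_idx <- sample(n, 350)
train <- students[train_idx, ]
holdout <- students[-train_idx, ]

fit <- glm(passed ~ gpa + entrance_score, data = train, family = binomial)
score <- predict(fit, newdata = holdout, type = "response")

# ROC curve: true and false positive rates as the threshold is lowered
thresholds <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) mean(score[holdout$passed == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(score[holdout$passed == 0] >= t))

# Area under the curve via the rank (Mann-Whitney) formulation
pos <- holdout$passed == 1
auc <- (sum(rank(score)[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
  (sum(pos) * sum(!pos))
auc
```

`plot(fpr, tpr, type = "l")` draws the curve. Scoring a classification tree on the same holdout and computing its AUC the same way would give the comparison described above.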

>> No.6630607

>>6630604
That actually sounds interesting, I'd love to read the results

>> No.6631190
File: 44 KB, 350x411, zimmer1.jpg

>>6630604
that's sweet. sounds like a fascinating project. i'm sure you'll have the examiners on the edges of their seats because of how relevant to their field your topic is

>> No.6631206

I use it for AI/machine learning purposes sometimes. It's a nice language, if absurdly slow, makes dealing with matrices a pleasure. Don't actually use its statistical capacities all that much because statistics is hard.

>> No.6631213

currently using R to build gene interaction networks using published microarray data

shit's pretty fun, especially if you have a good server you can submit to

>> No.6631214

How did you guys start using R? I'm studying finance, which uses a lot of statistical methodology, and I'd like to get ahead of the curve. Did you just delve into it, learning by trial and error? Did you watch videos online or check out R guides from your library?

>> No.6631220

>>6631214

Well that all depends on whether you have prior programming experience. If yes, just dive in. You'll find it's very simple. You can basically open someone else's program, read it, and that will tell you 90% of what you need to know.

If not, then I would suggest you look for an introduction to programming first.

>> No.6631224

>>6631206
what subject areas? biz? med? sci?

>>6631214
did an elective at b school. the prof was a spanish gongshow of a guy, super eccentric and rich as fuck. he made millions programming online poker algorithms in java before returning to his alma to teach part-time.

check out r-bloggers.com if you want some short fairly accessible tutorials

>> No.6631229

>>6631220
I have a little programming experience, but I'm not in any way proficient. I know that R is a standalone language, but are there any specific languages I should study that are similar to R before focusing solely on R?

>>6631224
Thanks for the website. I'll check it out.

>> No.6631233

honestly i think you're better off learning R first than learning some other language and then moving to R, as long as your goal is "learning R" and not "learning how to program"

R is a distinct, complete language, but it's also a highly specialized one. it does a lot of things in ways that are unconventional for other languages but make perfect sense for its own purposes

>> No.6631238

>>6631214
try data boot camp or swirl to get started. A lot of programming is just trial and error, so don't be afraid to fuck up

>> No.6631250

>>6631233

That's fine and dandy if you only ever write simple things, but you'll have a hard time debugging code if R is your only experience.

>> No.6631372

>>6631213
Fuck Bioconductor. Whoever decided it was an acceptable standard to name 10 peoples' variants of the same function the exact same thing with only minor differences in capitalization (but entirely different input syntax) should be drawn and quartered.

>>6631214
Get RStudio. The worst part about learning R is the horribly shitty interface for things like package management and workspace visualization. Getting started will suck either way, but that makes it much easier.

>> No.6631377

>>6631372
>RStudio
enjoy your constant and unbearable input lag

>> No.6631411

>>6629892

is R any good?

I learned stats on SAS, which was fucking hell.

I toured a graduate school where all the big deal profs were using STATA. Should I be using STATA or R?

>social statistics here.

>> No.6631447

>>6631411
R is great on the whole

my personal preference is to do all my pre-analysis (as far as data manipulation, organization, prep goes) in Excel and then save the spreadsheet as a csv, then input it into R for analysis

anyone will tell you that as far as programming languages go, dealing with matrices is very easy in R. still, nowhere near as easy as it is in the GUI friendly environment. (thank you based bill gates)

R's analysis capabilities are phenomenal though

>> No.6631632

>>6631411
R is the science standard for statistics because it's good plus free

Other packages like Stata and SAS are industry standards because they're good plus well documented

>> No.6631938

>>6631411
R is better for simulations, unorthodox/niche methods, and methods research due to its extensibility. The main drawback is it shits its pants with very large datasets and certain iterative processes. My work requires a lot of Monte Carlo integration and other iterative solving, and it can get to be annoying at times.
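
A small sketch of Monte Carlo integration in base R, assuming a toy integrand (exp(-x^2) on [0, 1]) rather than anything from actual research code. The point is that the sampling step vectorizes: all draws are generated and evaluated in one call instead of an explicit loop, which is where R tends to crawl.

```r
set.seed(42)
f <- function(x) exp(-x^2)   # toy integrand, chosen only for illustration

n <- 1e5
u <- runif(n)                # uniform draws on [0, 1]
fx <- f(u)                   # vectorized: one call evaluates every draw

estimate <- mean(fx)                     # Monte Carlo estimate of the integral
std_error <- sd(fx) / sqrt(n)            # its standard error
reference <- integrate(f, 0, 1)$value    # deterministic quadrature for comparison

c(estimate = estimate, std_error = std_error, reference = reference)
```

This only helps when the draws are independent. Iterative solvers where each step depends on the previous one can't be vectorized this way, which fits the annoyance described above.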

Stata is better for implementing common, well-studied methods in a more applied setting. It's also faster and 100x easier to use.

>> No.6632262

Do you use R to actually edit the data or just analyze it? If I want to make a .csv, should I use R or another program?

>> No.6632485

>>6632262
R is not good at editing data. Maybe only if you want to run some sort of mathematical function or algorithm to generate new data, but I wouldn't recommend it otherwise.

>> No.6632580

I once used R during a summer school but I can't remember what IDE I used. I would like to start using it again. Can someone suggest one or more IDEs, along with their most interesting features?

>> No.6632584

>>6632580
RStudio gives you a workspace similar to Matlab. Very helpful for making scripts and figures on the fly.

>> No.6632590

>>6632262
R is great at editing data, you just have to think about it differently

do you want to remove the fifth row from the matrix DataMatrix? just write DataMatrix[-5,] and assign it to a new variable.

do you want to remove all rows where the sum of the row is less than five, or any other arbitrary boolean expression? you can do that, just use a boolean expression to generate a logical vector and use that to index the matrix
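
a minimal sketch of both operations, using a made-up 6x3 `DataMatrix` (the values, and the names `NoFifth`, `keep` and `Filtered`, are just for illustration):

```r
DataMatrix <- matrix(c(1, 0, 2,
                       4, 5, 6,
                       0, 1, 1,
                       7, 8, 9,
                       1, 1, 1,
                       3, 3, 3), ncol = 3, byrow = TRUE)

# Drop the fifth row. The result has to be assigned; DataMatrix itself is unchanged.
NoFifth <- DataMatrix[-5, ]

# Remove all rows whose sum is less than five: build a logical vector
# marking the rows to keep, then use it as the row index.
keep <- rowSums(DataMatrix) >= 5
Filtered <- DataMatrix[keep, , drop = FALSE]  # drop = FALSE keeps a matrix even if one row survives

dim(NoFifth)
dim(Filtered)
```

any other condition works the same way, as long as it produces one TRUE/FALSE per row.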

no, you can't see what the individual values are very easily, but R isn't meant to be used on datasets where you need to inspect the values manually. it's made to manipulate data which comes in a predictable, defined format, and at that it's great

>>6632580
just use the basic R environment, you don't need an IDE

>> No.6632644

>>6632485
this.

>>6632590
not this.

if you want to edit data, unless your dataset has hundreds of thousands of entries, use Excel.

>> No.6632868

>>6632485
>>6632644
What should I use to edit data then? Assuming I can't use Excel

>> No.6632906
File: 8 KB, 282x179, d9271830.jpg

>>6632868
hmm... afaik, there are three reasons you could have to not use excel
>can't afford it
>don't have authorization to install it
>dataset is too large for 32-bit install

my prayers are that it's #2 or #3

as for your alternative... god who knows. google spreadshites? maybe there's an open source spreadsheet application?

i hope you're kidding about not being able to use excel, anon...

>> No.6632938

>>6632868
If the dataset is too large, your best bet is C or Fortran

>> No.6634390

>>6632906
#4 It's closed source software and no decent human being would use it (especially in an academic environment)

>> No.6634629
File: 27 KB, 285x298, don1.jpg

>>6634390
you do realize that is literally the dumbest possible reason to not use it, right?
you are not Albert Einstein
you are not on the verge of splitting the atom
you are not going to impress anyone by being an anti-establishment douche

if you're trolling you're doing a fine job, but wow are you retarded

>> No.6634776

>>6630009
Have you considered doing the same thing in the stock market? If you paper trade, you can put your algos to the test in the real world without any risk.

>> No.6634781
File: 65 KB, 580x346, nice proprietary software faggot, richard matthew stallman, rms, gnu.jpg

>>6634629

>> No.6634786

>>6632644
>unless your dataset has hundreds of thousands of entries,
any data set large enough to necessitate statistical analysis will almost certainly have at least ten to twenty thousand entries

>> No.6634789

>>6634629
>you do realize that is literally the dumbest possible reason to not use it, right?
No, I don't and that's because it's not the dumbest but the best reason to not use it.

>you are not Albert Einstein
Considering that he's dead that was not a difficult guess.

>you are not on the verge of splitting the atom
No, I'm not and I don't intend to do so in the near future

>you are not going to impress anyone by being an anti-establishment douche
Well, you're wrong. As I have said before, any decent human being, especially in academia, should and will care about openness of information.
It is crucial that everyone can access the information that is worked with in academia so that everyone can use it and check if it's right, for example.
If you release data that can only be viewed and worked with using a commercial product, you limit people's access.

>> No.6634811
File: 1.07 MB, 266x268, 1357472788282.gif

>>6634789
>It is crucial that everyone can access the information that is worked with in academia so that everyone can use it and check if it's right, for example.

>If you release data that can only be viewed and worked with using a commercial product, you limit people's access.

You seem to be trying to imply that there is a single university on the face of the Earth that cannot afford and does not already have the licenses to use Excel.

1/10 for making me reply. Good luck with your future endeavours anon

>> No.6634814

>>6634776
That's something I've been considering. I just find finance less innately interesting than sports, that's all. not a bad idea though

>> No.6634825

>>6629943
sports lol
What kinds of parameters do you have in your models? I'm guessing you use historical data... if so, be careful with look-ahead bias
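
A rough sketch of what guarding against look-ahead bias can look like, assuming a simple game log. The `games` data frame, its columns and the cutoff date are all hypothetical. Two things matter: each feature uses only games played before the one being predicted, and the train/test split is chronological rather than random.

```r
set.seed(7)
# Hypothetical game log: one row per game, simulated outcomes
games <- data.frame(
  date = as.Date("2014-01-01") + 0:99,
  home_win = rbinom(100, 1, 0.55)
)
games <- games[order(games$date), ]

# Feature known *before* each game: win rate over previous games only.
# cumsum() includes the current game, so subtract it out.
prior_games <- seq_len(nrow(games)) - 1
games$prior_win_rate <- (cumsum(games$home_win) - games$home_win) / prior_games
# The first row is 0/0 = NaN: nothing is known before the first game.

# Chronological split: fit on the past, evaluate on the future
cutoff <- as.Date("2014-03-15")
train <- games[games$date < cutoff, ]
test <- games[games$date >= cutoff, ]

range(train$date)
range(test$date)
```

A randomly drawn test set (including out-of-bag samples) mixes past and future games, so a model can look better on it than it would on games that haven't happened yet.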

>> No.6635024

I used R this past semester for a Speech Processing class. used package tuneR mostly

learned some stuff about FFT and LDA and a few other acronyms that i forgot cuz i'm dumb

>> No.6635044

>>6629892
I use R for my stats mostly: repeated measures mixed linear models, post hoc tests, information criteria, etc.
I agree with those of you above that seeing the data in Excel is nice. I have all my experimental data saved as csv that I examine with pivot tables (nothing else compares) in Excel, and then process through R using VBA and shell scripts. It's nice and automated at this point, but it's taken a few years to get it that way. I wrote the program for my dissertation because nothing else would fit my needs.