/sci/ - Science & Math

File: 30 KB, 474x714, stats.jpg

/data/ - Data Science General Anonymous Sun Jul 14 01:37:45 2019 No.10805292

Thread for anyone currently working in the Data Meme business.

Is pic related any good? I'm currently employed as a Data Scientist, but my main contribution to the team is that I write better Python/SQL/Q than the rest of them. It's fine, but I want to go beyond that and get more stats/ML knowledge.

Also fuck R, take the Juliapill.

>>	Anonymous Sun Jul 14 02:04:41 2019 No.10805328 File: 25 KB, 432x432, download (14).jpg >>10805292 Old school, must read... Bump for being Juliapilled. Haven't read that yet though.

>>	Anonymous Sun Jul 14 03:14:52 2019 No.10805478 >>10805292 Based Julia. I'm currently reading time series analysis by Shumway.

>>	Anonymous Sun Jul 14 03:41:38 2019 No.10805548 >>10805292 Haven't read the books that compete with ISL but it's a standard. How did you get your job, OP? What are your skills that helped you find a DS job? Portfolio? I'm doing Kaggle comps as I'm finishing my CS degree.

Anonymous Sun Jul 14 04:23:05 2019 No.10805651

>>10805548
I started out in a less technical role more similar to a Data Analyst and gradually got more technical.

Basically for DS you either go in that way, or go straight in after a STEM degree or whatever.

The field is still somewhat loose, so it also kind of depends what the job is, i.e. whether they need an ML engineer, a data analyst, a data engineer (read: DBA with more technical ability) etc

As a start, make sure you have
>Python (esp. pandas, numpy, some plotting library)
>SQL of some flavour
>Knowing basic Excel can actually help a lot
>depending on what you're actually doing and what industry, a lower-level language like C or Java might be a good idea

Totally depends on industry though.

>>	Anonymous Sun Jul 14 05:19:21 2019 No.10805746 >>10805651 thanks 4 reply have a bump

>>	Anonymous Sun Jul 14 19:59:04 2019 No.10807502 File: 10 KB, 256x256, wes_mckinney.jpg I pray to Wes every night

>>	Anonymous Sun Jul 14 20:18:16 2019 No.10807551 You don't just read one book though. I'm reading about GLMs, categorical data, time series analysis, survival analysis, Bayesian learning, statistical learning, etc.

>>	Anonymous Sun Jul 14 20:19:49 2019 No.10807558 >>10807551 Any recommendations? I might pick up Shumway's time series if it's any good

>>	Anonymous Sun Jul 14 20:42:10 2019 No.10807613 >>10805292 Yes, good motivation plus simple examples. http://sgsa.berkeley.edu/current_students/books/

>>	Anonymous Sun Jul 14 20:45:40 2019 No.10807616 >>10805292 Julia book stats https://people.smp.uq.edu.au/YoniNazarathy/julia-stats/StatisticsWithJulia.pdf

>>	Anonymous Sun Jul 14 21:12:58 2019 No.10807654 >>10805292 Red pill me on Julia?

>>	Anonymous Sun Jul 14 21:16:28 2019 No.10807662 >>10807558 I thought Shumway was overcomplicated. I only read parts of the ARIMA section though

>>	Anonymous Sun Jul 14 21:32:36 2019 No.10807687 >>10805328 >>10805478 What do these books cover? t. time series anon that hasn't posted yet

Anonymous Sun Jul 14 21:41:29 2019 No.10807706

So at work, I've been experimenting with this thing where I apply a smoothing convolution to a time series, forecast the smoothed series, and then deconvolve it back to linear space using Richardson-Lucy. The idea is that some of the noise in the signal might not actually be completely noise, and that it might actually be a predictable signal with some slight variation in how late or how soon it arrives. The convolution smooths it all over so that the "noise" occurs more uniformly in time, and then the deconvolution recovers the original signal.

What does /data/ think?
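
The smooth, forecast, deconvolve pipeline described above can be sketched in plain Python. This is only a rough illustration under stated assumptions: the kernel, the helper names `convolve_same` and `richardson_lucy`, and the toy spike are all made up here, the forecasting step is left out entirely, and Richardson-Lucy itself assumes non-negative data and a known, normalized smoothing kernel.

```python
def convolve_same(signal, kernel):
    """Convolve `signal` with an odd-length `kernel`, keeping the input length.

    Samples outside the signal are treated as zero.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < n:
                acc += weight * signal[idx]
        out.append(acc)
    return out


def richardson_lucy(observed, kernel, iterations=30, eps=1e-12):
    """Iteratively estimate the signal that, blurred by `kernel`, gives `observed`."""
    flipped = kernel[::-1]
    # Start from the observed series, clipped so the estimate stays positive.
    estimate = [max(v, eps) for v in observed]
    for _ in range(iterations):
        blurred = convolve_same(estimate, kernel)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical toy data: one event whose timing gets smeared by the kernel.
kernel = [0.25, 0.5, 0.25]
spike = [0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
smoothed = convolve_same(spike, kernel)
# A forecasting model would be fitted to `smoothed` here; omitted in this sketch.
restored = richardson_lucy(smoothed, kernel)
```

On this toy input the restored series concentrates mass back toward the original spike. Whether that carries over to a forecast of the smoothed series depends on how much error the forecast adds, since deconvolution tends to amplify noise.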

>>	Anonymous Sun Jul 14 21:45:42 2019 No.10807716 File: 84 KB, 732x924, Data%2C_2366.jpg This needs to be the OP image of the next thread.

Anonymous Sun Jul 14 21:59:19 2019 No.10807735

>>10805292
Go for Elements if you already have some stats/ML knowledge I'd say.
Also,
>be bioinformatics student
>pretty much everything in the subject I'm doing my thesis in is written in R
I've learned to appreciate R but I'd like to take the Julia pill at some point as well.
>>10807551
>>10807558
Seconding this, specifically GLMs and Bayesian learning. Doesn't necessarily have to be books though; notes/reviews would be cool too.

>>	Anonymous Sun Jul 14 22:04:14 2019 No.10807746 Am I a bad data scientist if I don't know shit about statistics, but am really good with optimization/regression/signals processing?

>>	Anonymous Sun Jul 14 22:04:58 2019 No.10807749 Am I a bad data scientist if I don't know shit about statistics, but am really good with optimization/regression/signals processing?

>>	Anonymous Mon Jul 15 02:13:23 2019 No.10808158 >>10807687 Starts out introducing the field. Then goes through time domain analysis, frequency domain analysis, state space representation+kalman filtering. ARIMA is a part of time domain for instance.

>>	Anonymous Mon Jul 15 02:14:25 2019 No.10808161 >>10807558 Look up Agresti's books

>>	Anonymous Mon Jul 15 02:15:52 2019 No.10808165 >>10807749 Haven't you done statistical signal processing?

>>	Anonymous Mon Jul 15 02:18:17 2019 No.10808169 File: 74 KB, 408x634, 1562430535828.png Show me interesting or shitty stat representations

>>	Anonymous Mon Jul 15 18:55:45 2019 No.10809779 >>10807749 Depends on what you define as "not knowing shit"

>>	Anonymous Mon Jul 15 21:31:53 2019 No.10810127 >>10805292 >Theoretical Statistics: Topics for a Core Course >Mathematical Statistics - Jun Shao >Statistical Inference George Casella and Roger L. Berger Which did it best, anons? Everything else is for engineer retards who can't into proofs

>>	Anonymous Mon Jul 15 21:34:07 2019 No.10810132 >>10809779 I took an introductory stats course in college and got a B-. That was the last stats course I took. Really good at amath, though.

>>	Anonymous Tue Jul 16 01:07:27 2019 No.10810509 Wtf bros, I'm a statistician and I'm always cucked from jobs by 'data scientists'. Trying to swap now, any advice? I'm still doing my master's in stats but I really need to be employable soon

>>	Anonymous Tue Jul 16 01:47:04 2019 No.10810550 >>10810509 Just learn programming and you'll be just as good or better.

>>	Anonymous Tue Jul 16 03:24:24 2019 No.10810691 Explain ARIMA to me like I'm a retard. I don't get how it works and I may need to implement it soon
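
For the question above, a stripped-down sketch may help. ARIMA has three pieces: AR (regress the series on its own past values), I (difference the series first so trends drop out, then add the differences back up when forecasting), and MA (regress on past forecast errors). The code below only does ARIMA(1,1,0), i.e. one AR term on the once-differenced series with no MA term and no intercept. The function names are made up for illustration and this is not how a real library fits the model.

```python
def difference(series):
    """The 'I' part: turn levels into step-to-step changes."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """The 'AR' part: least-squares slope of each value on the previous one."""
    num = sum(cur * prev for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with AR(1) on the differenced series."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predict the next change
        level += last_diff            # undo the differencing
        forecasts.append(level)
    return phi, forecasts


# Toy series whose changes halve each step: 1, 0.5, 0.25, 0.125.
phi, forecasts = forecast_arima_110([0.0, 1.0, 1.5, 1.75, 1.875], steps=3)
```

A full implementation would also estimate the MA terms and an intercept, usually by maximum likelihood, and would pick the orders (p, d, q) from the data.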

>>	Anonymous Tue Jul 16 06:52:14 2019 No.10810882 File: 6 KB, 224x225, ohfug.jpg >>10810691 >not in sklearn

>>	Anonymous Tue Jul 16 20:09:12 2019 No.10812610 bump

>>	Anonymous Tue Jul 16 20:10:54 2019 No.10812616 Today I wrote code to make accurate weekly forecasts years in advance using only ten randomly scattered datapoints over a period of six months. Get on my level, plebs.

>>	Anonymous Wed Jul 17 06:55:06 2019 No.10814120 >>10812616 Forecasts of what, fag

>>	Anonymous Wed Jul 17 06:59:00 2019 No.10814131 >>10814120 How many of each product will be sold within each zipcode of the US.

>>	Anonymous Wed Jul 17 07:02:35 2019 No.10814136 >>10814131 >zero units sold every month because the product is imaginary Genius

>>	Anonymous Wed Jul 17 08:15:36 2019 No.10814225 How much actual statistics do I need to know for a data analyst job? I'm good at kaggle competitions, I know statistical learning theory, but stuff like hypothesis testing goes over my head, haven't bothered.

>>	Anonymous Wed Jul 17 10:05:23 2019 No.10814410 >>10810127 I just downloaded statistical inference and it wasn't as heavy on regression as I was hoping. Do you know any book that covers multivariate regression well?

>>	Anonymous Wed Jul 17 22:17:04 2019 No.10815977 >>10814410 Just learn optimization m8.

>>	Anonymous Wed Jul 17 22:31:24 2019 No.10815996 >>10815977 I just downloaded Convex Optimization by Boyd and Vandenberghe. Will convex optimization help me estimate functions from datasets?

>>	Anonymous Wed Jul 17 22:35:17 2019 No.10816005 OP, ISLR is a great place to start and covers concepts really well. It is EXTREMELY light on math though, so don't think you're getting the full treatment. t. took an actuary exam that used ISLR as part of its material, but went way beyond it in math

Anonymous Wed Jul 17 22:42:18 2019 No.10816014

Hey guys, I'm in a program called SharpestMinds. They connect you with a mentor and plug you into a hiring network.
I'm not here to advertise for them, just saying if you are a pretty well-qualified guy who's knocked out some MOOCs, but you still can't land a job, SM might be what you're looking for.

*Full disclosure you'll have to give them 10% of your first year's salary.*

*Full disclosure you have to find a mentor who will actually take you on, which means passing a coding and stats challenge*

*Full disclosure you should basically be 80% of the way there in qualifications*

However, they don't get paid anything until they get you a job. So it's a cool concept.
Currently they're averaging a new mentee hire every 3 days.

Just to be clear, it is fucking hard to get an entry level DS job if you don't have experience. And even if you get the interview they'll probably keep lobbing technical questions at you until they find a way to trip you up.

Anonymous Wed Jul 17 22:51:22 2019 No.10816028

>>10810509
>doing my master's in stats but I really need to be employable soon

Don't sweat it man. Take a breather. Do some MOOCs for data science and maybe a bootcamp. Yes the bootcamp's expensive but with an MS Stats you can start out 100k as a data scientist. Just look for more stats-oriented roles.
t. was in the same boat as you a year ago

Anonymous Wed Jul 17 23:26:31 2019 No.10816088

>>10816028
Not him but can you get a decent position with just a certificate and no degree? I'm currently a Junior in a B.Sc Materials Science and Engineering program but I'm not being challenged at all and I'm honestly struggling to afford school. To keep it short, I'm looking to make decent money with the certificate (~$60k) to pay off debt, finish undergrad, and then research AI applications to Mat Sci.

Anonymous Wed Jul 17 23:38:40 2019 No.10816097

>>10816088
No man, to get a data science job you often need a grad STEM degree.
I'm not saying it's absolutely necessary, I've known some good data scientists with just a bachelor's in marketing. But employers won't take you seriously without the bachelor's.
If I were you I'd just lighten the courseload and go work in a restaurant bro. You're not in any rush and sometimes doing that a few years can do you some good.

Anonymous Thu Jul 18 00:28:33 2019 No.10816175

>>10816097
Well, what if it's a bachelor's in time away status? I figured the biggest push for any applications I put forward would be my projects rather than my certificate; will they really trash my application in spite of my projects simply because I don't have a degree?

Anonymous Thu Jul 18 01:20:58 2019 No.10816263

>>10814225
Very little stats, from experience it's mostly data cleaning. Stuff like hypothesis testing will hopefully come with practice, it's unintuitive and only makes sense if you understand data generation processes (i.e. where the test statistic comes from). A first course in probability should give you that knowledge. After that, you can push further by studying decision theory and stuff.
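
One way to see "where the test statistic comes from" is a permutation test, which builds the null distribution directly from the data instead of from a formula. The sketch below is an illustration only: the function names, the number of permutations and the seed are arbitrary choices, and it assumes the two groups are exchangeable under the null hypothesis.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided p-value for a difference in means via random relabelling."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_permutations):
        # Under the null the group labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
        if diff >= observed:
            count += 1
    return count / n_permutations


# Hypothetical data: two clearly separated groups.
p_value = permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
```

The p-value is just the share of relabelled datasets that look at least as extreme as the real one, which is the same idea a t-test expresses with a closed-form distribution.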

>>	Anonymous Thu Jul 18 02:16:02 2019 No.10816357 >>10815996 It's useful for finding the maximum likelihood estimator in the convex case.

>>	Anonymous Thu Jul 18 02:30:01 2019 No.10816369 >>10816357 That's not what I need. I intend to learn convex optimization eventually so it's still nice to have the book I guess.

>>	Anonymous Thu Jul 18 02:35:31 2019 No.10816376 >>10816369 MLE is used for time series analysis, but you aren't doing that?

>>	Anonymous Thu Jul 18 10:05:04 2019 No.10816930 >>10805292 bump

>>	Anonymous Thu Jul 18 10:20:41 2019 No.10816942 Can someone intuitively explain how lasso and ridge differ from normal linear regression? I know that they use some absolute/squared penalty but what exactly does it penalize?

Anonymous Thu Jul 18 11:43:00 2019 No.10817010

>>10816376
It's more that my data isn't convex unless I fuck with it or add another dimension, but maybe I'm misunderstanding. The dataset that I'm trying to work with is five dimensional and some of the dimensions are pretty sparse. I was going to try to use Bayesian regression or whatever looked most suitable after I had learned enough statistics.

Anonymous Thu Jul 18 12:18:01 2019 No.10817052

>>10816942
Yeah man.
Ridge and Lasso introduce a penalty on the values of the linear regression coefficients, and try to shrink certain coefficient values in order to reduce the RMSE. Ridge uses the L2-norm and Lasso uses the L1-norm.

Lasso can completely eliminate irrelevant coefficients, ridge can only make them very small.

It's covered quite well in the OP book actually (which is free, just google ISLR)
This guy has a bunch of good videos
https://www.youtube.com/watch?v=5asL5Eq2x0A
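
To make the "small vs. exactly zero" difference concrete, here is a sketch of the textbook special case where the design matrix is the identity and there is no intercept, so each coefficient can be solved on its own. With penalty weight `lam`, ridge minimizes (y - b)^2 + lam * b^2 and lasso minimizes (y - b)^2 + lam * |b|. The function names are made up for illustration; real fits with correlated predictors need an iterative solver.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge solution in the identity-design case: scale toward zero."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso solution in the identity-design case: soft-threshold at lam / 2."""
    threshold = lam / 2.0
    if beta_ols > threshold:
        return beta_ols - threshold
    if beta_ols < -threshold:
        return beta_ols + threshold
    return 0.0


# Hypothetical least-squares coefficients, one large and one small.
for beta in (2.0, 0.3):
    print(beta, ridge_shrink(beta, 1.0), lasso_shrink(beta, 1.0))
```

Ridge divides every coefficient by the same factor, so nothing ever reaches zero. Lasso subtracts a fixed amount, so anything smaller than the threshold gets dropped from the model.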

>>	Anonymous Thu Jul 18 21:17:56 2019 No.10818242 >>10817010 >my data isn't convex unless I fuck with it or add another dimension splain

>>	Anonymous Thu Jul 18 21:54:18 2019 No.10818294 >>10818242 One of the dimensions strictly increases over time (with a few exceptions) and the rest are dimensions of time

>>	Anonymous Thu Jul 18 22:09:08 2019 No.10818317 File: 113 KB, 356x249, I_don't_need_it.png >>10810882 fuggg

>>	Anonymous Fri Jul 19 00:41:23 2019 No.10818598 >>10810550 Programming is like literacy in a bunch of fields these days.

>>	Anonymous Fri Jul 19 08:46:37 2019 No.10819140 >>10818294 Show us an example.

Anonymous Fri Jul 19 09:00:01 2019 No.10819150

i did a data science masters.
my job title is data scientist.
but all i ever do at work is descriptive statistics (which a highschooler could do).
And i am not allowed to use python. I have to use all these gayass proprietary languages that nobody else uses.

FUG

>>	Anonymous Fri Jul 19 10:31:54 2019 No.10819262 >>10810691 Start with learning ARCH and GARCH models, do a few calculations by hand. Then ARIMA will begin to make sense!

Anonymous Fri Jul 19 12:06:23 2019 No.10819383

>>10819140
I don't have one handy. I want to predict gross sales as part of some scheduling software I'm trying to make. I'm starting by breaking the time dimension up in a way that I think will make the data smoother and easier to predict, specifically as year, day of the year, day of the week, and hours. I haven't learned enough yet to know if this approach is flawed or invalid, but I figured that representing the data this way would be better for dealing with holidays and outliers. How far off am I, and what books do I need to read?

>>	Anonymous Sat Jul 20 05:19:01 2019 No.10821230 >>10805292 Nice

>>	Anonymous Sat Jul 20 05:54:07 2019 No.10821291 File: 137 KB, 798x1200, phenotype.jpg >>10805292 >Daniela Witten

Anonymous Sat Jul 20 07:16:37 2019 No.10821417
File: 23 KB, 692x526, 1559334643687.jpg

>working around codemonkeys
>don't really know what I'm doing
>use the phrase "linear regression"
>they all act super impressed like it's some kind of AI wizardry
So this is the true power of data science

>>	Anonymous Sat Jul 20 14:17:57 2019 No.10822208 >>10821417 My manager once asked me to do linear regression for a classification problem. 90% of people in professional jobs are retarded, which is why the data science meme is possible

>>	Anonymous Sat Jul 20 14:18:58 2019 No.10822212 >>10819150 Are you using SAS?

Anonymous Sat Jul 20 14:30:48 2019 No.10822236

Hey anons. Looks like I'll be implementing a conv net in Keras pretty soon, working off an architecture described in a paper. I'll have help from my lead, who has more experience with this kind of thing. Personally I've worked with neural nets a good bit, but never from the ground-up like this. Any pointers?

>>	Anonymous Sat Jul 20 17:31:03 2019 No.10822835 >>10822236 Hyperparameter tuning is your friend. You can design the world's best model, but it won't do anything if you give it the wrong hyperparameter values. Use this https://scikit-optimize.github.io/#skopt.gp_minimize

>>	Anonymous Sat Jul 20 20:23:37 2019 No.10823331 What is the job title of someone who develops software systems and implements/integrates ml components into them?

>>	Anonymous Sun Jul 21 03:20:51 2019 No.10824130 >>10822835 How does it compare to RandomizedSearchCV for example? I use that to narrow down the choices before a small GridSearchCV. How do you guys find parameters?
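
The coarse-to-fine workflow in the post above (random search to narrow things down, then a small grid around the best point) can be sketched without scikit-learn. Everything here is a stand-in: `toy_loss` is a hypothetical replacement for a real validation loss, and the parameter names, ranges and step sizes are arbitrary assumptions.

```python
import math
import random


def toy_loss(learning_rate, dropout):
    """Hypothetical validation loss with its minimum at (0.01, 0.3)."""
    return (math.log10(learning_rate) + 2.0) ** 2 + (dropout - 0.3) ** 2


def random_search(loss, n_samples, rng):
    """Sample the learning rate log-uniformly and dropout uniformly."""
    best = None
    for _ in range(n_samples):
        lr = 10 ** rng.uniform(-4, 0)
        dropout = rng.uniform(0.0, 0.6)
        candidate = (loss(lr, dropout), lr, dropout)
        if best is None or candidate < best:
            best = candidate
    return best


def grid_refine(loss, lr, dropout, steps=(-1, 0, 1)):
    """Evaluate a small 3x3 grid centred on the best random-search point."""
    best = (loss(lr, dropout), lr, dropout)
    for i in steps:
        for j in steps:
            new_lr = lr * 10 ** (0.25 * i)
            new_dropout = min(max(dropout + 0.05 * j, 0.0), 0.9)
            best = min(best, (loss(new_lr, new_dropout), new_lr, new_dropout))
    return best


coarse = random_search(toy_loss, n_samples=30, rng=random.Random(0))
fine = grid_refine(toy_loss, coarse[1], coarse[2])
```

Because the grid is seeded with the random-search winner, the refined loss can only match or improve on it. A model-based optimizer such as the one linked earlier replaces both stages by choosing each new sample from the results so far.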

>>	Anonymous Sun Jul 21 08:57:32 2019 No.10824582 >>10824130 It finds the optimum after ~20 samples instead of 200+ samples.

>>	Anonymous Sun Jul 21 13:14:54 2019 No.10825117 File: 171 KB, 618x380, 18437217314.jpg rec me some book on probability and statistics for morons that doesn't bring in american textbook style 800 page bullshit, their own applets and whatnot

>>	Anonymous Sun Jul 21 13:16:23 2019 No.10825125 >>10805292 I've read that one in class. It was decent.

>>	Anonymous Sun Jul 21 13:19:25 2019 No.10825134 >>10816088 check out citrine.io, they're a big name in your area of interest.

>>	Anonymous Sun Jul 21 14:10:48 2019 No.10825298 File: 22 KB, 348x499, r.jpg >>10825117 I found this was a good practical book. Not too rigorous and gives you a lot of practice with applications with datasets.

>>	Anonymous Sun Jul 21 15:22:00 2019 No.10825495 >>10823331 Machine learning engineer my dude. Pay is good.

>>	Anonymous Sun Jul 21 15:25:11 2019 No.10825503 >>10825298 this made my balls shrivel up

>>	Anonymous Sun Jul 21 22:03:34 2019 No.10826320 >>10825503 >this made my balls shrivel up what does this even mean?

>>	Anonymous Sun Jul 21 22:04:48 2019 No.10826324 >>10826320 programming and stats are against life itself

>>	Anonymous Sun Jul 21 22:07:55 2019 No.10826327 >>10826324 programming + stats = data science

>>	Anonymous Mon Jul 22 05:26:46 2019 No.10827093 File: 56 KB, 768x960, 714827381.jpg >>10825298 >start reading >babby language >check website >department of biological sciences

>>	Anonymous Mon Jul 22 05:32:28 2019 No.10827099 >julia Stop using meme languages

>>	Anonymous Mon Jul 22 06:14:50 2019 No.10827154 >>10827099 Start using good languages.

>>	Anonymous Mon Jul 22 06:25:51 2019 No.10827166 >>10826327 yes that’s right >>10827154

>>	Anonymous Mon Jul 22 06:58:28 2019 No.10827201 >>10819383 You are on step one. About a hundred more to go from there. Good luck !

>>	Anonymous Mon Jul 22 11:19:44 2019 No.10827665 >>10827201 That's not helpful

>>	Anonymous Mon Jul 22 11:23:30 2019 No.10827672 >>10807616 >UQ m8

>>	Anonymous Mon Jul 22 13:28:03 2019 No.10827893 >>10810127 Casella or Mood. >>10814410 Seber, Lee. Or Applied Multivariate Statistical Analysis by Johnson and Wichern.

>>	Anonymous Mon Jul 22 14:59:57 2019 No.10828041 >>10827893 >Applied Multivariate Statistical Analysis by Johnson and Wichern This is excellent. Thanks.

>>	Anonymous Mon Jul 22 16:52:25 2019 No.10828276 >>10827093 What did you expect? It’s an intro book for non-stats students who want to pick up just enough data analysis for their own studies.

>>	Anonymous Mon Jul 22 16:59:01 2019 No.10828300 >>10828276 >pick up just enough data analysis for their own studies This is the reason why psychology is a meme science with <30% reproducible results.

>>	Anonymous Mon Jul 22 17:47:47 2019 No.10828420 >>10817052 You're retarded. Ridge and Lasso regularization have no correlation with RMSE. They're correlated with a reduction in variance.

>>	Anonymous Mon Jul 22 17:57:45 2019 No.10828451 >>10828300 dilate

>>	Anonymous Mon Jul 22 18:41:21 2019 No.10828533 >>10828420 they reduce rmse on the test set

>>	Anonymous Mon Jul 22 21:06:58 2019 No.10828897 >>10828533 that's not necessarily true. if you apply l1/l2 norm on an already underfit dataset, you're not going to get a lower rmse

>>	Anonymous Mon Jul 22 21:18:03 2019 No.10828942 >>10828451 cope harder

>>	Anonymous Tue Jul 23 02:44:38 2019 No.10829580 >>10828533 Why do you think penalized terms in objective function would reduce rmse?

>>	Anonymous Tue Jul 23 06:51:11 2019 No.10829884 The virgin statistician vs the chad optimization expert.

>>	Anonymous Tue Jul 23 10:05:37 2019 No.10830177 I just finished reading this book, took me about 1 ½ months without doing the exercises. I think I understood 60 - 65 % of what I read, does this make me too brainlet for ML/stats?

Anonymous Tue Jul 23 16:08:23 2019 No.10830995

>>10819383
Not sure about the nature of your problem but if we are talking about sales that can be made 24/7 (e.g. online website sales) then it might be a good idea to warp the intraday time dimension so that 11:55PM and 01:30AM do not have a larger distance between them than, say, 1:30PM and 3:05PM. That is of course if the optimisation method of your choice depends on a distance metric. The warping may be done by engineering a time feature through polar coordinate transformation.
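
A minimal sketch of that polar-coordinate feature, assuming time of day is mapped onto the unit circle so that midnight wraps around. The function names are made up for illustration.

```python
import math


def encode_time_of_day(hour, minute):
    """Map a 24-hour clock time to a point (cos, sin) on the unit circle."""
    angle = 2.0 * math.pi * (hour * 60 + minute) / (24 * 60)
    return (math.cos(angle), math.sin(angle))


def feature_distance(a, b):
    """Euclidean distance between two encoded times."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Both pairs are 95 minutes apart, one of them across midnight.
across_midnight = feature_distance(encode_time_of_day(23, 55), encode_time_of_day(1, 30))
afternoon = feature_distance(encode_time_of_day(13, 30), encode_time_of_day(15, 5))
```

With raw hours the first pair would look about 22 hours apart. On the circle both pairs come out at the same distance. The same trick applies to day of the week (period 7) and day of the year (period 365 or 366).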

Anonymous Tue Jul 23 16:15:08 2019 No.10831019

>>10822208
It will take years for management to catch up to data science concepts - if they ever do. There are some that try it through courses aimed at managers (you see more and more schools cashing in on it, like MIT with their "AI" course). Ultimately though, it will remain the task of the data scientist to adequately and interestingly "pitch" an idea to management such that they understand it and think it makes them look fancy at the same time. This is not easy to do. You can usually distinguish the quality of a data scientist by his/her ability to be able to explain advanced concepts on a retard level.

Anonymous Tue Jul 23 16:28:22 2019 No.10831043
File: 50 KB, 381x500, 51vPMQ3gJWL.jpg

I know this book gets some hate from data scientists but if you are new to the ML field and want to get a light-weight, easy to read, hands-on and relatively wide overview of it, I can really recommend it. If someone asks what books to read as a beginner to ML, this is always at the top of my list. Everything else goes from there if you wish to deep dive into individual topics / math / models.

Anonymous Tue Jul 23 16:53:54 2019 No.10831085

Does anyone have experience with unsupervised machine learning and what has been your mental state after this exposure? I am currently applying unsupervised anomaly detection at work and I have to say the combination of unsupervised + anomaly detection in a low signal to noise environment just drains the life out of you - it's beyond soul crushing.

>>	Anonymous Tue Jul 23 17:23:57 2019 No.10831155 >>10828897 of course thats why you do proper model/hyperparameter selection >>10829580 can reduce overfitting

Anonymous Tue Jul 23 17:31:31 2019 No.10831175

How in the everloving fuck can anyone like data engineering. Cleaning data is the most mindless and soul-crushing programming experience I've ever had.

Mathematical modelling and comparing model types is great fun, but sorting through shitty data is a nightmare. How does /data/ get through it?

>>	Anonymous Tue Jul 23 17:35:34 2019 No.10831182 >>10831175 I guess every job has its downside elements - data handling and cleaning is definitely one of them in the data science / ML field.

>>	Anonymous Tue Jul 23 18:09:47 2019 No.10831247 Senior undergrad statistics major here, been messing around with ML in R for a while. Anyone know any good ways to make a little extra cash with ML?

Anonymous Tue Jul 23 20:38:13 2019 No.10831578

>>10830995
>if we are talking about sales that can be made 24/7 (e.g. online website sales) then it might be a good idea to warp the intraday time dimension so that 11:55PM and 01:30AM do not have a larger distance between them than, say, 1:30PM and 3:05PM
It's not 24/7, but that's an interesting consideration. I probably want to do that for the weekday and day of the year. Are there any other considerations I might have missed?

>>	Anonymous Tue Jul 23 21:25:41 2019 No.10831645 >>10830177 If you do the exercises you'll understand 90-100%. Work it bitch

Anonymous Wed Jul 24 10:17:31 2019 No.10832860

>>10831175
Some of the database and automation stuff is somewhat interesting, and setting up a fully automated data pipeline is pretty satisfying

In addition, the Data Scientists who understand your work are your customers rather than a load of salespeople or management who don't have a clue

But it seems like a pretty thankless job tbqh

>>	Anonymous Wed Jul 24 10:21:23 2019 No.10832866 Can someone actually explain the advantages of Julia over R without memeing?

>>	Anonymous Wed Jul 24 10:52:45 2019 No.10832916 >>10832866 SUUPA SUPEEDO Also nice REPL

>>	Anonymous Wed Jul 24 12:35:14 2019 No.10833044 >>10832916 And here i thought R was already quite quick with tabular data, linear algebra, inverting matrices etc. Always seemed faster than Python to me

Anonymous Wed Jul 24 12:41:59 2019 No.10833056

>>10833044
Tbqhwu it depends what you're doing

Julia can be a little annoying in that you have to compile packages before use, but once you do it's fast as fuck

So it depends how much speed you want. Languages like Q are designed for use cases where speed is the only consideration, R is for when you don't give a fuck about speed

>>	Anonymous Wed Jul 24 16:05:50 2019 No.10833426 >>10805292 Sysadmin with experience in Python here. Where do I start? I tried getting into ML and datascience using a crashcourse made by Google. But it got hard to follow very quickly.

>>	Anonymous Wed Jul 24 20:06:21 2019 No.10834224 what is "data science" Is it related to CS

>>	Anonymous Wed Jul 24 20:13:12 2019 No.10834235 >>10834224 >Is it related to CS no.

>>	Anonymous Wed Jul 24 20:22:16 2019 No.10834255 >>10834224 Nah HR cunts think it is though

>>	Anonymous Wed Jul 24 23:33:07 2019 No.10834824 Is professor Han at UIUC good? He has a book with 10000+ citations, but is he good at instructing?

>>	Anonymous Thu Jul 25 00:35:13 2019 No.10834917 >>10831175 they get paid pretty well. t. a data engineer who makes more than the data scientists at my company

>>	Anonymous Thu Jul 25 02:48:59 2019 No.10835104 >>10834917 what's the difference between the titles?

>>	Anonymous Thu Jul 25 07:13:11 2019 No.10835443 >>10835104 Data engineers just store things. They don't really interact with the data at all.

>>	Anonymous Thu Jul 25 21:12:02 2019 No.10837114 What do you do when your new job has spent millions of dollars on a new project they hired you to lead, only for you to discover that they didn't really know what they were doing, so they built something totally useless?

>>	Anonymous Thu Jul 25 21:38:38 2019 No.10837197 Fucking spent hours working on this webscraper. It turns out beautifulsoup won’t work if there’s a slash / on the end of this address. Anyone doing a data science project? I’m scraping property data from a public website

Anonymous Thu Jul 25 21:40:18 2019 No.10837201

>>10831175
It’s not just data cleaning, it’s a lot of pipelining data around. And at the end of the day that’s pure computer science/software engineering
So there are guys with CS PhDs who work with Luigi and Airflow and stuff just setting up machine learning jobs and making sure it all runs

Anonymous Fri Jul 26 00:15:57 2019 No.10837678

>>10837197
are you using beautifulsoup to fetch the page?
the way i usually use it is use another http client to get the HTML as a string then pass that to beautifulsoup to do the parsing, it shouldn't care about the address that way except for resolving links on the page
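
The split described above (fetch the HTML as a string, then parse it) can be sketched with the standard library alone. Note the swap: this uses `html.parser` in place of BeautifulSoup so the example has no dependencies, and the HTML snippet, URLs and class name are made up. It also shows the one place the address still matters, which is resolving relative links, where a trailing slash changes the result.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href=...> tags in an HTML string."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links are resolved against the page address.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page body; in practice this string comes from the HTTP client.
page = '<ul><li><a href="42">Property 42</a></li></ul>'
with_slash = extract_links(page, "http://example.com/listings/")
without_slash = extract_links(page, "http://example.com/listings")
```

With the trailing slash the link resolves to `http://example.com/listings/42`. Without it, the last path segment is replaced and the link becomes `http://example.com/42`.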

>>	Anonymous Fri Jul 26 06:54:25 2019 No.10838206 Is linear programming /data/?

Anonymous Fri Jul 26 13:08:15 2019 No.10838745
File: 13 KB, 160x373, snap.jpg

>>10837678
I use requests to fetch the page then parse it with beautifulsoup

Right now I'm making a separate .json file for every property in the database, and there are about 250k. I'm gonna have to run this over multiple nights to fetch all the data.

In the meantime I'll do a little EDA of the little observations I have.

Anonymous Fri Jul 26 19:04:17 2019 No.10839571
File: 211 KB, 664x701, 1546105196146.jpg

Just started my MSc Mathematics & Statistics. I'm studying it while I work as a Data Analyst (SQL, Python kinda stuff). Coming from a B. IT (CS but with no Math) hoping this will help me elevate my career to the next level.

Is this the right move? I turned down a Master of Data Science because I felt it was a bit of a meme degree.

>>	Anonymous Fri Jul 26 19:19:06 2019 No.10839606 >>10839571 Depends on what type of work you'll be doing during your msc

>>	Anonymous Fri Jul 26 19:37:27 2019 No.10839659 >>10839606 It's MSc by coursework. The topics are pretty much a Bachelor of Mathematics with an applied stats focus, but with room for directed courses, which I'm going to fill with ML courses.

>>	Anonymous Fri Jul 26 20:56:41 2019 No.10839822 >>10805292 The startup I'm working in is growing quite fast, it's clear that in one year or less the tools we are currently using for data analysis won't be able to keep up. So what tools do you guys use? We use mainly BigQuery

>>	Anonymous Fri Jul 26 21:56:15 2019 No.10839908 File: 859 KB, 1296x797, mathMemes.png >>10831043 If I did some of the projects in there, could I land myself an ML software engy or data scientist role (assuming a BS in CS)? Also, >inb4 CS Meme shitposting

>>	Anonymous Sat Jul 27 03:46:37 2019 No.10840502 Is there a way to apply attention mechanism to outputs of multiple LSTM encoders but combine it into a single LSTM decoder?

>>	Anonymous Sat Jul 27 03:54:21 2019 No.10840520 >>10839571 >I turned down a Master of Data Science because I felt it was a bit of a meme degree. >he actually listens to what the NEETs on 4chan say Loser

>>	Anonymous Sat Jul 27 03:58:20 2019 No.10840527 >>10805292 >Fuck R No, fuck you

>>	Anonymous Sat Jul 27 04:02:24 2019 No.10840537 File: 6 KB, 403x178, application.png >he wants to land a job in data science just fking LOL

>>	Anonymous Sat Jul 27 04:02:39 2019 No.10840539 >>10840527 this desu

Anonymous Sat Jul 27 04:03:04 2019 No.10840544

>>10831175
You're right that a lot of it can be soul crushing data munging. I much prefer the data science side of it, and still do it as often as I can, but unfortunately data engineering pays better.

I absolutely hate database modeling and SQL, but actually designing and building data pipelines can be fun. My statistical learning is pretty solid, but I need to get better at the hardcore ML deep learning stuff, and hopefully I can pivot into ML/AI Engineer of some sort

Anonymous Sat Jul 27 04:05:38 2019 No.10840551

>>10833056
Fuck base Julia and base R for a quick sec.

Consider R in the context of Tidyverse, and the breadth of robust and well performing packages available for stuff from PCAs, to nonlinear modeling, and network analysis, how the fuck can anyone claim that Julia is better than R?

>>	Anonymous Sat Jul 27 05:21:15 2019 No.10840633 >>10837197 You mean Requests? BS parses HTML responses, you shouldn't need the address

Anonymous Sat Jul 27 05:55:08 2019 No.10840671

I'm working on time series analysis and I need to find a transformation that maps sequences of various lengths to the same length. Tried Fourier transform already, seems to work to a certain degree. Dynamic time warping is not an option, my boss will rip my head off and shit in my throat if I suggest this one more time (and he is probably right).
Do you maybe have an idea?
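
One reading of the Fourier approach mentioned above is to keep a fixed number of low-frequency DFT magnitudes, normalized by sequence length, so every sequence maps to a vector of the same size. This is a sketch under that assumption only: the function name is made up, the magnitudes throw away phase (so time shifts are ignored), and `n_coeffs` should stay below half the shortest sequence length.

```python
import cmath
import math


def fourier_features(sequence, n_coeffs):
    """Return the first `n_coeffs` DFT magnitudes, scaled by 1 / len(sequence)."""
    n = len(sequence)
    features = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(sequence)
        )
        features.append(abs(coeff) / n)
    return features


# Hypothetical data: the same three-cycle wave sampled at two lengths 1% apart.
short = [math.sin(2 * math.pi * 3 * t / 100) for t in range(100)]
longer = [math.sin(2 * math.pi * 3 * t / 101) for t in range(101)]
features_short = fourier_features(short, 8)
features_long = fourier_features(longer, 8)
```

Both feature vectors have eight entries with a peak of about 0.5 at index 3, so they can be compared point-wise or fed to a model that needs fixed-length input.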

>>	Anonymous Sat Jul 27 07:09:16 2019 No.10840788 >>10840551 They're claiming base Julia is better than base R, the fact that R and Python have better ecosystems is why Julia isn't that usable yet

>>	Anonymous Sat Jul 27 09:25:46 2019 No.10841030 >>10840671 What's wrong with DTW? Without knowing that, I'm not sure how to help.

>>	Anonymous Sat Jul 27 09:47:24 2019 No.10841071 >>10841030 The data in between the recognized correspondence points can't be used.

>>	Anonymous Sat Jul 27 09:50:04 2019 No.10841078 >>10841071 So only certain subsequences are correlated? Can you post a graph?

Anonymous Sat Jul 27 09:56:16 2019 No.10841086

>>10841078
Unfortunately I can't post a graph because everything is in the company network. They don't allow me to take the laptop with me and I'm 2 hours away from my work place.

>So only certain subsequences are correlated?
Exactly! One subsequence is like 1% longer than the other. Not a big deal, but I can't feed that to machine learning models. I can't even do point-wise comparisons.

Anonymous Sat Jul 27 14:32:49 2019 No.10841582

>>10840788
But in the modern age of programming, it doesn't even make sense to consider a language outside of its ecosystem. Base Julia may be faster than base Python or base R, but things like Pandas, numpy and Tidyverse are C optimized, and Julia isn't much faster, if at all, than a given C subroutine.

There is utility in the entire world speaking English in the same way that there could be utility if we all shared the same programming languages (i.e. growth in ecosystems). There are always ways to optimize and improve programming, but Julia offers nothing that makes it revolutionary to me outside "it's faster".

Anonymous Sat Jul 27 15:34:44 2019 No.10841719

>>10841582
There's an enormous cost to jumping back and forth between interpreted and compiled code multiple times within a single line. You will never get "native performance" with something like numpy. You'll just get performance that isn't absolute dogshit.

Anonymous Sat Jul 27 15:36:20 2019 No.10841729

>>10841582
>Julia offers nothing that makes it revolutionary to me outside "it's faster".
The entire language has end-to-end autodifferentiation. You can write an arbitrary piece of code and turn it into a differentiable model, loops, if statements, and all.
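
To make "differentiate through loops and if statements" concrete, here is a toy forward-mode sketch in Python using dual numbers. It only illustrates the idea and is not Julia's tooling; the `Dual` class, `f`, and `derivative` are made-up names.

```python
class Dual:
    """A value together with its derivative with respect to the input."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def f(x, n=3):
    # Ordinary control flow: a loop with a data-dependent branch.
    acc = x
    for _ in range(n):
        if acc.val > 1.0:
            acc = acc * 0.5
        else:
            acc = acc * acc + 1.0
    return acc


def derivative(func, x0):
    # Seed the input with derivative 1 and read the derivative off the output.
    return func(Dual(x0, 1.0)).der


slope = derivative(f, 0.5)
```

The derivative is that of whichever branch actually ran for the given input, which is also how autodiff systems generally treat branches.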

>>	Anonymous Sat Jul 27 16:17:18 2019 No.10841811 >>10840551 >>10840788 >>10841582 Will R die once other languages start to get the same amount of packages?

>>	Anonymous Sat Jul 27 16:22:04 2019 No.10841826 >>10841811 R is already dead.

>>	Anonymous Sat Jul 27 16:31:03 2019 No.10841844 >>10841826 Any sources for this? I'm a biofag who spent the last 5 months using R, it's even better than my python now. I don't want all these R-skills to be for naught bros.

>>	Anonymous Sat Jul 27 16:33:48 2019 No.10841851 >>10841844 this please help me I don’t want to learn another language

>>	Anonymous Sat Jul 27 16:57:48 2019 No.10841897 >>10841851 >>10841844 >>10841826 >>10841811 Data scientist in silicon valley, R is nowhere close to dying. The R ecosystem has improved tremendously even in the past 2 years, as has the cross-communication between R and Python.

>>	Anonymous Sat Jul 27 22:56:27 2019 No.10842621 >>10840520 Really depends on the program. I've seen some people with masters of data science from a degree mill and their coursework was literally in Excel and Power BI, no real technical meat to it.

>>	Anonymous Sun Jul 28 03:35:08 2019 No.10843006 >>10841811 Naa, R is a specialized tool for doing stats aimed at non-programmers, so it will always be relevant.

>>	Anonymous Sun Jul 28 04:11:23 2019 No.10843051 File: 166 KB, 884x1364, 71hX4xNc9NL.jpg Anyone wanting to share some experiences with this?

Anonymous Sun Jul 28 04:14:31 2019 No.10843059

If you want to be a BIG DATA code monkey like the pajeets you barely even need An Introduction to Statistical Learning since most of the major tools are now bundled kits that you can run out of the box.
If you want to actually LEARN something you pick up statistics books. Machine Learning was literally a solved field in Statistics in the 70s

>>	Anonymous Sun Jul 28 19:32:21 2019 No.10844957 >>10843059 >Machine Learning was literally a solved field in Statistics in the 70s Imagine being this clueless.

>>	Anonymous Sun Jul 28 20:01:42 2019 No.10845040 how is this thread still going - what did I miss?

>>	Anonymous Sun Jul 28 21:09:33 2019 No.10845217 >>10845040 Not much. Juliafags screeching "no u" and some "bros, how do i x?" posts.

>>	Anonymous Sun Jul 28 21:23:41 2019 No.10845260 My college offers a "master's in applied AI" from the electrical engineering department. Is it worth it?

>>	Anonymous Sun Jul 28 22:24:28 2019 No.10845403 >>10845260 Depends on your skills and interests, what you would do otherwise, as well as the college. Probably though

>>	Anonymous Mon Jul 29 00:00:33 2019 No.10845572 >>10845260 How is anybody fucking supposed to know that? Take a look at the curriculum and decide for yourself. That being said, the fact that they're using the word AI instead of applied statistics or applied machine learning is a danger sign.

Anonymous Mon Jul 29 00:25:16 2019 No.10845601

I'm testing out an unscented Kalman filter using the filterpy library for Python. Wondering if the pykalman library is better here.

Regarding the design of my filter, I am trying to figure out 2D position from a mix of 1D velocity, angle, angle rate of change, and 2D acceleration data. The book I read said that more data is always better, but I'm having a hard time tuning the filter in a way that it doesn't diverge quickly. I am feeding it data at 500 Hz and using RTS smoothing on the filtered data. Is it too tedious to tune the filter with this much data, or is more data always a good thing?

Anonymous Mon Jul 29 00:26:32 2019 No.10845602

>>10843059
I wonder if this is a troll or if you're really this uninformed.

>>10845260
I think it would get you a job in data science.
However if I were you I would choose something more theoretical or science-heavy for grad school.
If you have a related STEM background, you just need some MOOCs and motivation to learn data science. But if you go to school for just data science it kind of narrows your horizons.
If you have a good background in generalized linear models and statistics you are 80% of the way to understanding most machine learning algorithms.

>>	Anonymous Mon Jul 29 00:27:33 2019 No.10845604 >>10843051 Monitoring, looks interesting as fuck actually

>>	Anonymous Mon Jul 29 01:23:19 2019 No.10845675 >>10840551 >Fuck base Julia and base R for a quick sec >"uuh, ignoring the fact that Julia is inherently better..." R and python are garbage and only survive because of brainlets and boomers

>>	Anonymous Mon Jul 29 09:39:01 2019 No.10846348 >>10843059 Data science is more optimization than statistics. Git gud.

>>	Anonymous Mon Jul 29 10:07:22 2019 No.10846412 Looking for an IQ (Can be ASVAB or old SAT) to Income dataset to do some testing on. Thanks.

Anonymous Mon Jul 29 13:44:27 2019 No.10846898

How good is the book in the OP? I'm taking an Introduction to Data Science and Machine Learning course this fall but the majority of the course will be based on the lecture notes. They have that book as a reference and I'm wondering if it is worth the read.

>>	Anonymous Mon Jul 29 13:54:51 2019 No.10846923 >>10846898 Very easy, short, and free book. Perfect non-pop-sci introduction. https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf

>>	Anonymous Mon Jul 29 16:31:45 2019 No.10847247 >>10834224 statistical analysis of stochastic time series

>>	Anonymous Mon Jul 29 20:15:58 2019 No.10847714 >>10846412 Ask /pol/.

>>	Anonymous Mon Jul 29 20:25:56 2019 No.10847731 hi anons I am an actuarial science graduate, I want to start on the path of data science using python, what books and courses do you recommend? I prefer books because I get bored watching videos, but please do recommend a good course if you've got one. thanks

>>	Anonymous Mon Jul 29 20:29:57 2019 No.10847735 >>10805292 Read it for a graduate class at UT Austin. Personally, I just didn't like it. Andrew Ng's course notes were clearer, more concise.

>>	Anonymous Mon Jul 29 20:43:36 2019 No.10847764 >>10816014 10%, big ooeff mr goldstein, thats a rough cut of the top.

Anonymous Tue Jul 30 00:50:02 2019 No.10848314

>>10847731
dataquest.io is good, it has no videos.

Honestly if you want to get handy with Python and data science, you're going down a long road and you're gonna need lots of practice. So you need to get up and code every day on different sites like Udacity, dataquest, datacamp, edabit, etc.

I did 3 actuary exams myself. If you took SRM you should have a great understanding of data science topics. If you haven't taken SRM, take it.

>>	Anonymous Tue Jul 30 03:21:29 2019 No.10848599 >>10841826 Nope. Matlab is dead. R is the future.

>>	Anonymous Tue Jul 30 08:02:08 2019 No.10848939 >>10841897 >>10841851 >>10841826 >>10848599 Statistics masters student here, saying R is dead is like saying Java is dead.

Anonymous Tue Jul 30 09:22:01 2019 No.10849073

>>10843051
Introductory Econometrics by Wooldridge isn't bad, that's what we used at my uni for econometrics and it'll teach you the basics of regression, time series etc. While it has a fair bit of math, it explains it pretty well. Plus you can get it as well as the datasets used for free online after 2 Google searches.

The data comes in Stata format but Python + pandas or R can read it easily. I believe that there's actually a Python module which contains the data sets so you can import whatever you want right in, like in scikit.
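
A minimal sketch of the pandas side, assuming a `.dta` file on disk. Since no real textbook file is bundled here, the example first writes a made-up frame with `DataFrame.to_stata` and then reads it back with `pd.read_stata`; the column names and the file name `example.dta` are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up stand-in for a textbook data set; column names are hypothetical.
original = pd.DataFrame({"wage": [3.10, 3.24, 6.00], "educ": [11.0, 12.0, 8.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    # Write a Stata file so there is something to read back.
    original.to_stata(path, write_index=False)
    # This is the call you would point at a real .dta file.
    loaded = pd.read_stata(path)
```

With a downloaded data set, only the `pd.read_stata(path)` line is needed.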

>>	Anonymous Tue Jul 30 09:24:22 2019 No.10849076 >>10847735 >UT Austin ayyyyy

Anonymous Tue Jul 30 10:09:33 2019 No.10849160

Say I'm working on a non-convex optimization problem and I want to try some regularization, how do I compute the norm of a function in its reproducing kernel Hilbert space? How do I select a proper kernel?
This is kinda out of my expertise area and I'm having some trouble figuring it out.

>>	Anonymous Tue Jul 30 14:20:17 2019 No.10849673 File: 1.87 MB, 331x197, 1558373551380.gif >>10841582 >tidyverse is C optimized

>>	Anonymous Tue Jul 30 14:22:16 2019 No.10849681 >>10843059 Top tier brainlet post. Go read Pattern Recognition and Machine Learning

Anonymous Tue Jul 30 16:30:20 2019 No.10850034

Please help me, I think I'm gunna get fired...

I have taken one intro to statistics in undergrad. No one in my research group is a statistician. I hope we hire an epidemiologist in the future.

I have the number of injuries at 15 hospitals (1 of the hospitals is the "hospital of interest"), as well as the corresponding demographic data of each injured patient (age, gender, date of injury, cause of injury, level of injury, date of death). These are the only hospitals in the country, and each serves a known region with a known age- and sex-stratified population (info I can get from gov stats website).

My objectives are as follows:

(1) get a general picture of the situation in terms of age, gender, cause of injury, level of injury at (a) each hospital and (b) nationwide for all 15 hospitals combined.

(2) compare the hospital of interest's age, gender, cause of injury, level of injury with (a) each hospital and (b) nationwide for all 15 hospitals combined.

I have 6 years of data, so I was thinking of looking at the variables annually. Aside from this I'm really out of my element here and I've been reading similar studies (though there really aren't that many) to learn more.

Any help/suggestions is appreciated.
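
A rough starting point for objectives (1) and (2), sketched with pandas on made-up toy records. The column names, hospital labels, and catchment populations are all hypothetical; a real analysis would also want age- and sex-standardized rates rather than the crude rates shown here.

```python
import pandas as pd

# Made-up toy records; one row per injured patient.
injuries = pd.DataFrame({
    "hospital": ["A", "A", "A", "B", "B", "C"],
    "year": [2018, 2018, 2019, 2018, 2019, 2019],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "cause": ["fall", "traffic", "fall", "fall", "traffic", "fall"],
})
# Made-up catchment population served by each hospital.
population = pd.Series({"A": 50_000, "B": 100_000, "C": 25_000}, name="population")

# (1a) Injuries per hospital, and crude rate per 100,000 population.
counts = injuries.groupby("hospital").size().rename("injuries")
rates = (counts / population * 100_000).rename("per_100k")

# (1b) Nationwide distribution of causes (all hospitals pooled).
national_cause = injuries["cause"].value_counts(normalize=True)

# (2) Share of each cause within each hospital, to set the hospital of
# interest (say "A") next to the others and the national figures.
cause_by_hospital = pd.crosstab(injuries["hospital"], injuries["cause"], normalize="index")
```

The same `groupby` / `crosstab` pattern extends to sex, age band, injury level, and year by swapping or adding columns.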

>>	Anonymous Tue Jul 30 16:40:41 2019 No.10850065 Holy shit this thread reads like a math/computerscience drop out meeting. Lmfao. Imagine thinking you are smart because you know Python, Statistics (lol) and linear algebra lmfao

>>	Anonymous Tue Jul 30 16:51:14 2019 No.10850103 >>10850065 you have no idea how easy it is to impress managers with something as simple as PCA.

Anonymous Tue Jul 30 16:53:29 2019 No.10850107

>>10850103
The insane part is, 10 years ago people who wrote in Python were seen as literal garbage. I can't understand why that changed.

I think there are just too few people left that actually know how to code, computer science and math, so all the drop outs and losers with no math background can only into Python and easy unoptimized shit.

Anonymous Tue Jul 30 17:02:36 2019 No.10850139

Book in OP:

> "This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist."

> non mathematical sciences

HAHAHAHAH holy shit. how deep do you have to fall

Anonymous Tue Jul 30 17:11:34 2019 No.10850167

>>10850107
To be fair, many applied fields require fast prototyping of ideas which Python is particularly well-suited for. Programmers for compiled languages usually enter the stage when an idea has been completely fleshed out and ready for implementation.

On the other hand, Python can be twice as powerful under a user that actually knows the benefits and limitations of, e.g., many of the tools available in Scikit-Learn. You can easily spot the difference between someone who knows a bit of Python, has done some Coursera courses, and can apply 5-liners found on some blog post, vs somebody who can actually build custom pipelines for a project at hand, make changes to base estimator classes etc.

>>	Anonymous Tue Jul 30 17:18:08 2019 No.10850187 >>10850167 Yes, that difference is 10 hours at most. This has to be a joke.

>>	Anonymous Tue Jul 30 22:28:50 2019 No.10850958 >>10849160 What in the actual fuck are you working on?

>>	Anonymous Tue Jul 30 22:37:48 2019 No.10850986 >>10850958 Your mom

>>	Anonymous Tue Jul 30 22:38:58 2019 No.10850992 >>10805292 >general >no tips on how to start studying the subject Come on, OP.

>>	Anonymous Tue Jul 30 22:44:10 2019 No.10851004 >>10850992 I don't even know where to start. I'm currently doing an MSc in Applied Math.

>>	Anonymous Tue Jul 30 22:56:04 2019 No.10851039 >>10850958 a non-convex optimization problem

>>	Anonymous Wed Jul 31 03:41:32 2019 No.10851604 >>10850034 this seems easy, did you get in a job you don't know anything about?

>>	Anonymous Wed Jul 31 03:43:06 2019 No.10851605 >>10850139 undergrad and master students, (in mathematical sciences) AND phd's in non-mathematical sciences.

>>	Anonymous Wed Jul 31 03:44:23 2019 No.10851608 >>10850992 but he posted it.

Anonymous Wed Jul 31 06:15:07 2019 No.10851746

Aussie here, currently working as a BI Dev and have a B.IT. My degree didn't cover any required maths but voluntarily took one base level Calc & Algebra course and really liked it. I wanna do a Masters to elevate myself above the competition in searching for better jobs and hopefully fill in some gaps from my prev. degree. Help me choose bros:

Master of Data Science from UNSW (top aussie uni)
(AUD $48,000)
https://studyonline.unsw.edu.au/online-programs/master-data-science

Master of Computer Science from Georgia Tech (online) (AUD $11,000)
http://www.omscs.gatech.edu/home

Master of Computer Science from University of Illinois (online) (AUD $30,800)
https://www.coursera.org/degrees/master-of-computer-science-illinois

Master of Computer Science from Arizona State University (online) (AUD $22,000)
https://www.coursera.org/degrees/master-of-computer-science-asu

>>	Anonymous Wed Jul 31 06:25:34 2019 No.10851755 >>10851746 Georgia Tech is the cheapest option and it's among the top engineering/CS schools in the U.S.

>>	Anonymous Wed Jul 31 07:07:15 2019 No.10851788 >>10851039 Regularization is for regression. It should not be used if you're just solving a general optimization problem that isn't related to training a model.

>>	Anonymous Wed Jul 31 08:01:41 2019 No.10851849 >>10851605 >>10851605 that's not what it says

Anonymous Wed Jul 31 09:53:02 2019 No.10852038

>>10851605
Ever heard of Oxford comma?

>"I saw a dog, a horse and a cat cross the street."
Means: "I saw three animals cross the street; a dog, a horse and a cat."

>"I saw a dog, a horse, and a cat cross the street."
Means: "I saw a cat cross the street. I also saw a dog and a horse."

There's no Oxford comma in >>10850139 so it literally means that the book is aimed for students in non-mathematical sciences.

>>	Anonymous Wed Jul 31 10:02:18 2019 No.10852054 >>10850034 Basic exploratory data analysis will solve half of that. learn2pandas + use the pandas-profiling library in a jupyter notebook and you can find all that out within an hour or 2.

>>	Anonymous Wed Jul 31 10:07:52 2019 No.10852063 >>10852038 Oh no an introductory maths book to help people

Anonymous Wed Jul 31 10:34:10 2019 No.10852111

>>10851788
Why not? Regression boils down to an optimization problem, and isn't "training a model" just parameter estimation? My problem looks a lot like non-linear least squares.
Anyway, I don't think this discussion is relevant to my original question about RKHS.

>>	Anonymous Wed Jul 31 10:35:38 2019 No.10852116 >>10852063 > oh no data science is literally a meme for people who failed at math and CS

>>	Anonymous Wed Jul 31 10:39:02 2019 No.10852126 >>10851788 >>10852111 Also regularization is literally constraints handled with Lagrange multipliers, except the multiplier is a tuning parameter instead of a variable, why couldn't it be useful in other kinds of constrained optimization problems?
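
Written out, the correspondence claimed above looks roughly like this. The symbols $f$, $g$, $c$, and $\lambda$ are generic placeholders, not anything from the original question; $g(x) = \lVert x \rVert_2^2$ gives the ridge penalty and $g(x) = \lVert x \rVert_1$ the lasso penalty.

```latex
\begin{aligned}
\text{constrained:}\quad & \min_{x}\; f(x) \quad \text{s.t.}\quad g(x) \le c \\
\text{Lagrangian:}\quad & L(x, \lambda) = f(x) + \lambda\,\bigl(g(x) - c\bigr), \qquad \lambda \ge 0 \\
\text{penalized:}\quad & \min_{x}\; f(x) + \lambda\, g(x), \qquad \lambda \ge 0 \ \text{fixed by the user}
\end{aligned}
```

For fixed $\lambda$ the term $-\lambda c$ is a constant in $x$, so it drops out of the minimization; choosing $\lambda$ plays the role of choosing the constraint level $c$.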

>>	Anonymous Wed Jul 31 11:26:01 2019 No.10852226 >>10805292 >take the Juliapill Julia mode in emacs is not comfy at all. :(

>>	Anonymous Wed Jul 31 20:23:02 2019 No.10853576 bump

>>	Anonymous Wed Jul 31 20:37:35 2019 No.10853608 >>10850139 lol

Anonymous Thu Aug 1 01:32:55 2019 No.10854097

>>10851788
regularization is just applying a penalty to any parameter being estimated.

if you approach your analyses in a Bayesian context, you realize that regularization is just using a partially/weakly informative prior to shrink parameters towards zero/group mean.

see U(-infty, infty) vs Normal(0, 100) vs Normal(0, 1) vs Laplace(0, 1) vs horseshoe e.g. Normal(0, \sigma), \sigma ~ Cauchy+(0, 1).
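
A sketch of why the priors listed above act as penalties, assuming a generic likelihood $p(y \mid \theta)$ and independent priors on each coefficient $\theta_j$; the scale symbols $\sigma$ and $b$ are placeholders.

```latex
\begin{aligned}
\hat\theta_{\mathrm{MAP}} &= \arg\min_{\theta}\; \bigl[\, -\log p(y \mid \theta) \;-\; \log p(\theta) \,\bigr] \\[4pt]
\theta_j \sim \mathrm{U}(-\infty, \infty) &:\quad -\log p(\theta) = \text{const} \quad \text{(no penalty, plain maximum likelihood)} \\
\theta_j \sim \mathrm{Normal}(0, \sigma^2) &:\quad -\log p(\theta) = \frac{1}{2\sigma^2} \sum_j \theta_j^2 + \text{const} \quad \text{(L2 / ridge)} \\
\theta_j \sim \mathrm{Laplace}(0, b) &:\quad -\log p(\theta) = \frac{1}{b} \sum_j \lvert \theta_j \rvert + \text{const} \quad \text{(L1 / lasso)}
\end{aligned}
```

A tighter prior (smaller $\sigma$ or $b$) corresponds to a larger penalty weight, which is the shrinkage towards zero described above.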

Anonymous Thu Aug 1 01:34:15 2019 No.10854099
File: 16 KB, 375x376, stan_logo.png

>>10854097
like, in general, most problems in the actual modeling part of data science are MUCH easier if you approach them from a Bayesian context. most "tricks" in machine learning/data science are just a change in prior choice.

Stan master race.

>>	Anonymous Thu Aug 1 02:05:04 2019 No.10854128 >>10850107 >>10850139 >>10850187 This is the guy in your group who works on weekends and doesn't talk to anybody during lunch. Either that or an undergrad LARPer.

Anonymous Thu Aug 1 06:24:56 2019 No.10854419

>>10852111
>>10852126
>>10854097
You have no idea what you're doing, and you're going to get bad results. Regularization is a prior to enforce regression model simplicity. If you're trying to optimize a non-regression problem, it makes literally no fucking sense to apply regularization. You're not going to converge to the correct optimum because you arbitrarily decided that you want to find answers close to zero, and this will also mess with your objective function. Constraints, if you really do have them, should be handled using barrier functions.

>>	Anonymous Thu Aug 1 06:53:57 2019 No.10854454 >>10819150 >implying this is a bad thing just rush the easy work and teach yourself some fancy tricks during your downtime, then shoot for a senior role at a different company

Anonymous Thu Aug 1 07:05:12 2019 No.10854469

>>10834224
yes and no. "data science" is a term that includes a lot of topics, including but not limited to:
>web scraping (scripting in python, JS, etc.)
>data warehousing (databases & api development with SQL and an object-oriented lang)
>machine learning (typically python)
>linear and multivariate regression (applied statistics, usually done in R or SAS)
>data modeling (a set of decisions you have to make before writing code)
>data visualization (a meme term for making charts and powerpoints)
>technical writing (explaining numbers to boomers)
a lot of these require programming, but "data scientist" could mean anything from "machine learning dev with high code quality" to "zoomer who uses microsoft office really well and sometimes writes python scripts."

>>	Anonymous Thu Aug 1 07:11:38 2019 No.10854479 >>10851746 Seconding Georgia Tech. University of Illinois is good too, but there's no reason to pay three times the price. That being said, I'd apply to all 4 to see if you qualify for any scholarships.

>>	Anonymous Thu Aug 1 07:40:09 2019 No.10854522 >>10810127 All of these are shit for brainlet engineers. What should a real mathematician (IQ > 190) read?

>>	Anonymous Thu Aug 1 07:41:30 2019 No.10854525 >>10854469 So you’re saying it’s a meme?

>>	Anonymous Thu Aug 1 08:10:48 2019 No.10854563 >>10854522 How to stop sucking dicks and get a job. Written by me.

>>	Anonymous Thu Aug 1 08:17:10 2019 No.10854569 >>10854563 Shut the fuck up NEET. Everyone knows you still suck cocks

>>	Anonymous Thu Aug 1 10:12:55 2019 No.10854743 >>10854419 Why do you try to respond when you clearly don't understand what you are talking about?

>>	Anonymous Thu Aug 1 10:15:37 2019 No.10854749 >>10810127 I took 2 classes on Casella and Berger and can confirm it is indeed hard as fuck. t. MS Math

>>	Anonymous Thu Aug 1 10:21:36 2019 No.10854759 Should I take an introductory course in machine learning even if I've never programmed much before?

Anonymous Thu Aug 1 11:01:47 2019 No.10854847

>>10805651
Would you say that those 6 week bootcamps are worth it?

I only have a Chemical Engineering masters currently working at a university mostly doing scientific programming, but I have a lot of good GitHub contributions and some minor software dev freelance work. The oil industry slump is scaring me and I desperately want to get into DS/ML.

>>	Anonymous Thu Aug 1 11:05:34 2019 No.10854855 >>10854847 I recommend going into programming instead, Data Science would be a lot of retraining when it sounds like you're already a competent programmer. Just apply for some positions that say software engineer or something like that.

Anonymous Thu Aug 1 11:15:14 2019 No.10854874

>>10854855
>Just apply for some positions that say software engineer or something like that.
I have, but I've never gotten an interview for any of those applications. The only freelance work I've ever found is through friends/colleagues. I'm not sure how exactly to transition into software dev. There's a lot of work for software engineers working on control systems etc., but they don't seem to look twice at a ChemE's CV (which is a bit ridiculous since I've taught both ChemEs/EEs process control as an assistant lecturer, but whatever, such is life).

The only reason I thought DS/ML is because my research has some applications for ML, in fact I have some limited contributions to fundamental open source libraries in the field (not ML libraries themselves, but the SciPy stack).

I have until the end of the year when my stipend expires, but after that I might actually starve.

>>	Anonymous Thu Aug 1 11:25:52 2019 No.10854897 >>10854874 Like employers care about your EE/ChemE experience for programming. In fact, putting that in your resume/CV makes you sound overqualified and that you would expect higher pay just for having education in an unrelated field.

Anonymous Thu Aug 1 11:28:36 2019 No.10854904

>>10854897
>In fact, putting that in your resume/CV makes you sound overqualified and that you would expect higher pay just for having education in an unrelated field.
I don't really know how to tailor my CV otherwise though.

My only experience developing libraries is on scientific libraries.

>>	Anonymous Thu Aug 1 11:29:59 2019 No.10854908 >>10854904 I think I'm just going to bite the bullet and start applying for CS PhD/Master programmes in ML. Thanks anyway Anons.

>>	Anonymous Thu Aug 1 11:32:14 2019 No.10854910 >>10854904 I don't know what to tell you then, it doesn't sound like you have any relevant experience in programming or data science right now. Just find some bootcamp online, complete it, then pray you get a job.

>>	Anonymous Thu Aug 1 20:45:29 2019 No.10856103 >>10854569 Nope. Senior data scientist. Try again, retard.

>>	Anonymous Fri Aug 2 06:47:40 2019 No.10857021 Can I get a DS/ML job without a degree? I have a couple years in uni but want to quit. I have a "good" mathematical background, python data stack, SQL, some kaggle competitions.

>>	Anonymous Fri Aug 2 06:50:24 2019 No.10857026 >>10854897 Retard.

>>	Anonymous Fri Aug 2 07:05:01 2019 No.10857043 I did a bachelors degree in sociology, a masters degree in survey methodology afterwards while studying stats + machine learning in my free time. Currently working in bioinformatics doing data science stuff. Roast me please

>>	Anonymous Fri Aug 2 07:25:55 2019 No.10857066 >>10857021 I think you probably need at least a bachelors in something. I have heard of many people from multiple different disciplines entering data science but they always at least had a bachelors degree.

Anonymous Fri Aug 2 13:11:51 2019 No.10857831
File: 595 KB, 3456x1988, map_algorithms_spmf_data_mining097.png

Guys! I need help. I'm trying to decide if I want to do a masters degree or not.

My current credentials
>Bachelors of computer science with good enough grades to do a masters program
>3 years of work experience as a software developer
>Currently just got a very high paying remote job for a unicorn startup, 43 hours a week
>I'm a really fast programmer (previous job I could get all my sprint's work done in 3 days if I tried and stopped shitposting)
>Relatively smart and good at math
>Work is essentially building distributed systems to handle very large numbers of requests
>Familiar with python, statistical learning concepts

Goals
>Obtain statistical prowess - be able to apply it on my day to day to my work
>Work at google/facebook as a senior by 27 (currently 24)
>Learn some more math, I love math
>Publish a paper or two

Here's the kicker. If I did a masters degree, I'd do it while working full time. There is no way I'll give up my current income. I know I can do this, because I worked full time when I finished my CS bachelors. Should I do a masters degree, or just fashion one myself?

So it comes down to

>Make my own study programme; and do open source work to show progress for prospective employers
>Do a masters degree and almost die, but come out on the other end a stronger man

plz halp

>>	Anonymous Fri Aug 2 14:56:39 2019 No.10858207 >>10857831 Do you REALLY need a Masters to achieve your goals?

>>	Anonymous Fri Aug 2 15:07:26 2019 No.10858230 /data/, I have only ever taken one course in probability. what do I have to study to go into data science?

>>	Anonymous Fri Aug 2 15:47:25 2019 No.10858312 >>10858230 statistics

>>	Anonymous Fri Aug 2 15:49:06 2019 No.10858314 >>10854749 lol retard

>>	Anonymous Fri Aug 2 16:07:13 2019 No.10858362 >>10858207 no not really, but it would definitely help put me on the right path..

>>	Anonymous Fri Aug 2 22:54:56 2019 No.10859371 Why does L1 regularization induce sparse solutions?

>>	Anonymous Fri Aug 2 22:59:17 2019 No.10859377 >>10859371 Because it penalizes solutions for being not sparse.

Anonymous Sat Aug 3 04:27:12 2019 No.10859911
File: 9 KB, 353x143, (PNG Image, 353 × 143 pixels).png

>>10859371
because L2's contours are circles while L1's are rotated squares (diamonds) with their corners on the axes, so the loss contours are more likely to first touch the constraint region at a corner, where some coefficients are exactly 0

not sure if i transferred the intuition nicely, check pic
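
The same intuition in numbers, as a sketch for the simplest case where each coefficient can be shrunk independently (e.g. an orthonormal design). There the L1-penalized solution is a soft-threshold of the unpenalized estimate, while the L2-penalized one is a plain rescaling. The estimates `z` and the penalty `lam` are made up.

```python
import numpy as np

def l1_shrink(z, lam):
    # Minimizer of 0.5 * (b - z)**2 + lam * |b|: soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def l2_shrink(z, lam):
    # Minimizer of 0.5 * (b - z)**2 + 0.5 * lam * b**2: proportional shrinkage.
    return z / (1.0 + lam)

z = np.array([3.0, 0.4, -0.2, -1.5])  # made-up unpenalized estimates
lam = 0.5

b_l1 = l1_shrink(z, lam)  # small coefficients are set exactly to zero
b_l2 = l2_shrink(z, lam)  # every coefficient shrinks, none reaches zero
```

Any estimate with magnitude at most `lam` lands exactly on zero under L1, whereas L2 only scales it down.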

Anonymous Sat Aug 3 05:11:41 2019 No.10859963

>>10859371
It penalizes small errors a lot but is forgiving of larger ones (compared to L2, for example).
So you get a solution that has no small variations but, once in a while, one HUGE variation.
So it's sparse in the sense that it makes lots of residuals pile up at zero and, in exchange, allows a few residuals to take huge values.
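
A tiny illustration of that residual behaviour, assuming the simplest possible model (fitting a single constant): the constant minimizing the sum of absolute errors is the median, and the one minimizing squared errors is the mean. The data are made up.

```python
import numpy as np

y = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # made-up data with one outlier

l1_fit = np.median(y)  # minimizes sum(|y - c|)
l2_fit = np.mean(y)    # minimizes sum((y - c)**2)

l1_res = y - l1_fit  # four residuals exactly zero, one large
l2_res = y - l2_fit  # error spread over every point
```

Under the absolute-error fit most residuals sit exactly at zero and the outlier keeps its full deviation, while the squared-error fit moves towards the outlier and leaves every residual nonzero.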
