/sci/ - Science & Math


File: 110 KB, 861x466, Xvqjh.jpg
No.10004425

I took a statistics course many years ago in college, but I've forgotten most of it, as it hasn't been relevant to my field. However, I now have a simple problem:

I have daily data for many sellers using a trading system. It counts the number of trades each seller makes and the number of those trades that violate a certain rule. What I am interested in is a simple statistical test that tells me whether each seller's proportion of trades that violate the rule is statistically significant.

I know the proportion of violations in the entire population of trades, since I have all the data. I know I could do a one-sample test of proportion, but I would like the test to work even for a seller with a low number of trades (or at least tell me that I can ignore sellers below a certain number of trades). I also don't understand how to tell whether my data satisfies the assumptions necessary for the test.

Any ideas from someone who knows basic stats?
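A minimal sketch of the one-sample test of proportion mentioned above, written as a one-sided z-test (is this seller's violation rate above the population rate p0?). The function names, the example counts and the cutoff are illustrative assumptions: the cutoff uses a common textbook rule of thumb (at least 10 expected violations and 10 expected non-violations; some texts use 5) for when the normal approximation is reasonable, and sellers below it are exactly the low-trade cases where this test should not be trusted.

```python
import math

def min_trades_for_normal_approx(p0: float, min_expected: float = 10) -> int:
    """Smallest n with n*p0 >= min_expected and n*(1 - p0) >= min_expected.

    A common rule of thumb for when the normal approximation to the binomial
    is reasonable; some textbooks use 5 instead of 10.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    return math.ceil(min_expected / min(p0, 1 - p0))

def one_sample_z_test(violations: int, trades: int, p0: float) -> tuple[float, float]:
    """One-sided one-sample z-test for a proportion.

    H0: the seller's true violation rate equals the population rate p0.
    Returns (z, p_value), where p_value = P(Z >= z) for a standard normal Z.
    """
    if trades < min_trades_for_normal_approx(p0):
        raise ValueError("too few trades for the normal approximation")
    p_hat = violations / trades
    se = math.sqrt(p0 * (1 - p0) / trades)  # standard error under H0
    z = (p_hat - p0) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical seller: 30 violations in 400 trades, population rate 5%.
print(min_trades_for_normal_approx(0.05))
z, p_value = one_sample_z_test(30, 400, 0.05)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

With a 5% population rate the rule of thumb asks for about 200 trades; the smaller the population rate, the more trades a seller needs before this test is meaningful.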

>> No.10004524

Bump.

>> No.10004843

Bump

>> No.10004970

>>10004425
If you assume that the number of trades in violation of the rule is binomially distributed, just make a confidence interval.
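A minimal sketch of such a confidence interval, assuming violations are binomial. It uses the Wilson score interval, swapped in here because it behaves better than the textbook normal (Wald) interval when a seller has few trades or zero violations; the function name and example counts are hypothetical.

```python
import math

def wilson_interval(violations: int, trades: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 gives an approximate 95% interval. Unlike the textbook
    p_hat +/- z*sqrt(p_hat*(1-p_hat)/n) interval, it stays inside [0, 1]
    and does not collapse to zero width when violations == 0.
    """
    if trades <= 0:
        raise ValueError("need at least one trade")
    p_hat = violations / trades
    denom = 1 + z**2 / trades
    center = (p_hat + z**2 / (2 * trades)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trades + z**2 / (4 * trades**2))
    # The clamps only guard against floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Hypothetical seller with 2 violations in 15 trades.
low, high = wilson_interval(2, 15)
print(f"95% CI for violation rate: [{low:.3f}, {high:.3f}]")
```

A seller with few trades gets a wide interval, which is the formal version of "a high rate could just be luck".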

>> No.10005006

>>10004425
Your question is unclear. Until you clarify what you are actually looking for, you won't get an answer. Specifically, this sentence:
>What I am interested in is a simple statistical test that tells me whether each seller's proportion of trades that violate the rule is statistically significant.
What do you mean statistically significant? Do you mean in relation to trades as a whole, and if so, related by what? Or do you mean in relation to their own trades, as in whether the number of violating trades they make could reasonably be attributed to chance as opposed to malice? The way your question is currently phrased, it could be answered in many different ways, some not even using statistics. I mean, why not just use a simple proportion rule?

>> No.10005033

>>10004970
Can't I tell what the distribution is, since I have all the data?

>> No.10005036

>>10004425
Hire a statistician?

>> No.10005059

>>10005006
>what do you mean statistically significant?
I thought this can be determined by a significance test. For example, the one-sample test gives me a confidence interval for a seller's violation rate. If this interval does not contain the whole population's violation rate, then I can say that the seller is significantly different from the rest of the population. The end goal is to identify bad sellers, but I don't want to do this based simply on their violation rate, since a high violation rate could just be random noise from a low number of trades.
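The interval rule described above (flag a seller whose interval excludes the population rate) can be sketched with an exact Clopper-Pearson interval, which needs no normal approximation and so stays valid for sellers with few trades. This is a sketch under stated assumptions: only the lower end is computed, by bisection on the binomial upper tail, because a bad seller is one whose whole interval sits above the population rate p0; all names and example numbers are hypothetical.

```python
import math

def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), in log space to avoid overflow (needs 0 < p < 1)."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))

def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """Lower end of the exact two-sided (1 - alpha) Clopper-Pearson interval.

    It is the rate p at which seeing k or more violations has probability
    alpha / 2; found by bisection because the upper tail grows with p.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < alpha / 2:
            lo = mid  # tail still too small: the bound lies at a higher rate
        else:
            hi = mid
    return (lo + hi) / 2

def flag_seller(k: int, n: int, p0: float, alpha: float = 0.05) -> bool:
    """Flag when the whole interval lies above the population rate p0."""
    return clopper_pearson_lower(k, n, alpha) > p0

# Hypothetical sellers against a 5% population rate.
print(flag_seller(30, 100, 0.05), flag_seller(1, 1, 0.05))
```

Checking only the lower end of a two-sided 95% interval is equivalent to a one-sided exact binomial test at the 2.5% level.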

>> No.10005061

>>10005036
I don't see why such a simple problem would require hiring someone. I have a basic solution; I just want to know whether it is correct for small sample sizes and whether the distribution of violations matters.

>> No.10005097

>>10005033
Just hire a statistician, man. You're just going to do something stupid.

>> No.10005105

>>10005097
Why? This should be an easy problem.

>> No.10005114

>>10005059
>I thought this can be determined by a significance test
This is tautological. Again, you first have to define what significance you're looking for. I think I have some idea given the rest of your post, but I have to assume some things are phrased wrong. Your sentences about intervals don't really make sense to me, so I'll attribute that to you not quite understanding what you're talking about; correct me if I'm wrong. I'm going to base everything entirely on the assumption that you're trying to identify bad sellers based on the violation distribution of the entire trading community.
If this is what you're trying to do, then yes, it's pretty basic statistics (significance testing). The only issue is that you'll have to make an arbitrary call on what significance level you want to test against. Do you flag a seller as soon as there's less than a 20% chance of seeing that many violations by luck alone (assuming they violate at the population rate), or only when that chance is under 1%? The math part is easy; ignore the other posters. The hardest part has to come from someone who understands your resources and the implications of the choice, i.e., a stricter significance level means you have to investigate fewer people, but more bad actors get away. The trade-offs are your problem to solve.
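The trade-off described above can be illustrated with a small simulation. Everything in it is an assumption chosen for illustration: honest sellers violate at the population rate p0 = 5%, a hypothetical group of bad sellers violates at 10%, every seller makes 200 trades, and p-values come from the normal-approximation z-test (reasonable here since 200 × 0.05 = 10). Because the same p-values are thresholded at each level, a stricter level can only flag fewer sellers of both kinds.

```python
import math
import random

def upper_p_value(violations: int, trades: int, p0: float) -> float:
    """One-sided p-value via the normal approximation (fine for n*p0 >= 10)."""
    se = math.sqrt(p0 * (1 - p0) / trades)
    z = (violations / trades - p0) / se
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulate(p0=0.05, p_bad=0.10, trades=200, n_honest=1000, n_bad=50, seed=1):
    """Simulate honest sellers (true rate p0) and bad sellers (rate p_bad), then
    count how many of each get flagged at several significance levels."""
    rng = random.Random(seed)

    def draw(rate):
        # Number of violations among `trades` independent trades.
        return sum(rng.random() < rate for _ in range(trades))

    honest = [upper_p_value(draw(p0), trades, p0) for _ in range(n_honest)]
    bad = [upper_p_value(draw(p_bad), trades, p0) for _ in range(n_bad)]
    results = {}
    for alpha in (0.01, 0.05, 0.20):
        results[alpha] = (sum(p < alpha for p in honest), sum(p < alpha for p in bad))
    return results

for alpha, (false_flags, caught) in simulate().items():
    print(f"alpha={alpha:.2f}: {false_flags} honest sellers flagged, {caught}/50 bad sellers caught")
```

Roughly a fraction alpha of the honest sellers gets flagged at each level, which is the investigation cost of a lenient threshold; the stricter thresholds let more of the bad sellers through.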

>> No.10005123

>>10005114
Yes, I expect I've phrased things incorrectly, since I've forgotten all this stuff. I guess I'll just go with the 5% meme. It doesn't really matter, though. This isn't going to be set in stone; I just need something to help sift through the data a little better.

>> No.10005127

>>10005114
I know I want a significance test; what I need help with is choosing the right one. I'm a bit confused on how I determine what the distribution of violations is, since the stuff I've been reading makes certain assumptions about the distribution of the data and about sample sizes being over a certain number or sample sizes being uniform.

>> No.10005161

>>10005127
>confused on how I determine what the distribution of violations is
What do you mean? You control the data for all trading, right? You don't need to determine anything; you literally have it.
>the stuff I've been reading makes certain assumptions about the distribution of the data
I would have to read exactly what you read to answer it, but I'm almost certain you mistakenly believe the data has to be normally distributed. You have all the trades (and I assume a fuck ton of them), so it doesn't matter what their distribution is. The sampling error in a seller's violation rate will still be approximately normal, at least for sellers with enough trades.
>about sample sizes being over a certain number or sample sizes being uniform
Again, I'd have to see what you read, but I'm pretty confident you're either reading something that has nothing to do with what you'll actually use or you're misinterpreting a requirement. The fact that some traders have few trades won't break an exact test. If a trader has only one trade and it's bad, the p-value will just equal the violation rate in the general population.
>what I need help with is choosing the right one
A binomial test.
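A minimal sketch of the exact one-sided binomial test, taking the population violation rate p0 from the pooled counts of all sellers as suggested in the thread. The seller names and counts are hypothetical; probabilities are computed in log space (via `math.lgamma`) so that sellers with thousands of trades don't overflow a float, which assumes 0 < p0 < 1.

```python
import math

def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), in log space to avoid overflow (needs 0 < p < 1)."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))

def binomial_test_upper(violations: int, trades: int, p0: float) -> float:
    """Exact one-sided p-value: P(X >= violations) if the seller's true rate were p0."""
    return min(1.0, sum(binom_pmf(i, trades, p0) for i in range(violations, trades + 1)))

# Hypothetical per-seller counts: seller -> (violations, trades).
sellers = {"A": (1, 1), "B": (12, 150), "C": (3, 400)}
total_violations = sum(v for v, _ in sellers.values())
total_trades = sum(n for _, n in sellers.values())
p0 = total_violations / total_trades  # pooled population violation rate

for name, (v, n) in sellers.items():
    print(name, round(binomial_test_upper(v, n, p0), 4))
```

Seller A, whose single trade is a violation, gets a p-value equal to p0, which matches the point above about single-trade sellers: such a seller can never be flagged at a level below the population rate.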

>> No.10005240

>>10005161
Doesn't the binomial test assume a binomial distribution though? Do I just need the population's proportion of trades that are violations or do I need to actually look at the distribution of the data?

>> No.10005321

>>10005240
>Doesn't the binomial test assume a binomial distribution though?
Strictly speaking, yes: the exact test treats each of a seller's trades as an independent yes/no trial with the same violation probability. But like most things in statistics, that's an approximation that works well enough in practice; almost any statistical test you run will fail to meet its mathematical requirements exactly. For something as simple as this, you don't even need to begin worrying about that.
>Do I just need the population's proportion of trades that are violations or do I need to actually look at the distribution of the data?
You could just take the population proportion as the assumed "true" probability of someone making a bad trade randomly (of course this is an approximation, but it sounds like you don't require much rigor) and use that. If you did end up needing a better detection system (although I somewhat doubt it given what you've explained), then it would depend on what you're using to classify bad trades. For example (and I'm not a finance guy, so bear with me), if you were classifying bad trades as selling under market value, take two guys who each made 3 bad trades, where one went 50% under every time and the other just a few pennies under: a simple pass/fail variable would miss something there. But again, it depends on your classification method and the accuracy you need.
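A toy illustration of that example, with made-up prices: both hypothetical sellers sell below market value three times, so a pass/fail count cannot tell them apart, while even a crude severity measure (average fraction below market value on violating trades) separates them. The (price, market value) representation and both metrics are assumptions, not anything from the original trading system.

```python
# Hypothetical trades as (sale_price, market_value) pairs.
seller_a = [(50.0, 100.0), (50.0, 100.0), (50.0, 100.0)]     # 50% under each time
seller_b = [(99.97, 100.0), (99.98, 100.0), (99.99, 100.0)]  # a few pennies under

def violation_count(trades):
    """Pass/fail view: number of trades sold below market value."""
    return sum(price < market for price, market in trades)

def mean_shortfall(trades):
    """Severity view: average fraction below market value across violating trades."""
    gaps = [(market - price) / market for price, market in trades if price < market]
    return sum(gaps) / len(gaps) if gaps else 0.0

for name, trades in (("A", seller_a), ("B", seller_b)):
    print(name, violation_count(trades), f"{mean_shortfall(trades):.2%}")
```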

>> No.10005335

>>10005321
OK, thanks.