
/sci/ - Science & Math



File: 100 KB, 394x329, 1564840632688.png
No.11089903

I need help.

I have some data that is two scalars paired like so:

(x_i1, x_i2)

where i runs from 1 to N (the sample number). Each pair is just 2 repeated measurements on sample i.

Now I want to estimate the sample variance and compute the standard error and mean for the samples.

Do I treat the data as just 2N measurements and compute the sample mean and sample variance over all of them?

Or do I compute the mean and variance for each pair, then average these over the pairs?

The goal is to estimate the mean and variance of the distribution you get when you select a random sample and take a single measurement on it.
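
To make the two options concrete, here is a minimal Python sketch. The data values are made up, and names like `pairs` are only for illustration. With equal-sized pairs both options give the same mean, but the variances answer different questions: the averaged within-pair variance only captures how repeatable a measurement is on one sample, while the pooled variance also includes the spread between samples, so it is roughly what "random sample, single measurement" would show.

```python
from statistics import mean, variance

# Hypothetical data: N = 4 samples, two repeated measurements on each.
pairs = [(10.1, 9.9), (12.2, 11.8), (8.0, 8.4), (11.0, 11.2)]

# Option 1: treat everything as 2N independent-looking measurements.
pooled = [x for pair in pairs for x in pair]
pooled_mean = mean(pooled)
pooled_var = variance(pooled)  # sample variance, divides by 2N - 1

# Option 2: mean and variance within each pair, then average over pairs.
pair_means = [mean(p) for p in pairs]
pair_vars = [variance(p) for p in pairs]  # each divides by 2 - 1 = 1
avg_of_pair_means = mean(pair_means)
avg_within_var = mean(pair_vars)  # measurement repeatability only

print("pooled mean / variance:      ", pooled_mean, pooled_var)
print("avg pair mean / within var:  ", avg_of_pair_means, avg_within_var)
```

In this toy data the within-pair variance is far smaller than the pooled one, because most of the spread comes from the samples differing from each other, not from the measurement noise.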

>> No.11089935

>>11089903
if you think hard enough your second option has no meaning

>> No.11089952

The sample mean will be a vector with the entries
[eqn]
\bar x_1 = \frac{\sum_{i=1}^N x_{i1}}{N} \\
\bar x_2 = \frac{\sum_{i=1}^N x_{i2}}{N}
[/eqn]
and the sample covariance matrix will be
[eqn]
Q = \begin{pmatrix} \frac{1}{N-1}\sum_{i=1}^N (x_{i1} - \bar x_1)^2 & \frac{1}{N-1} \sum_{i=1}^N (x_{i1} - \bar x_1)(x_{i2} - \bar x_2) \\
\frac{1}{N-1} \sum_{i=1}^N (x_{i1} - \bar x_1)(x_{i2} - \bar x_2) & \frac{1}{N-1} \sum_{i=1}^N (x_{i2} - \bar x_2)^2
\end{pmatrix}
[/eqn]
The sample variances are just the diagonal entries of the sample covariance matrix.
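
As a rough sketch, the formulas above can be computed like this in plain Python. The function name `sample_mean_and_cov` and the example data are hypothetical, not anything from the thread.

```python
def sample_mean_and_cov(pairs):
    """Return the mean vector and the 2x2 sample covariance matrix Q.

    `pairs` is a list of (x_i1, x_i2) tuples; needs at least 2 pairs
    because the entries of Q divide by N - 1.
    """
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    # Diagonal entries: sample variances of each column.
    s11 = sum((a - m1) ** 2 for a, _ in pairs) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in pairs) / (n - 1)
    # Off-diagonal entry: sample covariance between the two columns.
    s12 = sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)
    return (m1, m2), [[s11, s12], [s12, s22]]


# Made-up example data, just to show the call.
means, Q = sample_mean_and_cov([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(means)
print(Q)
```

Note that this treats the first and second measurement as two separate variables, which is only the right picture if the two columns are not interchangeable.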

>> No.11089962

If you want the actual "sample" variance, you calculate the variance on batches of the data, say 10-20% at a time, and then average over all batches, which gets you an approximate population variance. If you just want a single calculation of the sample variance, you should still calculate the variance over all the data so you can get the percent error of the sample variance relative to that approximated population variance. With enough data this approximation should be close to the true value, per the law of large numbers.

>> No.11089974

>>11089935
Why is that?

>> No.11089978
File: 8 KB, 250x228, 1568165148771.jpg

Hmm, should I actually just compute the mean of each pair and treat that as a single measurement?

>> No.11090006

>>11089978
no, they're different measurements

WHAT ARE YOU TESTING, what is the purpose of what you're doing?

>> No.11090013

If the pairs are of the same measurement where it is meaningful to add them up...

>> No.11090014

>>11089903
You operate on each dimension independently. You can make up some cross metrics, but that's not what's standard.

>> No.11090015

>>11089903
Is it meaningful to sum up the pair?

>> No.11090027

>>11090006
Why can't you pool measurements like that? Yes, they are different measurements, but they are measuring the same thing on the same sample.

Let's say the samples are rods with different temperatures at each end. On each rod I measure the temperature at each end, then I take the average and call that the temperature of the rod.
Then I go to the next rod, and so on.
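
A minimal sketch of that procedure in Python, assuming the quantity of interest is the per-rod average. The temperatures below are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical readings: (temperature at one end, temperature at the other).
rods = [(20.0, 22.0), (30.0, 34.0), (25.0, 25.0), (28.0, 30.0)]

# One number per rod: the average of its two end temperatures.
rod_temps = [mean(r) for r in rods]

# Mean over rods, and the standard error of that mean.
# The N rods are the independent units here, so N (not 2N) goes in the sqrt.
grand_mean = mean(rod_temps)
sem = stdev(rod_temps) / sqrt(len(rod_temps))

print(rod_temps, grand_mean, sem)
```

The design choice is that each rod contributes a single value, so the two readings on one rod are never counted as two independent data points.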

>> No.11090077
File: 348 KB, 829x633, 1569953267516.png

bump

>> No.11090102

>>11090027
Well, you just need to say what the measurements mean: whether they're measurements of the same rod, made by the same device, at the same spot, and so on. You didn't describe that at first, and we couldn't guess.

>> No.11090311

>>11090102
So what's the conclusion here?

I mean, a chemist/physicist/whatever usually takes repeated measurements to get a more "precise" value, i.e. to reduce the effects of random/experimental error.
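
Here is a small simulation sketch of that effect. It assumes a simple model that nobody in the thread confirmed: each sample has its own true value, and every measurement adds independent Gaussian noise on top. All parameter values are made up.

```python
import random
from statistics import variance

random.seed(0)  # fixed seed so the run is repeatable

N = 20000             # number of simulated samples (assumed)
sigma_between = 2.0   # spread of true values between samples (assumed)
sigma_within = 1.0    # measurement noise on a single reading (assumed)

singles, pair_means = [], []
for _ in range(N):
    true_value = random.gauss(50.0, sigma_between)
    m1 = true_value + random.gauss(0.0, sigma_within)
    m2 = true_value + random.gauss(0.0, sigma_within)
    singles.append(m1)                 # one measurement per sample
    pair_means.append((m1 + m2) / 2)   # average of two measurements

# Under this model, theory says:
#   Var(single)    = sigma_between**2 + sigma_within**2      = 5.0
#   Var(pair mean) = sigma_between**2 + sigma_within**2 / 2  = 4.5
var_single = variance(singles)
var_pair_mean = variance(pair_means)
print(var_single, var_pair_mean)
```

Averaging the pair only shrinks the measurement-noise part of the variance. The spread between samples stays, so the variance of pair means is not the variance of a single measurement on a random sample.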