/sci/ - Science & Math


File: 29 KB, 390x560, YoungKolmogorov.jpg
No.8634179

If I have a process Xt = sin(2*pi*u*t) where u is uniform(0,1)

For X1 and X2 can I use the integral of sin(2*pi*u1*1)*sin(2*pi*u2*2)du1du2
to calculate the joint cdf of X1 and X2?

Or does it matter that Cov(X1,X2) is not 0?

>> No.8634190

This is likely beyond the power level of this board anon.

>> No.8634224

Where did you pull that stochastic process from? It's fluctuating as fuck (no closeness conditions on the increments from t to t+du)

What do you really compute when you integrate over the value of the function of u, not the probability density function of u? Or is that really a cdf?
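To make that question concrete, here is a rough numerical sketch. It assumes u1 and u2 are independent uniform(0,1) draws, which is what the double integral in the first post implies; the function names are made up for illustration. The integral of the product of the two sin terms is the expectation E[X1*X2], while a joint cdf integrates the indicator of the event {X1 <= a, X2 <= b} against the uniform density.

```python
import math

def x(t, u):
    """Value of the process at time t for a given draw u."""
    return math.sin(2 * math.pi * u * t)

def product_integral(n=200):
    """Midpoint-rule value of the double integral of x(1,u1)*x(2,u2).
    For independent u1, u2 this is E[X1*X2], an expectation, not a cdf."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += x(1, (i + 0.5) * h) * x(2, (j + 0.5) * h)
    return total * h * h

def joint_cdf(a, b, n=200):
    """P(X1 <= a, X2 <= b) for independent u1, u2: integrate the indicator
    of the event against the uniform density f(u1, u2) = 1."""
    h = 1.0 / n
    hits = 0
    for i in range(n):
        for j in range(n):
            if x(1, (i + 0.5) * h) <= a and x(2, (j + 0.5) * h) <= b:
                hits += 1
    return hits * h * h

print(product_integral())   # close to 0
print(joint_cdf(0.0, 0.0))  # close to 0.25
```

The first number does not depend on any (a, b), so it cannot be a cdf; the second one does.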

>> No.8634227
File: 238 KB, 500x557, IQ_test_aussie_edition.png

>>8634179
Why not do your own homework?

>> No.8634247

>>8634224
t = {1,2,3,...}
Marginal density of u1 and u2 is

integral from 0 to 1 of 1*sin(2*pi*u1*t)du1

>> No.8634252

>>8634227
B/c I don't know the answer?

>> No.8634266

>>8634247
>Marginal density of u1
But the pdf of u is
f(u)=1
and other than that there is no pdf involved, so why is the marginal density computed by integrating some function of u (the sin expression)?

Also, as far as I can see, the value at t=5 isn't correlated with the one at t=4 any more than it's correlated with the one at t=2, right? For each t, you sample anew.

>> No.8634278

>>8634266
I think I calculated the Cov function wrong, as I got a non-zero value.

I meant X1 and X2, not u1 and u2, in >>8634247
Apologies. I think the mistake is in the Cov != 0 though, as that makes sense. Thanks mate.
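A quick way to check the covariance numerically, as a sketch: this assumes the other reading of the process, where a single u is drawn once and shared by every t, so Cov(Xs, Xt) = E[Xs*Xt] - E[Xs]*E[Xt] with all expectations taken over that one u. The helper names are hypothetical.

```python
import math

def mean_of(f, n=10_000):
    """Midpoint-rule approximation of E[f(U)] for U ~ uniform(0, 1)."""
    h = 1.0 / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h

def x(t):
    """Return the map u -> sin(2*pi*u*t) for a fixed time t."""
    return lambda u: math.sin(2 * math.pi * u * t)

def cov(s, t):
    """Cov(X_s, X_t) when the same u is shared by both times."""
    xs, xt = x(s), x(t)
    return mean_of(lambda u: xs(u) * xt(u)) - mean_of(xs) * mean_of(xt)

print(cov(1, 2))  # close to 0
print(cov(1, 1))  # close to 0.5, the variance of X1
```

So for integer s != t the covariance comes out as 0 under this reading too. Zero covariance is not independence here, though: with a shared u, sin(4*pi*u) = 2*sin(2*pi*u)*cos(2*pi*u), so |X2| <= 2*|X1| always holds.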

>> No.8634302

>>8634278
Although if g is some function (like the sin), then you can compute the pdf for g(X) from the one for X.
https://en.wikipedia.org/wiki/Probability_density_function#Dependent_variables_and_change_of_variables

Probably you don't need that for this example, though.
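For reference, a sketch of what that change of variables gives at t = 1. With U ~ uniform(0,1), the variable X = sin(2*pi*U) follows the arcsine law on [-1, 1], with cdf 1/2 + arcsin(x)/pi and pdf 1/(pi*sqrt(1 - x^2)). The code below compares that closed form against a brute-force count over a grid of u values; the function names are illustrative only.

```python
import math

def cdf_closed_form(x):
    """cdf of X = sin(2*pi*U), U ~ uniform(0, 1): arcsine law on [-1, 1]."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + math.asin(x) / math.pi

def pdf_closed_form(x):
    """Derivative of the cdf above, valid for -1 < x < 1."""
    return 1.0 / (math.pi * math.sqrt(1.0 - x * x))

def cdf_numeric(x, n=100_000):
    """P(sin(2*pi*U) <= x), estimated by counting grid midpoints in [0, 1]."""
    h = 1.0 / n
    hits = sum(1 for k in range(n)
               if math.sin(2 * math.pi * (k + 0.5) * h) <= x)
    return hits * h

for point in (-0.5, 0.0, 0.3):
    print(point, cdf_closed_form(point), cdf_numeric(point))
```

The two columns should agree to several decimal places, which is a cheap sanity check on the formula.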

>> No.8634717

>>8634179
Is this a statistics general?

hi /sci/ I am a statistics/machine learning pleb and this is my first time so pls be nice.

I have ~300 patients and gene expression values for ~19000 genes and a groundtruth file that has the "time to progression (of cancer)", "time observed" for each patient, with about 20% of the "time to progression" missing.

I've gathered that the basic approach should be along the lines of: feature select, make a generalized linear model, make a cox regression

So I took all my data, did LASSO (from what I understand, some bastardized step-wise regression). Then I split my data into a training set and test set (80% train, 20% test, with the labels being a boolean vector indicating if the cancer progressed or if it's censored) and I applied the weights I got from LASSO to "feature select" on my X_train and X_test data.

Then I did a Cox Regression on the transformed training set and the associated "progression times", and generated lifetime estimates on my test set.
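One thing that stands out in the pipeline above is the order: the selection step sees all patients before the split, so the test set has already influenced which genes survive. A minimal sketch of the other order, split first and then select using training rows only, is below. It is not the LASSO step itself: a plain absolute-correlation filter stands in for it, y is a placeholder numeric outcome, and all function names are hypothetical.

```python
import math
import random

def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and return (train_rows, test_rows)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def select_features(X, y, rows, k):
    """Pick the k columns most correlated with y, using ONLY the given rows."""
    y_rows = [y[i] for i in rows]
    scores = []
    for j in range(len(X[0])):
        column = [X[i][j] for i in rows]
        scores.append((abs_correlation(column, y_rows), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

# Usage: split first, then select on the training rows only.
# train_rows, test_rows = split_indices(len(X))
# keep = select_features(X, y, train_rows, k=20)
```

With this order the chosen columns cannot change if the test rows change, which is the property the split is supposed to guarantee.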

First thing I noticed: There seem to be two values that the lifetime estimates take, not a range of values. So I have a few questions:

1) How can I incorporate the time observed into the survival analysis? If a patient lasted 1200 days without progressing (average is like 600 days), how do I make use of that information? Right now it treats a censored value for someone who was observed for 10 days the same as the 1200 one. I have some sloppy ideas like artificially coming up with a progression time, but I was hoping there was a more rigorous way.

2) If I fit the Cox regression to all of my data, I feel like I would be introducing some bias because I used the same data to feature select. I know about the concept of nested cross-validation and all, but what I want to do is fit it to all the data (anticipating another binary split of the data) and explore both of those subsets to try and see if I can find distinguishing features within them to increase specificity.
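On question 1, the usual Cox setup already takes a pair per patient: a duration (time to progression if it was seen, otherwise time observed) and a flag saying which of the two it is. A censored patient adds no event term of their own but stays in the risk set of every event that happens before their time observed, so 10 days and 1200 days of follow-up are not treated alike. The sketch below writes out the negative log partial likelihood by hand to show this; it assumes no tied event times, and the tiny dataset and names are invented for illustration.

```python
import math

def cox_neg_log_partial_likelihood(beta, rows):
    """rows: list of (duration, progressed, covariates).
    duration is time to progression if progressed is True, otherwise the
    time observed (censoring time). Assumes no tied event times."""
    def linear(x):
        return sum(b * v for b, v in zip(beta, x))

    nll = 0.0
    for t_i, progressed_i, x_i in rows:
        if not progressed_i:
            continue  # censored patients contribute no event term
        # Risk set: everyone still under observation at time t_i,
        # including patients who are censored later.
        risk = sum(math.exp(linear(x)) for t, _, x in rows if t >= t_i)
        nll -= linear(x_i) - math.log(risk)
    return nll

# Same patients, except the censored one was followed for 10 vs 1200 days.
short_follow_up = [(10, False, [1.0]), (300, True, [0.0]), (600, True, [1.0])]
long_follow_up = [(1200, False, [1.0]), (300, True, [0.0]), (600, True, [1.0])]

print(cox_neg_log_partial_likelihood([0.5], short_follow_up))
print(cox_neg_log_partial_likelihood([0.5], long_follow_up))
```

The two printed values differ, so the length of follow-up does enter the fit as long as the duration column holds time observed for the censored patients rather than a missing or made-up progression time.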

>> No.8634722

>>8634717
sorry for the long post, I can clarify some unclear things if need be

if anyone has suggestions as to approaches I ought to take, that would also be appreciated.

oh, and in addition to gene expression data, I have data of the same dimensions with the relative difference of the gene expression b/t tumor and normal tissue for each gene for each patient (some NaNs) and another one for mutations, filled with 1s and 0s indicating mutations or not. I chose to proceed with just the expression data because the literature seems to get far using just microarray data (a picture/heat-mappish version of my raw numbers), and I am still searching the literature to see how those numbers might be appropriately incorporated

any other advice or help would be appreciated too

thanks guys

>> No.8634767

>>8634717

/sci/ is a lot slower than what you might be expecting.

You probably should make your own thread if you want to get a response.

>> No.8634776

>>8634767
will do, thanks anon!