
/sci/ - Science & Math



File: 1.21 MB, 1131x935, chimp-out.jpg
No.15805283

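Why can't we compare the entire human chromosome 2 to the entire ape (chimp) chromosome 2 (split into 2A and 2B in chimps)?
They are on the order of hundreds of millions of characters long in total (human chromosome 2 is roughly 242 million base pairs).
There are 4 possible characters: A, T, C or G.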

A computer will run out of memory if both are fed into a sequence alignment program that tries to find similarities between them.

They should be about 95% identical but nobody has been able to figure it out completely.

A program should be able to show the start of the chromosome; let's say it is "AACAGTA" for human and "AAGAGTA" for chimp.

Now this is done for the ENTIRE sequence, from one end of the chromosome to the other, until the result seems satisfactory.

Alignment causes it to become like this:

"--AAGAG" for human and "CCAAGAG" for chimp, or whatever.

What I am trying to say is that there will be gaps: stretches (of 100 bytes or whatever) where the two chromosomes have no sufficiently matching counterpart. Because of the gaps, the resulting comparison data is not the same length as the input but could be ten times longer, with "-" as a new symbol (it means empty space).

When all is said and done, we should see about 95-99% of the chromosomal length marked with asterisks (*), meaning "no difference at this position", and occasionally two different characters at a position, meaning a difference has been found.
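
To make the gap and asterisk idea concrete, here is a minimal sketch of a classic global alignment (Needleman-Wunsch) on short strings. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions picked for illustration; they are not taken from this thread or from any particular alignment program.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two short strings; return (aligned_a, aligned_b, marks)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    aligned_a = "".join(reversed(out_a))
    aligned_b = "".join(reversed(out_b))
    # "*" marks a position where both sequences carry the same character.
    marks = "".join("*" if x == y else " " for x, y in zip(aligned_a, aligned_b))
    return aligned_a, aligned_b, marks


if __name__ == "__main__":
    for line in global_align("AACAGTA", "AAGAGTA"):
        print(line)
```

The score table has (len(a)+1) x (len(b)+1) cells. That is harmless for seven characters, but it is the part that stops fitting in memory at chromosome scale.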

Yea verily, there is no program that can do this; current programs run out of memory after working on it for about a week.

Are you now beginning to understand why we can't just input billions of bytes of gene data into a computer and let it run to see what it comes up with?
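
A rough back-of-the-envelope estimate of where the memory goes, assuming a textbook dynamic-programming aligner that keeps its full score table. The sequence length of 242 million and the 4 bytes per cell are assumed round figures, and the function names are made up for this sketch; real tools use different data structures.

```python
def full_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory for a full (len_a + 1) x (len_b + 1) score table."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


def two_row_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory if only two rows of the table are kept (score only, no traceback)."""
    return 2 * (min(len_a, len_b) + 1) * bytes_per_cell


# Assumed round figure for one chromosome-sized sequence.
ASSUMED_LENGTH = 242_000_000

if __name__ == "__main__":
    full = full_matrix_bytes(ASSUMED_LENGTH, ASSUMED_LENGTH)
    rows = two_row_bytes(ASSUMED_LENGTH, ASSUMED_LENGTH)
    print(f"full table: {full / 1e15:.0f} petabytes")
    print(f"two rows:   {rows / 1e9:.1f} gigabytes")
```

Keeping only two rows gives the score but not the alignment itself. Linear-space methods such as Hirschberg's algorithm can recover the alignment too, but the running time stays proportional to the product of the two lengths.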

>> No.15805284

What? English please

>> No.15805294

>>15805284
I can explain it in German but not in English

>> No.15805296

>>15805294
Then why even post this incomprehensible babble? You can't even formulate a point correctly

>> No.15805381

>>15805296
Because those fluent in German and somewhat fluent in English can understand what I said because they think like Germans

>> No.15805385

>>15805283
We can do it, but the result will only be a lower bound on the similarity. Finding the exact percentage of similarity is probably impossible with current processing technology.
Even finding a lower bound will take a very long time and require a lot of processing power. But it's doable.

>> No.15805394

>>15805385
>But for finding a lower bound, It will take a very long time and require a lot of processing power. But it's doable.

What set of software packages would you suggest are best for such large datasets?

>> No.15805482

You said in the earlier post that you speak German. I would recommend R if you can program a little. R has pretty good bioinformatics libraries in Bioconductor. What hardware do you have? I believe one of the libraries can also do the alignment purely on disk. I'll post again once I've found it.

>> No.15805485

>>15805482
https://github.com/hsinnan75/GSAlign
Haven't used it yet, but it seems to be what you were looking for.

>> No.15805579

>>15805283
why not use one as a reference then chop the other up and use a fast aligner like BWA?
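
A toy sketch of the "chop one up and map the pieces to the reference" idea. This is not how BWA works internally (BWA is built on a Burrows-Wheeler index); a plain dictionary of k-mers is swapped in here to keep it short. All function names, the fragment size and the seed length `k` are assumptions for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def chop(sequence, size):
    """Cut a sequence into consecutive fragments, keeping their start offsets."""
    return [(start, sequence[start:start + size])
            for start in range(0, len(sequence), size)]


def map_fragment(fragment, reference, index, k):
    """Use the first k characters as a seed, then count mismatches (no gaps).

    Returns (reference_position, mismatches) for the best hit, or None.
    """
    best = None
    for ref_pos in index.get(fragment[:k], []):
        window = reference[ref_pos:ref_pos + len(fragment)]
        if len(window) < len(fragment):
            continue  # fragment would run off the end of the reference
        mismatches = sum(1 for x, y in zip(fragment, window) if x != y)
        if best is None or mismatches < best[1]:
            best = (ref_pos, mismatches)
    return best


def map_chunks(query, reference, size=10, k=3):
    """Chop the query and map each fragment independently."""
    index = build_kmer_index(reference, k)
    return [(start, map_fragment(frag, reference, index, k))
            for start, frag in chop(query, size)]


if __name__ == "__main__":
    reference = "ACGTACGGTTCAGGCATTAC"
    query = "ACGTACGGTTCAGTCATTAC"  # same as reference, one substitution
    for start, hit in map_chunks(query, reference):
        print(start, hit)
```

A fragment whose seed happens to contain a difference is simply not mapped in this sketch, and each fragment is placed without regard to its neighbours, which is one way chopping can lose accuracy compared with aligning the whole thing.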

>> No.15805608

>>15805579
this method could produce a less accurate result

the whole thing should be compared to another whole thing as a whole, not in pieces

>> No.15805624

we can run a dot plot to find out anon

>> No.15805670
File: 614 KB, 1905x1909, 1684833192648101.png

>>15805624
https://dgenies.toulouse.inra.fr
Done!
you can see some of the gene inversions, and you can play around with the settings if you want to look at the alignment of specific chromosomes (the UCSC databases just had easy access to FASTA genomes, and it plots well enough).
if you don't know how a dot plot alignment works: at a basic level, a perfect alignment is a 45 degree line, and any sort of transposon or repetitive activity is very easy to see if you zoom into a specific chromosome.
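
For anyone who wants to see the principle on a tiny scale, here is a minimal dot plot sketch. It is not how D-GENIES computes its plots; it just marks every pair of positions where the two sequences share the same k-mer. The function names and the choice of `k=3` are assumptions for illustration.

```python
def dot_plot(seq_x, seq_y, k=3):
    """Return the set of (x, y) positions where both sequences share a k-mer."""
    positions = {}
    for y in range(len(seq_y) - k + 1):
        positions.setdefault(seq_y[y:y + k], []).append(y)
    dots = set()
    for x in range(len(seq_x) - k + 1):
        for y in positions.get(seq_x[x:x + k], []):
            dots.add((x, y))
    return dots


def render(dots, width, height):
    """Draw the dots as text, one row per y position."""
    return "\n".join(
        "".join("*" if (x, y) in dots else "." for x in range(width))
        for y in range(height)
    )


if __name__ == "__main__":
    seq = "ACGTAC"
    dots = dot_plot(seq, seq, k=3)
    print(render(dots, len(seq) - 2, len(seq) - 2))
```

Identical sequences give an unbroken diagonal, repeats show up as extra dots off the diagonal, and insertions or deletions shift the line sideways. This sketch compares the forward strand only, so an inverted segment would show up as a break in the diagonal rather than as a line of opposite slope.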

>> No.15805676
File: 730 KB, 5000x5000, 1695738546283269.png

>>15805670
oh, I can just link the alignment to save some time
https://dgenies.toulouse.inra.fr/result/TestManVsChimp
you should select specific genes and see where it shifts/aligns, it's pretty interesting

>> No.15805695

>>15805283
When you search for "how similar are animals to humans dna", it returns a cute video with an ascending list of similarity, from fungi, to mammals, to primates, to pigs, to chimpanzees.

What is that even measuring?

>> No.15805704
File: 211 KB, 1080x765, Screenshot_20231015_141710_Chrome.jpg

>>15805695
Maybe it's something like this. I've always underestimated the absurdity of the world

>> No.15805795

>>15805283
Because that tool can be used to figure out what cancer actually is

>> No.15805800
File: 61 KB, 512x512, scipepe.jpg

>>15805284
Shoo shoo! go away! This is no place for retards like you stupid moron.

>> No.15805812
File: 11 KB, 225x225, ayy pepe.jpg

>>15805283
LMAO, going in blind is a waste of time. Start with the simplest living organisms to understand the syntax and semantics. Once you're there, it is far easier and faster to run code diff tools.
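
As a small illustration of treating sequences like text in a diff tool, here is a sketch using `difflib.SequenceMatcher` from the Python standard library. The wrapper name `diff_sequences` is made up for this example. `difflib` is a general-purpose text matcher: it does not use biological scoring and does not guarantee a minimal set of edits.

```python
from difflib import SequenceMatcher


def diff_sequences(a, b):
    """Return (opcodes, similarity ratio) for two sequences treated as plain text."""
    # autojunk=False turns off the heuristic that ignores very frequent
    # characters, which would otherwise misfire on a 4-letter alphabet
    # once the sequences get long.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.get_opcodes(), matcher.ratio()


if __name__ == "__main__":
    opcodes, ratio = diff_sequences("AACAGTA", "AAGAGTA")
    for tag, a_start, a_end, b_start, b_end in opcodes:
        print(tag, a_start, a_end, b_start, b_end)
    print(f"similarity: {ratio:.3f}")
```

This works fine for short stretches, but it is still a pairwise comparison with running time that grows quickly with length, so it does not sidestep the scale problem from the opening post.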