Exposing the Machinery of the Resistome
aarondubrow writes "2011 Nobel Prize winner Bruce Beutler is using the Ranger supercomputer at The University of Texas at Austin for an ambitious new project to discover all of the genes involved in the mammalian immune response – the so-called 'resistome.' Over several years, Beutler's lab will sequence the protein-coding portions of genes in 8,000 mice to detect the impact of mutations on immunity. This means scanning, enriching and sequencing 500 billion base pairs every week. The project represents a 'Big Data' problem of the highest order."
Re: (Score:1)
... So, by process of elamination, ...
I was trying to think of a funny joke about elamination, but the web search results turned out to be interesting in their own right (when you are bored at work).
The only search result I got on elamination was this translation dictionary (http://en.bab.la/dictionary/english-spanish/elaminate):
elaminate [bot.] (also: without lamina)
Lamina, in turn, comes back as a type of spider (among other vague references and dead links):
http://en.wikipedia.org/wiki/List_of_Desidae_species
I've asked the mice what they thou
Acquired data vs. archived data set (Score:3)
The project represents a 'Big Data' problem of the highest order.
Before or after de-duplication of the data? Before, yes, obviously; but if that is still the case after de-duplication, then gaining much knowledge from this experiment may prove to be a fool's errand.
Re: (Score:1)
I'm not sure the data is even that big. Given that a single base can be encoded in 2 bits, 500 billion base pairs come to 10^12 bits, which could be stored in about 125 GB. The 8000 mice will have a lot of DNA in common with each other, so compression should be able to reduce storage requirements by several orders of magnitude.
Perhaps I am overlooking something, but this hardly seems like a "'Big Data' problem of the highest order". Could anyone with experience in the field of DNA sequencing please confirm or explain why I'm wrong?
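For anyone checking the arithmetic: at 2 bits per base, 500 billion bases is 10^12 bits, or 125 GB. The Python sketch below packs four bases per byte to show where that figure comes from. It assumes the sequence contains only A/C/G/T (no N or other ambiguity codes) and ignores the per-base quality scores that real sequencer output also carries; the helper names are made up for illustration.

```python
# Sketch: 2-bit nucleotide packing and the weekly storage estimate.
# Assumptions (hypothetical helpers, not a real library): only A/C/G/T,
# no ambiguity codes, no quality scores, no compression beyond 2 bits/base.

ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (lowest bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base, rounded up."""
    return (n_bases * 2 + 7) // 8


if __name__ == "__main__":
    weekly = 500_000_000_000  # 500 billion base pairs per week, per the story
    size = packed_size_bytes(weekly)
    print(f"{size / 1e9:.0f} GB ({size / 2**40:.2f} TiB)")

    seq = "GATTACA"
    assert unpack(pack(seq), len(seq)) == seq  # round-trip check
```

Running it prints `125 GB (0.11 TiB)`. Exploiting the shared DNA between mice, as suggested above, would mean storing differences against a reference rather than raw packed bases, which this sketch does not attempt.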
General problem (Score:1)
OK, quick question -- how do they determine which set of chemical markers in DNA constitutes a "gene"? It seems like that could only be known through outcomes research, by "running" the DNA to see what each little chunk produces.
Re: (Score:3, Informative)
The genetic code has comments. Actually, it has something like a boot record for each gene. The coding part is called an Open Reading Frame (ORF): it begins at a start codon and runs until a stop codon. The gene is the part of the DNA that is transcribed into RNA, which is then sent to the ribosomes, the cell's 3D protein printers. There are little switches that turn this process on and off for the different genes. Some of the genes, such as those for metabolism, run all the time; others are for special occasions.
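The ORF idea can be illustrated with a naive scanner: walk each of the three forward reading frames, open an ORF at an ATG start codon and close it at the first in-frame stop codon (TAA, TAG or TGA). This is a simplified sketch, not a real gene finder: it ignores the reverse strand, introns, alternative start codons and the regulatory "switches" mentioned above, and `find_orfs` and its `min_codons` parameter are hypothetical names chosen for illustration.

```python
# Sketch: naive open-reading-frame (ORF) scan on the forward strand only.
# Simplifications: standard start codon ATG, standard stop codons TAA/TAG/TGA,
# no reverse complement, no introns. Names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of ORFs in the three forward frames.

    Each ORF runs from an ATG up to and including the first in-frame stop codon.
    ATGs inside an already-open ORF are treated as ordinary codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None  # close it and keep scanning this frame
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 14)]`, the stretch `ATGAAATTTTAG`. Real genomes are far messier than this, which is why annotation also leans on prediction algorithms and RNA evidence.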
Re: (Score:1)
The mouse genome is pretty well annotated in terms of where genes are. One can use gene prediction algorithms combined with sequencing of RNA to identify most or all of the genes in a species, and this has been done over the years for mice. In this case, they're doing whole-exome sequencing, where you enrich for known coding regions (exons) of the genome. That means they have a LOT less data to deal with (i.e., less aligning and SNP calling per mouse), which makes the experiment more tractable. On the downside, that also
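The enrichment itself happens in the wet lab, but the downstream bookkeeping of restricting analysis to known target regions is essentially an interval lookup. The Python sketch below keeps only variant positions that fall inside a set of target intervals, using binary search over merged intervals. It assumes a single chromosome, half-open `[start, end)` coordinates, and toy coordinates; the function names and numbers are hypothetical, not real mouse exome data or any particular pipeline's API.

```python
# Sketch: keep only positions inside target (e.g. exon) intervals.
# Assumptions: one chromosome, half-open [start, end) coordinates,
# hypothetical toy intervals. Not a real exome pipeline.
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort and merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def on_target(positions, intervals):
    """Return the positions that lie within any target interval."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    hits = []
    for pos in positions:
        idx = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if idx >= 0 and pos < merged[idx][1]:
            hits.append(pos)
    return hits
```

For example, `on_target([50, 100, 249, 250, 420, 500], [(100, 200), (150, 250), (400, 450)])` returns `[100, 249, 420]`. Merging first keeps the start list sorted and non-overlapping, so each lookup is a single binary search.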
Re: (Score:2)
As others have pointed out, it is possible to deduce where genes are by looking at the sequence; however, this is by no means straightforward. DNA is spaghetti code of the very worst kind.
It is possible to "run the DNA to see what it produces". Basically, when a (DNA) gene is active, copies of its sequence are made in messenger RNA (mRNA; it's like DNA but much less stable). The mRNA copies are perhaps akin to compiled code, as there is a fair amount of rearrangement that goes on before it's '3D printed' in protei
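To make the "copy, then rearrange" step concrete, here is a toy Python sketch of transcription followed by one kind of that rearrangement, RNA splicing, where introns are cut out and the exons are joined. It assumes the input is the coding strand written 5' to 3' and that the exon coordinates are already known (they are hand-picked, hypothetical values here); it ignores alternative splicing, the 5' cap and the poly-A tail.

```python
# Sketch: "copy" a gene into mRNA and splice out the introns.
# Assumptions: coding-strand input 5'->3', hand-supplied (hypothetical)
# exon coordinates, no alternative splicing, no cap or poly-A tail.


def transcribe(coding_strand: str) -> str:
    """mRNA matches the coding strand, with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the half-open [start, end) exon slices in order, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


if __name__ == "__main__":
    # Toy gene: exon ATGGCC, intron GTAAGTAG, exon TTTTAA
    gene = "ATGGCCGTAAGTAGTTTTAA"
    mature = splice(transcribe(gene), [(0, 6), (14, 20)])
    print(mature)
```

Running it prints `AUGGCCUUUUAA`: the intron (which starts with GU and ends with AG, as most introns do) has been removed, and only the joined exons would go on to the ribosome.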
The what? (Score:2)
Ome My God (Score:3)
This was getting silly a few years ago with the metabolome. How many more omes (i.e. subsets of the total system that influences human biology) do we need to look at until we declare our human model complete? Is there going to be a 'humanome' that describes human-associated environmental factors? What about a 'radiatome' that describes the plethora of electromagnetic signals that enter our body over the course of a lifetime?
Re:Ome My God (Score:4, Funny)
I suggest a... thunderdome!
Two -omes enter! One -ome leaves!
Don't fuck with the immune system (Score:2)
Is it just me, or is anybody else worried that the more we try to mimic the human body's immune system, the more problems we are creating for ourselves vis-à-vis antibiotic resistance? We are so bad at using antibiotics that we are inevitably going to create bugs that are resistant to most forms of antibiotics. I surely don't want to create a class of bugs that are resistant to the VERY WAY THAT OUR IMMUNE SYSTEM WORKS.