
Genome sequencing remarkably advanced

lpetrich

Genome Sequencing and Covid-19: How Scientists Are Tracking the Virus - The New York Times
Edward Holmes was in Australia on a Saturday morning in early January 2020, talking on the phone with a Chinese scientist named Yong-Zhen Zhang who had just sequenced the genome of a novel pathogen that was infecting people in Wuhan. The two men — old friends — debated the results. “I knew we were looking at a respiratory virus,” recalls Holmes, a virologist and professor at the University of Sydney. He also knew it looked dangerous.

Could he share the genetic code publicly? Holmes asked. Zhang was in China, on an airplane waiting for takeoff. He wanted to think it over for a minute. So Holmes waited. He heard a flight attendant urging Zhang to turn off his phone.

“OK,” Zhang said at last. Almost immediately, Holmes posted the sequence on a website called Virological.org; then he linked to it on Twitter. Holmes knew that researchers around the world would instantly start unwinding the pathogen’s code to try to find ways to defeat it.
That made it possible to create vaccines for the virus without having a sample of it on hand.

It was also possible to create primers for PCR tests to detect the virus without any samples of it -- the test will amplify whatever binds to the primer.
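Here is a minimal sketch of that idea, an "in-silico PCR" that finds the stretch of a template a primer pair would amplify. The template and primer sequences are made up for illustration (they are not real SARS-CoV-2 sequences), the function names are my own, and it assumes exact primer matches, whereas real primers tolerate some mismatches.

```python
# Toy in-silico PCR: find the region a primer pair would amplify.
# All sequences here are hypothetical, chosen only to illustrate the idea.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the part of `template` bounded by the two primers, or None.

    Assumes exact matches only (a simplification).
    """
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so on the template
    # strand we search for its reverse complement.
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]


template = "TTGACCGTAAGCTTGGATCCATGCAAGTCCGTTAGG"  # made-up template
forward = "CCGTAAGC"                               # made-up forward primer
reverse = "AACGGACT"                               # made-up reverse primer

print(amplicon(template, forward, reverse))
print(amplicon("AAAAAAAA", forward, reverse))  # no binding site, so None
```

The point is that the primers are designed from the published sequence alone; whether anything gets amplified depends only on whether the sample contains matching stretches.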

By sequencing genomes from samples of the virus, it is possible to track its spread by looking for mutations in its genome. That's genomic epidemiology: "gen epi". From "Mechanisms of viral mutation": in substitutions per nucleotide per cell infection, RNA viruses have mutation rates of 10^(-4) to 10^(-6) and DNA viruses have 10^(-6) to 10^(-8).

The COVID-19 virus's genome is about 30 kilobases long, and with the RNA-virus rates that means 0.03 to 3 mutations per cell infection.
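The arithmetic behind that range, as a small sketch. The genome length and rate range are the figures quoted above; the Poisson step (chance of at least one mutation) is my own added assumption, not something from the cited sources.

```python
import math

GENOME_LENGTH = 30_000            # nucleotides (~30 kilobases)
RATE_LOW, RATE_HIGH = 1e-6, 1e-4  # substitutions per nucleotide per cell infection


def expected_mutations(genome_length: int, rate: float) -> float:
    """Expected number of substitutions per genome per cell infection."""
    return genome_length * rate


def prob_at_least_one(mean: float) -> float:
    """Chance of one or more mutations, ASSUMING a Poisson distribution."""
    return 1.0 - math.exp(-mean)


low = expected_mutations(GENOME_LENGTH, RATE_LOW)
high = expected_mutations(GENOME_LENGTH, RATE_HIGH)

print(f"Expected mutations per cell infection: {low:.2f} to {high:.0f}")
print(f"P(at least one mutation): {prob_at_least_one(low):.3f} to "
      f"{prob_at_least_one(high):.3f}")
```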
 
Much of this work is very recent, because of the enormous increase in sequencing capacity over the last few decades.

DNA Sequencing Costs: Data at National Human Genome Research Institute Home | NHGRI

The cost per raw megabase of sequence declined from $10,000 to $1000 over 2001 - 2007, then dropped superfast, to $10 in 2008 and $0.3 in 2010 and $0.1 in 2012. In 2015, it dropped to $0.02 and in 2019, to $0.01.
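To get a feel for how fast that is, here is a rough sketch that turns those rounded figures into a cost "halving time". The numbers are the approximate ones read off above, not the exact NHGRI data points, and the function name is my own.

```python
import math

# Approximate cost per raw megabase in US dollars, as quoted above (rounded).
COST_PER_MB = {
    2001: 10_000,
    2007: 1_000,
    2008: 10,
    2010: 0.3,
    2012: 0.1,
    2015: 0.02,
    2019: 0.01,
}


def halving_time(year0: int, year1: int, costs=COST_PER_MB) -> float:
    """Years needed for the cost to halve, assuming steady exponential decline."""
    fold = costs[year0] / costs[year1]
    return (year1 - year0) / math.log2(fold)


for y0, y1 in [(2001, 2007), (2007, 2008), (2001, 2019)]:
    fold = COST_PER_MB[y0] / COST_PER_MB[y1]
    print(f"{y0}-{y1}: {fold:,.0f}-fold drop, "
          f"halving every {halving_time(y0, y1):.2f} years")
```

Over the whole 2001 - 2019 span that works out to a million-fold drop, or a halving roughly every 0.9 years, much faster than the two-year doubling usually quoted for Moore's law.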

The estimated cost of a human-genome sequence went down from $100 million in 2001 to about $1000 in 2019.

Back to the NYT.
Already, in Church’s estimation, “sequencing is 10 million times cheaper and 100,000 times higher quality than it was just a few years ago.” If a new technological paradigm is arriving, bringing with it a future in which we constantly monitor the genetics of our bodies and everything around us, these sequencers — easy, quick, ubiquitous — are the machines taking us into that realm.

And unexpectedly, Covid-19 has proved to be the catalyst. “What the pandemic has done is accelerate the adoption of genomics into infectious disease by several years,” says deSouza, the Illumina chief executive. He also told me he believes that the pandemic has accelerated the adoption of genomics into society more broadly — suggesting that quietly, in the midst of chaos and a global catastrophe, the age of cheap, rapid sequencing has arrived.
The author then described visiting a gene-sequencing lab, and then got into how much gene sequencing has improved.
As machines improved, the impact was felt mainly in university labs, which had relied on a process called Sanger sequencing, developed in the mid-1970s by the Nobel laureate Frederick Sanger. This laborious technique, which involved running DNA samples through baths of electrically charged gels, was what the scientists at Oxford had depended upon in the mid-1990s ... “In an H.I.V. genome, when we first started doing it, we would be able to look at a couple hundred letters at a time.” But O’Connor says his work changed with the advent of new sequencing machines. By around 2010, he and Friedrich could decode 500,000 letters in a day. A few years later, it was five million.

By 2015, the pace of improvement was breathtaking. “When I was a postdoctoral fellow, I actually worked in Fred Sanger’s lab,” Tom Maniatis, the head of the New York Genome Center, told me. “I had to sequence a piece of DNA that was about 35 base pairs, and it took me a year to do that. And now, you can do a genome, with three billion base pairs, overnight.”
Illumina got down to $1000 for a human genome in 2014, and one of its most recent sequencing machines can do a human genome for $600. From the looks of it, $100 / (human genome) may be done in the next few years.
These numbers don’t fully explain what faster speeds and affordability might portend. But in health care, the prospect of a cheap whole-genome test, perhaps from birth, suggests a significant step closer to the realization of personalized medicines and lifestyle plans, tailored to our genetic strengths and vulnerabilities.
In effect, a complete genetic profile.

"In some respects, it has begun already, even amid a public-health crisis."
 
From the start, the gen-epi community understood that the SARS-CoV-2 virus would form new variants every few weeks as it reproduced and spread; it soon became clear that it could develop one or more alterations (or mutations) at a time in the genome’s 30,000 base letters. Because of this insight, on Jan. 19, 2020, just over a week after the virus code was released to the world, scientists could look at 12 complete virus genomes shared from China and conclude that the fact that they were nearly identical meant that those 12 people had been infected around the same time and were almost certainly infecting one another.
From two cases in Washington State with nearly the same genome,
On Feb. 29, Bedford put up a Twitter post that noted, chillingly, “I believe we’re facing an already substantial outbreak in Washington State that was not detected until now.” His proof was in the code.

... In other words, sequencing had advanced from a few years ago, when scientists might publish papers a year after an outbreak, to the point that genetic epidemiologists could compare mutations in a specific location in order to be able to raise alarms — We have community spread! Patients on Floor 3 are transmitting to Floor 5! — and act immediately.
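The core comparison is simple enough to sketch: count the positions where aligned genomes differ, and treat near-identical genomes as likely recent transmission links. The sequences and sample names below are invented toy data, and real pipelines (alignment, quality filtering, tree building) do far more than this.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of positions where two aligned sequences differ.

    Positions where either sequence has a non-ACGT character
    (gap or ambiguous base) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def pairwise_distances(samples: dict) -> dict:
    """Map each pair of sample names to its SNP distance."""
    return {
        (n1, n2): snp_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(samples.items(), 2)
    }


# Hypothetical 20-letter "genomes"; real ones are ~30,000 letters.
samples = {
    "patient_A":  "ACGTACGTACGTACGTACGT",
    "patient_B":  "ACGTACGTACGTACGTACGA",
    "traveler_C": "ACGAACGTTCGTACGTCCGT",
}

for pair, dist in pairwise_distances(samples).items():
    print(pair, dist)
```

In this toy data patient_A and patient_B differ at one position, so they look like part of the same local chain, while traveler_C is several mutations away from both.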

To watch the pandemic unfold from the perspective of those working in the field of genomics was to see both the astounding power of new sequencing tools and the catastrophic failure of the American public-health system to take full advantage of them.
This seems in character with the Trump Administration's very poor response.
One of the Biden administration’s approaches to slowing the pandemic has been to invest $200 million in sequencing virus samples from those who test positive.
Which is a great contrast.

Genetic sequencing: U.S. lags behind in key tool against coronavirus mutations - The Washington Post - "Researchers warn the U.S. desperately needs to sequence more genomes so it can stay ahead of new variants"
That article was datelined January 29 of this year, a little over a week after the end of the Trump Admin, so what it describes is an outcome of the Trump Admin's handling of the virus.
 
Sequencing goes back over half a century, though it was proteins that were first sequenced.

Frederick Sanger Sequences the Amino Acids of Insulin, the First of any Protein. : History of Information -- bovine insulin chains A and B.

By the 1960's, several other proteins were sequenced, including versions of several proteins from different species. One could then compare those versions to try to find the family tree of their owners -- and family trees of the proteins themselves. For instance, jawed-vertebrate hemoglobin is a combination of four subunits, two with the "alpha" or a sequence and two with the "beta" or b sequence, arranged as alternating vertices of a square:
a b
b a

When one compares the alpha chains from different species, one finds the family tree that one finds from earlier work in evolutionary biology, and likewise for beta chains. Comparing the two indicates that the two are related -- and that they originated from a gene duplication in a very early jawed fish.

The Most Influential Scientists in the Development of Medical informatics (13): Margaret Belle Dayhoff -  Margaret Oakley Dayhoff - she was an early pioneer of computer techniques in molecular biology, like molecular evolutionary biology.
In 1966, Dayhoff pioneered the use of computers in comparing protein sequences and reconstructing their evolutionary histories from sequence alignments. To perform this work, she created the single-letter amino acid code to minimize the data file size for each sequence. This work, co-authored with Richard Eck, was the first application of computers to infer phylogenies from molecular sequences. It was the first reconstruction of a phylogeny (evolutionary tree) by computers from molecular sequences using a maximum parsimony method.
That is, fitting by making the necessary number of changes as small as possible.
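A small sketch of what "fewest changes" means in practice. This uses Fitch's algorithm, a later and standard way of counting the minimum number of changes on a given tree; I am not claiming it is the procedure Eck and Dayhoff used. The species tree shapes and the three-letter "protein" alignment are invented toy data.

```python
def fitch(tree, leaf_states):
    """Fitch's small-parsimony pass for one alignment column.

    `tree` is either a leaf name (str) or a (left, right) tuple.
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Minimum total number of changes over all columns of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned fragments in the single-letter amino acid code.
alignment = {"human": "AKL", "chimp": "AKL", "mouse": "SKL", "rat": "SRL"}

tree_1 = (("human", "chimp"), ("mouse", "rat"))
tree_2 = (("human", "mouse"), ("chimp", "rat"))

print("tree_1 changes:", parsimony_score(tree_1, alignment))
print("tree_2 changes:", parsimony_score(tree_2, alignment))
```

Here tree_1 needs 2 changes and tree_2 needs 3, so maximum parsimony prefers tree_1, the one grouping the primates together and the rodents together.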
Based on this work, Dayhoff and her coworkers developed a set of substitution matrices called the PAM (Percent Accepted Mutation), MDM (Mutation Data Matrix), or Dayhoff Matrix. They are derived from global alignments of closely related protein sequences. The identification number included with the matrix (ex. PAM40, PAM100) refers to the evolutionary distance; greater numbers correspond to greater distances. Matrices using greater evolutionary distances are extrapolated from those used for lesser ones.[9] To produce a Dayhoff matrix, pairs of aligned amino acids in verified alignments are used to build a count matrix, which is then used to estimate a mutation matrix at 1 PAM (considered an evolutionary unit). From this mutation matrix, a Dayhoff scoring matrix may be constructed. Along with a model of indel events, alignments generated by these methods can be used in an iterative process to construct new count matrices until convergence.[10]
indel = insertion and deletion
PAM linked to  Point accepted mutation
One could find out which amino acids tend to mutate into which other ones, and from those rates, test hypotheses about how interchangeable they are.
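A very simplified sketch of the first step, building a count matrix from aligned pairs and turning it into observed replacement frequencies. This is not Dayhoff's full procedure (which works from phylogenetic trees, scales to 1 PAM, and then builds log-odds scores); the function names and the toy aligned pairs are my own assumptions.

```python
from collections import Counter


def count_matrix(aligned_pairs):
    """Count aligned amino-acid pairs symmetrically, skipping gaps ('-')."""
    counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for x, y in zip(seq_a, seq_b):
            if x == "-" or y == "-":
                continue  # indels are handled by a separate model
            # Direction of change is unknown, so count both ways.
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def replacement_frequencies(counts):
    """For each amino acid x, the fraction of its aligned partners that are y."""
    totals = Counter()
    for (x, _), n in counts.items():
        totals[x] += n
    return {(x, y): n / totals[x] for (x, y), n in counts.items()}


# Hypothetical aligned pairs of closely related sequences.
pairs = [("AKLD", "AKLE"), ("AKID", "AKLD")]

counts = count_matrix(pairs)
freqs = replacement_frequencies(counts)

print("D aligned with E:", counts[("D", "E")])
print("fraction of D positions paired with E:", round(freqs[("D", "E")], 3))
print("fraction of L positions paired with I:", round(freqs[("L", "I")], 3))
```

With enough real data, pairs like D/E or L/I (chemically similar amino acids) show up much more often than dissimilar pairs, which is what lets one test how interchangeable they are.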

She was also the author of "Atlas of Protein Sequence and Structure", a book where she listed the sequences of all the 65 proteins sequenced at the time of its publication, 1965. "The Atlas was organized by gene families, and she is regarded as a pioneer in their recognition."

"In the early 1960s, a theory was developed that small differences between homologous protein sequences (sequences with a high likelihood of common ancestry) could indicate the process and rate of evolutionary change on the molecular level. The notion that such molecular analysis could help scientists decode evolutionary patterns in organisms was formalized in the published papers of Emile Zuckerkandl and Linus Pauling in 1962 and 1965."

For example, the family tree of hemoglobin that I mentioned earlier.
 
DNA sequencing came later, more properly nucleic-acid sequencing, to include RNA sequencing.

The sequence of sequencers: The history of sequencing DNA - A journey through the history of DNA sequencing - Timeline: History of genomics | Facts | yourgenome.org


The first bit of nucleic acid ever sequenced was a transfer RNA by Robert Holley in 1965. He got a Nobel Prize for that in 1968.

The first gene sequenced was that of the coat protein of bacteriophage MS2 in 1972. That virus's complete genome was sequenced in 1976. Its genome is linear positive-sense single-stranded RNA with 3,569 nucleotides, one of the smallest known.

The first DNA genome sequenced was of bacteriophage Phi X 174 in 1977. Its genome is circular positive-sense single-stranded DNA with 5,386 nucleotides.

Bacteriophage = virus that infects bacteria

The Epstein-Barr virus, with 172,282 nucleotides in its genome, was done in 1984.

The first genome sequence of a cellular organism was of the bacterium Haemophilus influenzae ("blood lover of influenza") in 1995. Its genome is 1,830,137 base pairs long.

The first genome sequence of a eukaryote was of the common yeast Saccharomyces cerevisiae ("sugar fungus of beer"). This is the yeast that's often used in making bread, beer, and wine, and it is also often used as a model system. It was done in 1996, and has around 12,156,677 base pairs and 6,275 genes, all on 16 chromosomes.

The first genome sequence of a multicellular organism was of the soil nematode Caenorhabditis elegans ("elegant recent rodlike (entity)"), a tiny worm that grows to 1 mm long, and is also a model system. It was done in 1998, though with some small gaps that were filled in in 2002. It has around 100 million base pairs with some 20,470 protein-coding genes in six chromosomes.

That was followed by the genome of the fruit fly Drosophila melanogaster ("black-bellied dew lover"), a common model system. Done in 2000, its genome has 139.5 million base pairs with some 15,682 genes in four chromosomes.
 
Damn you!

Took me three tries to figure out it wasn't talking about gnome sequencing! :)
 
The Human Genome Project was started in 1990. A rough draft was released in 2000 and an improved one in 2001, and most of the genome was completed by 2003. There were still some gaps in the sequencing coverage, and they were gradually filled in, with the last ones being filled this year.

Over the last few decades, a large number of species' genomes have been sequenced, and there are very likely more to come.
There are more of them at genome-database sites like
 
Damn you!

Took me three tries to figure out it wasn't talking about gnome sequencing! :)

Oddly the metric system is different from the standard system.
And since snipers are outlawed, I'll give you that...
 