All that aside, the origin of the first organism remains an unsolved problem.
One can go bottom-up, with prebiotic chemistry, or top-down, with evolutionary biology, and the two efforts still have not met.
In prebiotic chemistry, we have made some kinds of building blocks, like the smaller amino acids and some nucleobases, but it's hard to get an organism out of them, and some building blocks continue to be difficult to make prebiotically, like ribose.
In evolutionary biology, we have made remarkable progress. Before we could sequence anything, we were able to find reasonably good family trees of many macroscopic organisms, though that effort was not nearly as successful for microscopic ones, especially one-celled ones.
In 1951 and 1952, Frederick Sanger sequenced each of the two amino-acid chains of bovine insulin, the first protein sequence ever determined. It was painstaking work, done by hand. Around the same time, Pehr Victor Edman invented Edman degradation, which removes and identifies amino acids one at a time from one end of a chain. It too was done by hand at first, but in 1967 Edman succeeded in automating it. In the early 1990s, mass spectrometry came into use for protein sequencing (see "A revolution in protein sequencing", a case study from NHMRC, and "Methods and Techniques for Protein Sequencing" from Creative Proteomics).
By 1976, over 80,000 amino acids had been sequenced in various laboratories, and by 2017, 70 million.
Turning to the sequencing of nucleic acids, DNA and RNA (from "A History of Sequencing"):
In 1964, Robert W. Holley sequenced alanine transfer RNA. In 1972, Walter Fiers sequenced a gene of bacteriophage MS2, a kind of virus that infects bacteria, and in 1976, its entire genome: 3,569 nucleotides of single-stranded RNA.
In 1977, Frederick Sanger struck again, developing the "dideoxy" chain-termination method of DNA sequencing, now known as Sanger sequencing. With it, human mitochondrial DNA (16,569 base pairs) was sequenced in 1981, and bacteriophage lambda (48,502 base pairs) in 1982.
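To make "terminating DNA chains" a bit more concrete, here is a toy Python model of how chain-termination results are read. It assumes an idealized reaction in which every possible terminated fragment is produced and each fragment's length is measured exactly; the function names and the sample template are made up for illustration, and the sketch ignores primers, strand direction, and the gel or capillary electrophoresis used in practice.

```python
# Toy model of Sanger (dideoxy) chain-termination sequencing.
# Assumptions: an idealized reaction in which every terminated fragment
# appears, and fragment lengths are read off exactly (as bands on a gel).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> dict[str, list[int]]:
    """For each dideoxy base (A, C, G, T), list the lengths of newly
    synthesized strands that end with that base."""
    # The new strand is the complement of the template. For simplicity it is
    # written in the same direction; real strands are antiparallel.
    new_strand = "".join(COMPLEMENT[b] for b in template)
    lanes: dict[str, list[int]] = {b: [] for b in "ACGT"}
    for i, base in enumerate(new_strand):
        # Incorporating a dideoxy base at position i stops extension there,
        # giving a fragment of length i + 1 in that base's reaction.
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Recover the new strand by sorting all fragments by length,
    shortest first, and noting which reaction each one came from."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = terminated_fragments("TACGGATC")
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```

Sorting the bands by length recovers the new strand one base at a time, and the template is simply its complement.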
Since then, much has been done to automate and speed up DNA sequencing, with newer techniques like pyrosequencing greatly reducing the cost per base pair.
From that same history of sequencing:
The Human Genome Project was the international research effort to determine the DNA sequence of the entire human genome. It took 13 years and was published in 2003, with an estimated cost of over $300 million. Today, a whole human genome can be sequenced in one day for under $1000.
The 100,000 Genomes Project was first announced by UK Prime Minister David Cameron in December 2012, resulting in the creation of Genomics England. In December 2018, the full 100,000 genomes milestone was reached. That took about six years, less than half the 13 years it took to sequence just one genome by 2003.
Another history: "The sequence of sequencers: The history of sequencing DNA" (PMC).
Some genome milestones, from the List of sequenced eukaryotic genomes plus the first bacterial genome:
- 1995: Bacterium Haemophilus influenzae - 1,830,138 bp
- 1996: Brewer's yeast Saccharomyces cerevisiae - 12,156,677 bp
- 1998: Nematode Caenorhabditis elegans - 101.169 million bp
- 2000: Fruit fly Drosophila melanogaster - 139.5 million bp
- 2001 (draft), 2006 (complete): Human Homo sapiens - 3.2 billion bp
So we now have a huge pile of sequence data. What have we been able to do with it?