[SCI] Genomics & Computational Biology
Genomics is the large-scale study of entire genomes — their sequencing, structure, function, and evolution — made possible by the convergence of molecular biology, chemistry, and computational methods.
Overview
Watson and Crick's double-helix model of DNA (1953) revealed how genetic information is stored: as a sequence of bases held in complementary pairs. Sanger's chain-termination sequencing (1977) made it practical to read DNA sequences. The Human Genome Project (1990–2003) produced an essentially complete sequence of the human genome at a cost of roughly USD 3 billion. Illumina's short-read sequencing (2007) drove the cost down to roughly USD 1,000 per genome by the mid-2010s. CRISPR-Cas9 (Doudna & Charpentier, 2012) enables precise, programmable genome editing. Bioinformatics, the application of computer science, information theory, statistics, and machine learning (ML) to genomic data, is now a major discipline.
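The principle behind chain-termination sequencing can be sketched as a toy simulation. This is an idealised model, not any real protocol or software: it assumes perfect termination, exactly one fragment for every possible stop position, and error-free reads, and it ignores primers and strand orientation. In the original method, each of four reactions contains one dideoxynucleotide (ddA, ddC, ddG or ddT) and yields fragments ending in that base; sorting all fragments by length, as gel electrophoresis does, and noting which lane each band came from reads out the newly synthesised strand. The function names run_reactions and read_gel are hypothetical.

```python
# Toy model of Sanger chain-termination sequencing (illustrative assumptions:
# perfect termination, error-free reads, primer and strand orientation ignored).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_reactions(template: str) -> dict[str, list[int]]:
    """Simulate the four ddNTP reactions on a DNA template.

    Returns, for each terminating base, the lengths of the fragments that
    end in that base (i.e. the band positions in that base's gel lane).
    """
    # The polymerase synthesises the complement of the template.
    new_strand = "".join(COMPLEMENT[base] for base in template.upper())
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    # Assume one fragment stops at every position; its last base is the ddNTP.
    for length in range(1, len(new_strand) + 1):
        lanes[new_strand[length - 1]].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest, recording the lane of each band."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = run_reactions("TACGGT")
    print(lanes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3], 'T': [2]}
    print(read_gel(lanes))  # ATGCCA (the complement of the template)
```

Reading the gel yields the complement of the template; complementing it once more recovers the template itself.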
Key Figures & Recognition
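- Watson, Crick, Franklin, Wilkins: DNA structure. Nobel Prize in Physiology or Medicine 1962 (Watson, Crick, Wilkins); Franklin had died in 1958, and the prize is not awarded posthumously.
- Frederick Sanger (1918–2013): DNA sequencing. Nobel Prize in Chemistry 1980 (his second, after the 1958 prize for the structure of insulin).
- Jennifer Doudna (1964–) & Emmanuelle Charpentier (1968–): CRISPR-Cas9. Nobel Prize in Chemistry 2020.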
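The precision of CRISPR-Cas9 editing comes from two requirements: a guide RNA whose roughly 20-nucleotide spacer matches the target DNA, and a protospacer-adjacent motif (PAM) immediately 3' of that target, which for the widely used Streptococcus pyogenes Cas9 (SpCas9) is NGG. The sketch below lists candidate SpCas9 sites in a sequence under stated assumptions: a 20-nt protospacer, the NGG PAM, the forward strand only, and a cut about 3 bp upstream of the PAM. It does no off-target or efficiency scoring, which real guide-design tools perform; the function find_cas9_targets is hypothetical.

```python
# Hypothetical helper: list candidate SpCas9 target sites on one DNA strand.
# Assumptions: 20-nt protospacer, NGG PAM, forward strand only, no
# off-target or efficiency scoring (real guide-design tools do far more).

PROTOSPACER_LEN = 20
CUT_OFFSET = 3  # SpCas9 typically cuts ~3 bp upstream (5') of the PAM


def find_cas9_targets(dna: str) -> list[tuple[int, str, str, int]]:
    """Return (start, protospacer, pam, cut_index) for each NGG-adjacent site.

    Indices are 0-based on the input string; the predicted cut falls
    between positions cut_index - 1 and cut_index.
    """
    dna = dna.upper()
    sites = []
    # A PAM can only start once a full 20-nt protospacer fits before it.
    for pam_start in range(PROTOSPACER_LEN, len(dna) - 2):
        pam = dna[pam_start:pam_start + 3]
        if pam[1:] == "GG":  # the 'N' in NGG matches any base
            start = pam_start - PROTOSPACER_LEN
            protospacer = dna[start:pam_start]
            sites.append((start, protospacer, pam, pam_start - CUT_OFFSET))
    return sites


if __name__ == "__main__":
    seq = "ATGCTAGCTAGGATCCGATCAGTGGTTACCGATCGGAAT"
    for site in find_cas9_targets(seq):
        print(site)
```

A fuller sketch would also scan the reverse complement, since a PAM can lie on either strand.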
Seminal Papers
- Watson, J. & Crick, F. "A Structure for Deoxyribose Nucleic Acid." Nature 171 (1953).
- Sanger, F. et al. "DNA sequencing with chain-terminating inhibitors." PNAS 74 (1977).
What This Enables
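- [TECH] AI & Large Language Models: Protein language models such as ESM-2 (the backbone of ESMFold) are transformers trained on large protein sequence databases. AlphaFold2 is related but distinct: an attention-based network trained on experimentally determined structures and multiple sequence alignments rather than a language model.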
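ESM-family protein language models are trained with a masked-language-modelling objective: some residues in each sequence are hidden and the transformer learns to predict them from the surrounding context. The sketch below shows only the data-preparation step, under simplifying assumptions: one token per amino acid, the 20 standard residues only, a 15% masking rate (as in BERT), and every selected residue replaced by a mask token, omitting BERT's 80/10/10 replacement scheme. The name mask_sequence is hypothetical, and this is not ESM's actual training code.

```python
import random
from typing import Optional

# Simplified masked-language-modelling data preparation for protein sequences.
# Assumptions: one token per residue, 15% masking, every selected residue
# replaced by MASK (no 80/10/10 split), 20 standard amino acids only.

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MASK = "<mask>"


def mask_sequence(
    seq: str, mask_rate: float = 0.15, rng: Optional[random.Random] = None
) -> tuple[list[str], dict[int, str]]:
    """Hide a fraction of residues and return (tokens, targets).

    targets maps each masked position to the residue the model must predict.
    """
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    if rng is None:
        rng = random.Random()
    tokens = list(seq)  # one token per amino acid
    n_mask = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in sorted(positions)}
    for i in targets:
        tokens[i] = MASK
    return tokens, targets


if __name__ == "__main__":
    tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", rng=random.Random(0))
    print(" ".join(tokens))
    print(targets)  # the residues a model would be trained to recover
```

During training, the transformer sees tokens and its loss is computed only at the positions in targets; this is what lets such models learn from unlabelled sequence databases.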
Discovery Character
Surprise level: High. The Human Genome Project's completion (2003) revealed far fewer protein-coding genes than expected (~20,000 vs. the ~100,000 often predicted) and vast non-coding regions of unclear function. AlphaFold2's near-experimental accuracy at the CASP14 assessment (2020), on the 50-year-old problem of predicting protein structure from sequence, was a genuine shock to the structural biology community.
Mode: Systematic with competitive urgency and ethical complexity. Watson and Crick raced against Linus Pauling, and their model drew on Franklin's X-ray diffraction data (including Photo 51), used without her knowledge or consent: a celebrated but ethically troubled origin. Modern genomics is Edisonian in data generation (sequence everything, analyse later) but increasingly systematic in interpretation via ML.