[SCI] Deep Learning


Deep Learning is the application of deep (many-layer) neural networks to perception and prediction tasks. On specific benchmarks in image recognition, speech, language, and scientific modelling, these networks match or exceed human performance.

Overview

The ImageNet breakthrough (Krizhevsky, Sutskever, Hinton, 2012) demonstrated that a deep convolutional network trained on GPUs could far outperform prior approaches to image classification: AlexNet reached a top-5 error of about 15% on ILSVRC 2012, against roughly 26% for the runner-up. This result is widely regarded as the start of the modern deep learning era. Transformers (Vaswani et al., 2017) largely replaced recurrent networks for sequence modelling, enabling GPT (2018), BERT (2018), and later GPT-4 (2023) and Claude. AlphaFold 2 (DeepMind; CASP14 in 2020, published 2021) predicted protein structures at near-experimental accuracy for many targets, addressing the roughly 50-year-old protein structure prediction problem. Deep learning is now applied to drug discovery, materials design, climate modelling, and physics.

Key Figures & Recognition

Geoffrey Hinton (1947–), Yann LeCun (1960–), Yoshua Bengio (1964–): Turing Award 2018. Hinton: Nobel Prize in Physics 2024.
Ilya Sutskever (1986–): co-author of AlexNet; co-founder and former chief scientist of OpenAI, where the GPT models were developed.
Demis Hassabis (1976–): AlphaFold, AlphaGo. Nobel Prize in Chemistry 2024.

Seminal Papers

What This Enables

[TECH] AI & Large Language Models — LLMs (GPT, Claude, Gemini) are large-scale deep learning systems; the transformer is a deep learning architecture.
[TECH] Medical Imaging (X-ray, CT, PET) — Convolutional networks for radiology, pathology, and ophthalmology were among the first clinically deployed DL applications.
[SCI] Genomics & Computational Biology — AlphaFold 2 (CASP14 in 2020, published 2021) and protein language models apply attention-based, transformer-style architectures to protein structure prediction.
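
The transformer named in the entries above is built around scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V (Vaswani et al., 2017). The following is a minimal sketch of that one operation in plain Python, not a full transformer: it assumes a single head, no learned projections, and no masking, and the function names `softmax` and `attention` are illustrative choices rather than any library's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors.

    Assumption: single head, no masking, no learned weight matrices.
    Each output row is a weighted average of the value vectors, with
    weights given by the softmax of query-key dot products / sqrt(d_k).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy input: two identical keys, so the query attends
# to both value vectors equally and the output is their average.
example = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(example)
```

Production systems run this as batched matrix multiplication on GPUs; the loop form here is only meant to make the arithmetic visible.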

Discovery Character

Surprise level: Extreme. GPT-3's emergent abilities (2020) surprised OpenAI's own researchers: the model performed tasks it was never explicitly trained on (few-shot learning, arithmetic, code generation). AlphaFold 2's performance at CASP14 (2020) on the roughly 50-year-old protein structure prediction problem arrived sooner than most structural biologists thought possible. The capabilities of large models at each scale threshold repeatedly exceeded prior predictions.

Mode: Systematic with emergent surprises. Training deep networks is systematic: test loss falls as a smooth power law in compute, data, and parameter count (Kaplan et al., 2020), so scaling all three yields predictable improvements in loss. But the capabilities that emerged at each scale threshold (in-context learning, chain-of-thought reasoning, code synthesis) were not predicted by the scaling laws, which describe loss rather than specific abilities, and they surprised researchers. The ImageNet breakthrough (Krizhevsky, Sutskever, Hinton, 2012) was itself a surprise of scale: the authors won the competition by a margin that shocked the computer vision community.
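
The "systematic" half of this mode can be made concrete with a small sketch. Kaplan et al. (2020) model loss as a power law in model size, L(N) = (N_c / N)^alpha, which is a straight line in log-log space. The code below fits that line by ordinary least squares and extrapolates to a larger model. The constants `ALPHA_TRUE` and `N_C_TRUE` and the model sizes are illustrative placeholders chosen for this example, not values taken from the paper, and the "measurements" are synthetic.

```python
import math


def power_law_loss(n_params, n_c, alpha):
    """Loss under the assumed form L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha


def fit_power_law(sizes, losses):
    """Fit log L = -alpha * log N + alpha * log N_c by least squares.

    Returns (alpha, n_c).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return alpha, n_c


# Illustrative placeholder constants (assumptions, not published values).
ALPHA_TRUE = 0.08
N_C_TRUE = 1e14

# Synthetic "measurements" at four hypothetical model sizes.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [power_law_loss(n, N_C_TRUE, ALPHA_TRUE) for n in sizes]

alpha_fit, n_c_fit = fit_power_law(sizes, losses)

# Extrapolate the fitted curve to a model ten times larger than any observed.
predicted = power_law_loss(1e10, n_c_fit, alpha_fit)
print(alpha_fit, n_c_fit, predicted)
```

A fit like this predicts only the loss value. It says nothing about which specific abilities appear at a given loss, which is why the emergent capabilities described above were not forecast by the same curves.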
