My father Paul was a physics professor at Rice University. I remember the chalk dust, slate blackboards and marble hallways of his academic office, and the lasers and low-temperature gadgets in his laboratory. Paul took our family on sabbatical to Grenoble, France, where I attended 6th grade – a transforming educational experience.
For university, my mother Julia chose Princeton, a hotbed of bright and ambitious students (e.g., Eric Lander!). My father’s advice was ‘learn computers’, so I developed data analysis software for the Statistics Department. John Tukey, inventor of the Fast Fourier Transform and the words ‘bit’ and ‘software’, was my undergraduate thesis adviser; Tukey advocated robust statistical methods, such as fitting equations by minimizing the l1 norm of residuals rather than the l2 norm. He criticized ‘classical’ mathematical statistics as searching for polished answers to yesterday’s problems.
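To make Tukey’s point concrete (notation mine, not Tukey’s): the classical fit minimizes the l2 norm of the residuals, while the robust alternative minimizes their l1 norm,

    \hat{\beta}_{\ell_2} = \arg\min_{\beta} \sum_i \bigl(y_i - x_i^{\top}\beta\bigr)^2,
    \qquad
    \hat{\beta}_{\ell_1} = \arg\min_{\beta} \sum_i \bigl|y_i - x_i^{\top}\beta\bigr|,

and the l1 criterion is far less swayed by a few wildly aberrant observations.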
After college, I worked in oil exploration research for Western Geophysical, and witnessed the seismic signal processing revolution driven by digital measurement and exciting computer algorithms. Ongoing experiments in blind deblurring and signal recovery from highly incomplete measurements were phenomenally successful. Experimentally, minimizing the l1 norm of the reconstructed signal (rather than the l2 norm of the residual) was miraculously effective. The strange interactions of sparse signals, undersampled data, and l1-norm minimization inspired me (age 21!) to pursue ‘non-classical’ mathematics, to one day explain and use such phenomena.
At Berkeley during my postdoctoral and junior faculty years (1985-1990), I was in the Mecca of classical mathematical statistics, but I pursued my ‘non-classical’ interests. Iain Johnstone and I showed how to optimally ‘denoise’ sparse signals observed in noise, injecting ‘sparsity’ into top statistics journals. A sparse signal sticks up here and there above the noise, like daisies above weeds; our denoiser, based on l1-minimization, chops away the weeds while leaving the daisies. Jeff Hoch and Alan Stern successfully applied such ideas in Magnetic Resonance spectroscopy.
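A minimal sketch of this kind of denoiser, in Python (the function name, signal sizes, and random seed are mine; the sqrt(2 log n) ‘universal’ threshold is the standard choice from this line of work):

    import numpy as np

    def soft_threshold(y, lam):
        # Coordinatewise minimizer of 0.5*(x - y)**2 + lam*|x|:
        # shrink each entry toward zero by lam; entries smaller than lam become exactly 0.
        return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

    # A sparse signal ("daisies") buried in unit-variance noise ("weeds").
    rng = np.random.default_rng(0)
    n = 1024
    x = np.zeros(n)
    x[rng.choice(n, size=15, replace=False)] = 6.0   # a few tall spikes
    y = x + rng.standard_normal(n)                   # noisy observation

    lam = np.sqrt(2 * np.log(n))                     # universal threshold for unit noise
    x_hat = soft_threshold(y, lam)                   # weeds chopped, daisies kept (slightly shrunken)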
To publish ‘non-classical’ work on undersampled measurements of sparse signals, I turned to applied mathematics journals. Philip Stark and I found that a highly sparse signal could be perfectly recovered from randomly chosen (slightly) incomplete Fourier measurements, by l1-minimization on the recovered signal. Ben Logan and I extended Logan’s PhD thesis, and work of Santosa and Symes, to show that sparse signals missing low-frequency information could be perfectly recovered, again by l1-minimization on the recovered signal. Finally, sparse signals missing high frequencies need remarkably few measurements, provided the sparse signal is nonnegative.
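In the notation I would use today (a sketch, not a formula copied from those papers): with Fourier coefficients observed only on a subset Ω, the recovery program is

    \min_{x} \; \|x\|_{1} \quad \text{subject to} \quad (\mathcal{F}x)_{k} = y_{k}, \; k \in \Omega,

that is, among all signals consistent with the observed frequencies, take the one of smallest l1 norm.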
‘Wavelets’ swept over applied mathematics in 1988-1992, led by Yves Meyer, Ingrid Daubechies and Stéphane Mallat. The wavelet transform could sparsify signals and images; I was inspired! Iain Johnstone, Dominique Picard, Gérard Kerkyacharian and I used wavelets to optimally remove noise from certain kinds of signals and images. The wavelet transform turned noisy images into sparse signals embedded in noise, and we could rescue the daisies from among the weeds, using l1-minimization.
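A minimal sketch of the resulting wavelet shrinkage, in Python (assuming the PyWavelets package; the wavelet choice and threshold are illustrative, and the shrinkage rule is the same soft threshold as above):

    import numpy as np
    import pywt

    def wavelet_denoise(noisy, wavelet="db4", sigma=1.0):
        # Move to the wavelet domain, where the signal becomes sparse while the noise stays noise.
        coeffs = pywt.wavedec(noisy, wavelet)
        lam = sigma * np.sqrt(2 * np.log(len(noisy)))        # universal threshold
        # Soft-threshold the detail coefficients ("chop the weeds"); keep the coarse ones.
        shrunk = [coeffs[0]] + [np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
                                for c in coeffs[1:]]
        # Return to the signal domain.
        return pywt.waverec(shrunk, wavelet)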
Coifman and Meyer, and Mallat and Zhang, proposed in the early 1990s to represent signals by combining several transforms – controversial to most, because the equations would be underdetermined, but inspiring to me. Scott Chen, Michael Saunders and I showed that l1 minimization (‘Basis Pursuit’) could often successfully find sparse solutions to such systems.
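Basis Pursuit is, in the end, a linear program. A hedged sketch in Python (the dictionary, the split of the coefficients into positive and negative parts, and the solver choice are illustrative, not the original implementation):

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(Phi, s):
        # min ||alpha||_1 subject to Phi @ alpha = s, recast as the LP
        # min sum(u + v) subject to Phi @ (u - v) = s, u >= 0, v >= 0, with alpha = u - v.
        m, n = Phi.shape
        c = np.ones(2 * n)
        A_eq = np.hstack([Phi, -Phi])
        res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None), method="highs")
        u, v = res.x[:n], res.x[n:]
        return u - v

    # Example: an overcomplete dictionary made of two orthonormal bases, and a sparse synthesis.
    rng = np.random.default_rng(1)
    n = 64
    Phi = np.hstack([np.eye(n), np.linalg.qr(rng.standard_normal((n, n)))[0]])  # n x 2n
    alpha_true = np.zeros(2 * n)
    alpha_true[[3, 70, 100]] = [2.0, -1.5, 1.0]
    s = Phi @ alpha_true
    alpha_hat = basis_pursuit(Phi, s)   # typically recovers alpha_true when it is sparse enough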
By the late 1990s, I sought theory explaining such successes. Xiaoming Huo, Michael Elad and I gave ‘incoherence’ conditions on underdetermined systems guaranteeing that l1 minimization found sufficiently sparse solutions perfectly; soon, many others entered this area. In 2004, Emmanuel Candès and Terence Tao obtained dramatically stronger guarantees through interesting and deep mathematics, triggering a tidal wave of interest.
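One commonly quoted form of such a guarantee (stated from memory, as a sketch): if A has unit-norm columns a_i with mutual coherence μ(A) = max_{i ≠ j} |⟨a_i, a_j⟩|, then any solution x_0 of Ax = b with

    \|x_0\|_{0} \;<\; \frac{1}{2}\Bigl(1 + \frac{1}{\mu(A)}\Bigr)

is the unique sparsest solution and is found exactly by l1 minimization.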
My 2004 paper ‘Compressed Sensing’ explained that, because the wavelet transform sparsifies images, images can be recovered from relatively few random measurements via l1 minimization. Michael Lustig’s 2007 Stanford thesis applied compressed sensing to medical MRI; he and his co-authors pushed MRI to solve seemingly intractable problems, where l1-minimization from undersampled measurements enables totally new applications, such as MRI movies of the beating heart and MR spectroscopic imaging (revealing metabolism). At recent MRI conferences, compressed sensing has become the most popular topic.
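Schematically (notation mine): with an m × n random measurement matrix A, m ≪ n, and a wavelet synthesis operator W under which the image x = Wα has sparse coefficients α, one solves

    \hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_{1} \quad \text{subject to} \quad A W \alpha = y, \qquad \hat{x} = W\hat{\alpha}.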
During 2004-2010, Jared Tanner and I discovered the precise tradeoff between sparsity and undersampling, showing when l1-minimization can work successfully with random measurements. Our work developed the combinatorial geometry of sparse solutions to underdetermined systems, a beautiful subject involving random high-dimensional polytopes. What my whole life I thought of privately as ‘non-classical’ mathematics was absorbed into classical high-dimensional convex geometry.
Arian Maleki, Andrea Montanari and I discovered in 2009 a new approach to the sparsity/undersampling tradeoff. Solving random underdetermined systems by l1-minimization was revealed as identical to denoising of sparse signals embedded in noise. Two separate threads of my research life became unified.
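In the approximate message passing (AMP) iteration that grew out of this work (sketched from memory; η_t is a soft-threshold denoiser, δ = m/n, and ⟨·⟩ averages the entries of a vector):

    x^{t+1} = \eta_t\bigl(x^t + A^{*} z^{t}\bigr),
    \qquad
    z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\, \bigl\langle \eta'_{t-1}\bigl(x^{t-1} + A^{*} z^{t-1}\bigr) \bigr\rangle,

so the denoiser of the earlier work sits inside the solver of the underdetermined system, which is exactly the unification described above.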
This brutally compressed account omits extensive work by many others, often more penetrating than my own, and much prehistory. I thank my mentors Peter Bickel, Raphy Coifman, Persi Diaconis, Brad Efron, Peter Huber, Lucien Le Cam and Yves Meyer, all my co-authors, students, and postdocs, and my wife Miki and son Daniel.
23 September 2013, Hong Kong