5.0 out of 5 stars
A pedagogical and practical perspective on (modern) probability
Reviewed in the United States on December 1, 2018
Vershynin's book covers a set of topics that is likely to become central to the education of "modern" mathematicians, statisticians, physicists, and (electrical) engineers. He discusses ideas, techniques, and tools that arise across fields, and he conceptually unifies them under the brand name of "high-dimensional probability".
His choice of topics (e.g., concentration/deviation inequalities, random vectors/matrices, stochastic processes, etc.) and applications (e.g., sparse recovery, dimension-reduction, covariance estimation, optimization bounds, etc.) delivers a necessary (and timely) addition to the growing body of data-science-related literature—more on this below.
Vershynin writes in a conversational, reader-friendly manner. He weaves theorems, lemmas, corollaries, and proofs into his dialogue with the reader without getting caught in an endless theorem-proof loop. In addition, the integrated exercises and the prompts to "check!" or think about "why?" are among the book's strongest components. My copy is already full of notes to myself where I'm "checking" something or explaining "why" something is true/false. (Also, as an aside, I love that coffee cups are used to signal the difficulty of a problem—good style.)
I want to highlight a few examples where Vershynin's choice of topics and his prose shine brightly. In section 4.4.1, he guides us through an example that clearly illustrates the usefulness of ε-nets for bounding matrix norms. I'd seen ε-nets and covering numbers before, but never had good intuition for why they show up in proofs.
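For a taste of that section (I'm quoting from memory, so check the book for the precise statement): if $\mathcal{N}$ is an ε-net of the unit sphere $S^{n-1}$ with ε ∈ [0, 1), then for any $m \times n$ matrix $A$,

$$\sup_{x \in \mathcal{N}} \|Ax\|_2 \;\le\; \|A\| \;\le\; \frac{1}{1-\varepsilon} \sup_{x \in \mathcal{N}} \|Ax\|_2.$$

The net replaces a supremum over infinitely many directions with a maximum over finitely many points, which is exactly what a union bound can handle. That, for me, was the missing intuition.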
Similarly, I'd struggled to gain intuition about why/how Gaussian widths and the Vapnik–Chervonenkis (VC) dimension capture/measure the complexity of a set. After reading sections 7.5 and 8.3 and working through some exercises, the two concepts are much clearer. Moreover, Vershynin connects these ideas back to covering numbers, which deepened my understanding of all three concepts.
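For readers new to the term: the Gaussian width of a set $T \subset \mathbb{R}^n$ is (up to conventions the book makes precise)

$$w(T) = \mathbb{E}\, \sup_{x \in T} \langle g, x \rangle, \qquad g \sim N(0, I_n),$$

that is, the expected extent of $T$ in a random direction. The connection back to covering numbers runs through results like Sudakov's minoration, $w(T) \ge c\, \varepsilon \sqrt{\log \mathcal{N}(T, \varepsilon)}$, with Dudley's integral inequality going the other way.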
Finally, I found the discussions of chaining and generic chaining in chapter 8 to be excellent. Following them up with Talagrand's comparison inequality, which becomes the hammer of choice for the matrix deviation inequality (in chapter 9), rounds out a long but very valuable chapter—and one that I'll certainly re-study and reference.
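Again from memory (treat the constants as indicative rather than exact), the payoff in chapter 9 is a uniform deviation bound of the form

$$\mathbb{E}\, \sup_{x \in T} \Big|\, \|Ax\|_2 - \sqrt{m}\, \|x\|_2 \,\Big| \;\le\; C K^2\, \gamma(T),$$

where $A$ has independent, isotropic, sub-gaussian rows with sub-gaussian norm at most $K$, and $\gamma(T) = \mathbb{E}\, \sup_{x \in T} |\langle g, x \rangle|$ is the Gaussian complexity of $T$. The uniformity over all of $T$ at once is what makes it such a workhorse in the applications that follow.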
I would recommend this book to those interested in (high-dimensional) statistics, randomized numerical linear algebra, and electrical engineering (particularly signal processing). As I'm coming to realize, the "concentration of measure" and "deviation inequality" toolbox is essential in these areas. Lastly, I believe this book makes a great companion to "Concentration Inequalities" by Boucheron, Lugosi, and Massart.
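P.S. If you want to see "concentration" happen before committing to the book, here is a tiny numpy experiment of my own devising (not from the book). It illustrates one of the book's first results (Theorem 3.1.1, if I remember the numbering): the norm of a standard Gaussian vector in R^n concentrates around sqrt(n) with fluctuations of constant order.

import numpy as np

# Toy demo: ||g||_2 for g ~ N(0, I_n) concentrates around sqrt(n),
# with O(1) fluctuations no matter how large the dimension n gets.
rng = np.random.default_rng(0)
for n in (10, 1_000, 10_000):
    g = rng.standard_normal((2_000, n))   # 2,000 independent samples in R^n
    norms = np.linalg.norm(g, axis=1)     # Euclidean norm of each sample
    print(f"n={n:>6}: mean ||g||_2 = {norms.mean():8.2f}  "
          f"sqrt(n) = {np.sqrt(n):8.2f}  std = {norms.std():.3f}")

The last column is the striking part: the spread stays roughly constant (about 0.71) while sqrt(n) grows, so in relative terms the norm becomes almost deterministic in high dimension. That is the flavor of result the book makes rigorous and general.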