Jamie Simon

I'm a 5th-year PhD student in the physics department at UC Berkeley, aiming to use tools from theoretical physics to build a fundamental understanding of deep neural networks. I'm advised by Mike DeWeese and supported by an NSF Graduate Research Fellowship. I'm also a research fellow at Imbue. In my free time, I like running, puzzles, spending time in forests, and balancing things.


Research


Reverse engineering the neural tangent kernel

A first-principles method for the design of fully-connected architectures

Much of our understanding of artificial neural networks stems from the fact that, in the infinite-width limit, they turn out to be equivalent to a simple class of models called kernel regression. Given a wide network architecture, it's well known how to find the equivalent kernel, allowing us to study popular models in the infinite-width limit. In work with Sajant Anand, we invert this mapping for fully-connected nets (FCNs), allowing one to start from a desired rotation-invariant kernel and design a network (i.e., choose an activation function) that achieves it. Remarkably, achieving any such kernel requires only one hidden layer, raising questions about the conventional wisdom on the benefits of depth. This inversion enables surprising experiments, like designing a one-hidden-layer (1HL) FCN that trains and generalizes like a deep ReLU FCN. The ability to design nets with desired kernels is a step towards deriving good net architectures from first principles, a longtime dream of the field of machine learning.
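
For concreteness, here is the forward map being inverted, written for a one-hidden-layer FCN with activation σ in the standard NTK parameterization (no biases, both layers trained, unit-variance weights); this is my shorthand for the known result, not the paper's derivation. The infinite-width neural tangent kernel is

K(x, x') = \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\left[\sigma(w \cdot x)\,\sigma(w \cdot x')\right] \;+\; (x \cdot x')\;\mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\left[\sigma'(w \cdot x)\,\sigma'(w \cdot x')\right].

For inputs on the unit sphere this depends only on x · x', so it is rotation-invariant, and reverse engineering amounts to choosing σ so that this expression matches the kernel you want.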

ICML '22 [arXiv] [code] [blog]


Benign, tempered or catastrophic: a taxonomy of overfitting

How bad is neural network overfitting?

Classical wisdom holds that overparameterization is harmful. Neural nets defy this wisdom, generalizing well despite their overparameterization and interpolation of the training data. How can we understand this discrepancy? Recent landmark papers have explored the concept of benign overfitting -- a phenomenon in which certain models can interpolate noisy data without harming generalization -- suggesting that neural nets may fit benignly. In this work with Neil Mallinar, Preetum Nakkiran, and others, we put this idea to the empirical test, giving a new characterization of neural network overfitting and noise sensitivity. We find that neural networks trained to interpolation do not overfit benignly, but neither do they exhibit the catastrophic overfitting foretold by classical wisdom: instead, they usually lie in a third, intermediate regime we call tempered overfitting. I found that we can understand these three regimes of overfitting analytically for kernel regression (a toy model for neural networks), and I proved a simple "trichotomy theorem" relating a kernel's eigenspectrum to its overfitting behavior.
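
As an illustrative sketch of the kind of experiment involved (not our exact setup; the kernel, toy target, and noise level below are hypothetical choices), one can fit ridgeless kernel regression to noisy labels and watch how the clean-target test error behaves as the training set grows. Tempered overfitting shows up as a plateau at a constant multiple of the noise variance, rather than decaying to zero (benign) or blowing up (catastrophic).

# Illustrative sketch only: ridgeless kernel regression on noisy labels.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def laplace_kernel(X1, X2, bandwidth=1.0):
    # K(x, x') = exp(-||x - x'|| / bandwidth)
    return np.exp(-cdist(X1, X2) / bandwidth)

def clean_target(X):
    return np.sign(X[:, 0])                       # hypothetical toy target

dim, noise_std, n_test = 5, 0.5, 1000
X_test = rng.standard_normal((n_test, dim))

for n_train in [100, 400, 1600]:
    X_train = rng.standard_normal((n_train, dim))
    y_train = clean_target(X_train) + noise_std * rng.standard_normal(n_train)
    K_train = laplace_kernel(X_train, X_train)
    K_test = laplace_kernel(X_test, X_train)
    # (Near-)interpolating fit: the tiny ridge is only for numerical stability.
    alpha = np.linalg.solve(K_train + 1e-8 * np.eye(n_train), y_train)
    test_mse = np.mean((K_test @ alpha - clean_target(X_test)) ** 2)
    print(f"n = {n_train}: clean-target test MSE = {test_mse:.3f}")

The Laplace kernel used here has a powerlaw eigenspectrum, which is the natural candidate for tempered behavior under the trichotomy.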

(2022) In Submission [arXiv]



A theory of the inductive bias and generalization of kernel ridge regression and wide neural networks

A conservation law framework

Of all the many mysteries of modern neural networks, perhaps the greatest is the question of generalization: why do the functions learned by neural networks generalize so well to new data? "Why" questions can be difficult to pin down, so in recent joint work with Maddie Dickens, we took up a more scientific question: can we predict, from first principles, how well a given net will generalize on a given function? It turns out that recent breakthroughs have essentially answered this question for kernel ridge regression, furnishing accurate approximations for several key metrics of neural network generalization. In this work, we give a simpler derivation of these important results, ascribe a new interpretation to the final theory, and discuss several new applications of the theory to shed light on various aspects of neural network generalization, including double descent and the hardness of the classic parity problem. Key to our formulation is a simple conservation law, latent in kernel regression, which limits the total "learnability" of any orthonormal basis of target functions. Our theory is transparent enough to give intuitive insights into when and why a neural network generalizes well.
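
Here is a minimal numerical sketch of that conservation law (my notation and a hypothetical powerlaw spectrum, not the paper's full results): given a kernel eigenspectrum {λ_i} and n training samples, eigenmode i is learned to a degree L_i = λ_i / (λ_i + κ), where κ ≥ 0 solves Σ_i λ_i / (λ_i + κ) + δ/κ = n for ridge δ. At zero ridge, the learnabilities sum to exactly n.

import numpy as np
from scipy.optimize import brentq

def mode_learnabilities(eigvals, n, ridge=0.0):
    # Solve the implicit equation sum_i eigvals_i / (eigvals_i + kappa) + ridge / kappa = n.
    def constraint(kappa):
        return np.sum(eigvals / (eigvals + kappa)) + ridge / kappa - n
    kappa = brentq(constraint, 1e-12, 1e12)
    return eigvals / (eigvals + kappa)

eigvals = 1.0 / np.arange(1, 10_001) ** 2        # hypothetical powerlaw kernel spectrum
L = mode_learnabilities(eigvals, n=100)
print(L[:5])          # high-eigenvalue modes are learned first (L_i near 1)
print(L.sum())        # ~= 100: total learnability is "conserved" at n when ridge = 0

In the full framework, these learnabilities combine with a target function's eigencoefficients to give the generalization estimates described above.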

(2021) In Submission [arXiv] [code] [blog]


Critical point-finding methods reveal gradient-flat regions of deep network losses

Exposing flaws in widely-used critical-point-finding methods

Despite how common and useful neural networks are, there are still basic mysteries about how they work, many related to properties of their loss surfaces. In this project, led by Charles Frye, we tested Newton methods (common tools for optimization and for exploring function structure) on neural network loss surfaces. We found that, instead of locating critical points as designed, Newton methods in practice almost always converged to a different, spurious class of points, which we characterized. Using simple, visualizable examples to illustrate the problem, we showed that some major studies applying Newton methods to loss surfaces probably misinterpreted their results.
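
For reference, here is a minimal sketch of the basic tool in question: a damped Newton iteration for finding roots of the gradient, applied to a hypothetical two-parameter toy loss rather than the deep-network losses we actually studied.

import numpy as np

def loss(theta):
    x, y = theta
    return 0.25 * x**4 - x**2 + 0.5 * y**2       # toy nonconvex loss

def grad(theta):
    x, y = theta
    return np.array([x**3 - 2 * x, y])

def hess(theta):
    x, y = theta
    return np.array([[3 * x**2 - 2, 0.0],
                     [0.0,          1.0]])

theta = np.array([0.3, 1.0])
for _ in range(50):
    g, H = grad(theta), hess(theta)
    # Damped Newton step toward a root of the gradient; the pseudoinverse keeps the
    # step well-defined even when the Hessian is singular.
    theta = theta - 0.5 * np.linalg.pinv(H) @ g

print(theta, np.linalg.norm(grad(theta)))        # converges to a critical point of the toy loss

The pseudoinverse hints at the failure mode: when the gradient lies nearly in the null space of the Hessian, the step stalls, and that is the "gradient-flat" behavior we found to dominate on real deep-network losses.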

(2021) [Neural Computation] [arXiv] [code]


Simplified Josephson-junction fabrication process for reproducibly high-performance superconducting qubits

A faster method to make Josephson junctions

In the spring and summer of 2019, I worked in the lab of Prof. Per Delsing developing nanofabrication methods for Josephson junctions, ubiquitous components in superconducting circuitry. My main project was a study of how junctions age in the months after fabrication, but my biggest contribution was elsewhere: Anita Fadavi, Amr Osman, and I developed a junction design that eliminates one lithography step, saving potentially several days of fabrication work.

(2021) [Applied Physics Letters]


Fast noise-resistant control of donor nuclear spin qubits in silicon

Better control schemes for spin qubits

Qubits decohere and lose their quantum information when uncontrollably coupled to their environment. Nuclear spin qubits in silicon are extremely weakly coupled to their environment, giving them long coherence times (up to minutes), but that same weak coupling makes them difficult to control quickly. Advised by Prof. Sophia Economou, I came up with schemes for driving nuclear spin qubits that give fast, noise-resistant, arbitrary single-qubit gates. The most important gate is a long sweep that effectively turns uncertainty in the electric field (charge noise) into uncertainty in time, which can be accounted for with corrective gates. We also demonstrate two-qubit gates.

(2020) [PRB] [arXiv]


Puzzles

During my senior year of undergrad, I started a puzzlehunt called the VT Hunt with Bennett Witcher. It became a university tradition, with the 2019-22 VT Hunts each drawing 1000-2000 participants and raising money for charities, and I've stayed involved as a mentor. Starting in high school, I've also helped concoct six other puzzle events. A few of my favorite puzzles I've made are below, ordered roughly from easiest to hardest, so you can pick where to start.


Posts