Jamie Simon
I am a third-year PhD student in the physics department at UC Berkeley, aiming to use tools from theoretical physics to build fundamental understanding of machine learning methods. I'm advised by Mike DeWeese and supported by an NSF Graduate Research Fellowship. In my free time, I like running, puzzles, spending time in forests, and balancing things.
Research
Neural tangent kernel eigenvalues accurately predict generalization
A first-principles theory of generalization for wide neural nets
Of all the many mysteries of modern neural networks, perhaps the greatest is the question of generalization: why do the functions learned by neural networks generalize so well to new data? "Why" questions can be difficult to pin down, so in recent joint work with Maddie Dickens, we took up a more scientific question: can we predict, from first principles, how well a given net will generalize on a given function? It turns out we can: building off recent breakthroughs, we derive accurate approximations for several key metrics of neural network generalization. In the process, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions. Our theory is transparent enough to give intuitive insights into when and why a neural network generalizes well.
On the power of shallow learning
Reverse-engineering the neural network-kernel method equivalence
Much of our understanding of artificial neural networks stems from the fact that, in the infinite-width limit, they turn out to be equivalent to a class of simple models called kernel methods. Given a wide network architecture, it's surprisingly easy to find the equivalent kernel method, allowing us to study popular models in the infinite-width limit. In recent work with Sajant Anand, I showed that, for fully-connected nets (FCNs), this mapping can be run in reverse: given a desired kernel, we can work backwards to find a network that achieves it. Surprisingly, we can always design this network to have only a single hidden layer, and we used that fact to prove that wide shallow FCNs can achieve any kernel a deep FCN can, an analytical conclusion our experiments support. This ability to design nets with desired kernels is a step towards deriving good net architectures from first principles, a longtime dream of the field of machine learning.
A phenomenological theory of high-dimensional optimization
I'm currently working on a phenomenological, field-theory-based model of high-dimensional loss surfaces that I hope will faithfully capture several empirical features of real neural net loss surfaces. I aim for my model to explain the Hessian loss-index relationship, agree with findings that datasets have intrinsic dimensions, and shed light on mode connectivity. This research direction was the subject of my proposal for the NSF GRFP.
Alternative neural network formulations
One current direction of my research explores variants on the classic neural network design that can still perform complex deep learning tasks. Much evidence from the last few years indicates that the magic of neural networks lies in their hierarchical nonlinear structure, not in the low-level details of their mathematical formulation. This project therefore explores new deep learning models in order to understand what a model requires to learn complex patterns. I've written up some of my progress on one variant here.
Critical point-finding methods reveal gradient-flat regions of deep network losses
Exposing flaws in widely-used critical-point-finding methods
Despite how common and useful neural networks are, there are still basic mysteries about how they work, many related to properties of their loss surfaces. In this project, led by Charles Frye, we tested Newton methods (common tools for optimization and exploring function structure) on loss surfaces. We found that, rather than finding critical points as designed, in practice Newton methods almost always converged to a different, spurious class of points, which we described. Giving simple visualizable examples to illustrate the problem, we showed that some major studies using Newton methods on loss surfaces probably misinterpreted their results. Our paper is here.
Simplified Josephson-junction fabrication process for reproducibly high-performance superconducting qubits
A faster method to make Josephson junctions
In the spring and summer of 2019 I worked in the lab of Prof. Per Delsing developing nanofabrication methods for Josephson junctions, ubiquitous components in superconducting circuitry. My main project was a study of how junctions age in the months after fabrication, but my biggest contribution was elsewhere: Anita Fadavi, Amr Osman and I developed a junction design that takes one fewer lithography step to fabricate, potentially saving several days of work.
(2021) [Applied Physics Letters]
Fast noise-resistant control of donor nuclear spin qubits in silicon
Better control schemes for spin qubits
Qubits decohere and lose their quantum information when uncontrollably coupled to their environment. Nuclear spin qubits in silicon are extremely weakly coupled to their environment, giving them long coherence times (up to minutes), but that same weak coupling makes quickly controlling them difficult. Advised by Prof. Sophia Economou, I came up with schemes for driving nuclear spin qubits that give fast, noise-resistant arbitrary single-qubit gates. The most important gate is a long sweep that effectively turns uncertainty in electric field (charge noise) into uncertainty in time, which can be accounted for by corrective gates. We also show two-qubit gates.
Puzzles
While a senior in undergrad, I started a puzzlehunt called the VT Hunt with Bennett Witcher. It became a university tradition, with the 2019, '20, and '21 VT Hunts each drawing 1000-2000 participants and raising money for charities. I've also helped concoct six other puzzle events starting in high school. A few of my favorite puzzles I've made are below. The easiest is Flags, and the most satisfying to solve, in my view, is Maelstrom.
Posts

Einstein vs. Bohr Rap Battle

Newton vs. Leibniz Rap Battle

A first-principles theory of neural net generalization

The gravitree

Could you propel a spacecraft using sports projectiles?

The principle of least power dissipation

Multiplicative neural networks

How would an upside-down candle burn?

Messing with the postal service

Common ground

What would happen if you made a planet out of fish?

How hard do you have to hit a chicken to cook it?

The expected cost of breaking quarantine