I am a 3rd-year PhD student in the physics department at UC Berkeley, aiming to use tools from theoretical physics to build fundamental understanding of machine learning methods. I'm advised by Mike DeWeese and supported by an NSF Graduate Research Fellowship. In my free time, I like running, puzzles, spending time in forests, and balancing things.
Neural tangent kernel eigenvalues accurately predict generalization
A first-principles theory of generalization for wide neural nets
Of all the many mysteries of modern neural networks, perhaps the greatest is the question of generalization: why do the functions learned by neural networks generalize so well to new data? "Why" questions can be difficult to pin down, so in recent joint work with Maddie Dickens, we took up a more scientific question: can we predict, from first principles, how well a given net will generalize on a given function? It turns out we can: building on recent breakthroughs, we derive accurate approximations for several key metrics of neural network generalization. In the process, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions. Our theory is transparent enough to give intuitive insights into when and why a neural network generalizes well.
On the power of shallow learning
Reverse-engineering the neural network-kernel method equivalence
Much of our understanding of artificial neural networks stems from the fact that, in the infinite-width limit, they turn out to be equivalent to a class of simple models called kernel methods. Given a wide network architecture, it's surprisingly easy to find the equivalent kernel method, allowing us to study popular models in the infinite-width limit. In recent work with Sajant Anand, I showed that, for fully-connected nets (FCNs), this mapping can be run in reverse: given a desired kernel, we can work backwards to find a network that achieves it. Surprisingly, we can always design this network to have only a single hidden layer, and we used that fact to prove that wide shallow FCNs can achieve any kernel a deep FCN can, an analytical conclusion our experiments support. This ability to design nets with desired kernels is a step towards deriving good net architectures from first principles, a longtime dream of the field of machine learning.
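As a minimal sketch of the forward direction of this mapping (network to kernel), and not the reverse construction from the paper, the snippet below compares a Monte Carlo estimate of the kernel of a wide single-hidden-layer ReLU network against the known closed form for Gaussian weights (the degree-1 arc-cosine kernel). The function names, the unit-variance weight scaling, the absence of biases, and the example inputs are all assumptions made for illustration.

```python
import numpy as np

def relu_kernel_analytic(x, y):
    """Closed form of E_w[relu(w.x) relu(w.y)] for w ~ N(0, I)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

def relu_kernel_empirical(x, y, width, rng):
    """Average of hidden-unit products in a random ReLU layer of the given width."""
    W = rng.standard_normal((width, x.shape[0]))  # hypothetical N(0, 1) weights
    hx = np.maximum(W @ x, 0.0)
    hy = np.maximum(W @ y, 0.0)
    return hx @ hy / width

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.3])  # arbitrary example inputs
y = np.array([0.2, -1.0, 0.7])

k_exact = relu_kernel_analytic(x, y)
k_mc = relu_kernel_empirical(x, y, width=200_000, rng=rng)
print(f"analytic: {k_exact:.4f}  empirical: {k_mc:.4f}")
```

As the width grows, the empirical value should approach the analytic one, which is the sense in which a wide layer "has" a kernel.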
A phenomenological theory of high-dimensional optimization
I'm currently working on a phenomenological, field-theory-based model of high-dimensional loss surfaces that I hope will faithfully capture several empirical features of real neural net loss surfaces. I aim for my model to explain the Hessian loss-index relationship, agree with findings that datasets have intrinsic dimensions, and shed light on mode connectivity. This research direction was the subject of my proposal for the NSF-GRFP.
Alternative neural network formulations
One current direction of my research involves exploring variants on the classic neural network design that are still able to do complex deep learning tasks. Much evidence in the last few years indicates that the magic of neural networks lies in their hierarchical nonlinear structure, not in the low-level details of their mathematical formulation, so this project aims to explore new deep learning models and build an understanding of what a model requires in order to learn complex patterns. I've written up some of my progress on one variant here.
Critical point-finding methods reveal gradient-flat regions of deep network losses
Exposing flaws in widely-used critical-point-finding methods
Despite how common and useful neural networks are, there are still basic mysteries about how they work, many related to properties of their loss surfaces. In this project, led by Charles Frye, we tested Newton methods (common tools for optimization and for exploring function structure) on loss surfaces. We found that, rather than finding critical points as designed, in practice Newton methods almost always converged to a different, spurious class of points, which we described. Using simple, visualizable examples to illustrate the problem, we showed that some major studies using Newton methods on loss surfaces probably misinterpreted their results. Our paper is here.
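As a toy illustration of how a critical-point finder can stall at a point that is not critical (this is not an example from the paper, and it swaps in plain gradient descent on the squared gradient norm for the Newton methods discussed above), consider the hypothetical 1D function f(x) = x^3/3 + x. Its gradient x^2 + 1 never vanishes, so it has no critical points, yet the squared gradient norm has a stationary point at x = 0 because the Hessian is zero there.

```python
def grad(x):
    """Gradient of the toy function f(x) = x**3 / 3 + x; never zero."""
    return x**2 + 1.0

def hess(x):
    """Second derivative of the same toy function."""
    return 2.0 * x

def minimize_grad_norm(x0, lr=0.05, steps=2000):
    """Gradient descent on 0.5 * grad(x)**2, whose derivative is hess(x) * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * hess(x) * grad(x)
    return x

x_final = minimize_grad_norm(1.5)
print(f"x = {x_final:.3e}, gradient = {grad(x_final):.3f}, "
      f"hessian * gradient = {hess(x_final) * grad(x_final):.3e}")
```

The search settles where the Hessian times the gradient vanishes while the gradient itself stays near 1, so a method that only checks for convergence would report a point that is not a critical point of f.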
Simplified Josephson-junction fabrication process for reproducibly high-performance superconducting qubits
A faster method to make Josephson junctions
In the spring and summer of 2019 I worked in the lab of Prof. Per Delsing developing nanofabrication methods for Josephson junctions, ubiquitous components in superconducting circuitry. My main project was a study of how junctions age in the months after fabrication, but my biggest contribution was elsewhere: Anita Fadavi, Amr Osman and I developed a junction design that is faster to fabricate by one lithography step, or potentially several days of work.
(2021) [Applied Physics Letters]
Fast noise-resistant control of donor nuclear spin qubits in silicon
Better control schemes for spin qubits
Qubits decohere and lose their quantum information when uncontrollably coupled to their environment. Nuclear spin qubits in silicon are extremely weakly coupled to their environment, giving them long coherence times (up to minutes), but that same weak coupling makes quickly controlling them difficult. Advised by Prof. Sophia Economou, I came up with schemes for driving nuclear spin qubits that give fast, noise-resistant arbitrary single-qubit gates. The most important gate is a long sweep that effectively turns uncertainty in electric field (charge noise) into uncertainty in time, which can be accounted for by corrective gates. We also show two-qubit gates.
While a senior in undergrad, I started a puzzlehunt called the VT Hunt with Bennett Witcher. It became a university tradition, with the 2019, '20, and '21 VT Hunts each drawing 1000-2000 participants and raising money for charities. I've also helped concoct six other puzzle events starting in high school. A few of my favorite puzzles I've made are below. The easiest is Flags, and the most satisfying to solve, in my view, is Maelstrom.