Research program
The public statement of the program: the question it pursues, the established results it builds on, how the work proceeds, and the ordered tests it has to pass. This is a working document. It changes when the work does, and revisions are recorded in the lab notes.
The question
Can the architectures of learning systems be derived from physical and mathematical law?
Derivation has a specific meaning here, fixed before any result is claimed. The assumptions are stated and countable. Every step can be checked by someone who is not its author. Established systems must appear as special cases when the general structure is restricted. And the result has to disagree with current practice somewhere a numerical experiment can settle. Anything weaker is an analogy, and analogies cannot carry the weight of a foundation.
Starting points
The program does not start from speculation. Each of its load-bearing components is established, published science.
- Computation is physical. Erasing one bit of information dissipates at least kT ln 2 of heat, roughly 3 × 10⁻²¹ joules at room temperature. There is no abstract computer; every inference has an energy cost with a known floor. (Landauer, 1961)
- Inference can be relaxation. Associative memory works as descent in an energy landscape. The computation is the physics of settling into a minimum. (Hopfield, 1982)
- Attention is an energy method. The update rule of modern Hopfield networks is, term for term, the attention mechanism of transformers. The dominant architecture of the current era is an energy model that was not recognized as one. (Ramsauer et al., 2020)
- Generation can be thermodynamics. Diffusion models were constructed directly on nonequilibrium thermodynamics: noising as entropy increase, generation as its learned reversal. (Sohl-Dickstein et al., 2015)
- Waves can compute. The wave equation maps onto recurrent computation, and an inhomogeneous physical medium can be trained to classify spoken vowels as the waves propagate through it. (Hughes et al., 2019)
- Learning can be physical too. Equilibrium propagation extracts gradients from a system's own relaxation dynamics, exact in the limit of a vanishingly small nudge, with no separate backpropagation pass. (Scellier and Bengio, 2017)
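The attention correspondence in the list above can be checked numerically. The sketch below is a minimal illustration, not the construction from the cited paper: it assumes the stored patterns serve as both keys and values, omits the learned projection matrices, and sets the inverse temperature `beta` to 1/sqrt(d). The patterns, the query, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hopfield_update(patterns, state, beta):
    """One modern-Hopfield retrieval step:
    new_state = sum_i softmax(beta * <x_i, state>)_i * x_i."""
    weights = softmax([beta * dot(p, state) for p in patterns])
    return [sum(w * p[j] for w, p in zip(weights, patterns))
            for j in range(len(state))]

def attention(query, keys, values):
    """Single-query scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * dot(query, k) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# Hypothetical stored patterns and a noisy query, chosen only for illustration.
patterns = [[1.0, 0.0, 0.0, 1.0],
            [0.0, 1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0, 0.0]]
query = [0.9, 0.1, 0.0, 0.8]

# Assumption: beta = 1/sqrt(d), keys = values = stored patterns.
beta = 1.0 / math.sqrt(len(query))
retrieved = hopfield_update(patterns, query, beta)
attended = attention(query, patterns, patterns)
max_gap = max(abs(a - b) for a, b in zip(retrieved, attended))
```

Under these assumptions the two functions perform the same arithmetic, so `max_gap` should be zero up to floating-point error. In a transformer the keys, values, and queries are learned linear projections, which this sketch leaves out.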
Read separately, these are six results in six subfields. Read together, they outline a claim that nobody has carried to completion: entropy, energy, and wave dynamics are not metaphors for learning systems. They are what learning systems are made of, and the field keeps rediscovering this one fragment at a time. The program exists to do the assembly deliberately: one framework in which these results are consequences, and from which new architectures follow.
Diffusion is the instructive case. It is the one major model family whose objective came from physics, and it became one of the strongest in use. But the physics stops at the objective; the networks underneath are still chosen by trial. The one time physics was allowed to choose, it chose well. The program pushes the same move the rest of the way down, from the loss function into the architecture itself.
Four pillars
The problem decomposes into the substrate, its mathematics, its physical constraints, and the higher-order structure above them.
Wave and energy-based computation
The substrate. Which inference operations can superposition, interference, and relaxation implement directly, and at what cost in energy, time, and capacity. What a learning rule looks like when it must be a physical process acting on the same field that performs the inference.
Mathematics of intelligence
The structure. What the information geometry of wave-derived model families looks like. Which complexity classes bound learning and inference on these substrates. Which topological properties of a representation are invariants of the dynamics rather than accidents of training.
Physics of computation
The constraints. How far practical inference sits above the Landauer floor, and what closing the gap requires. What reversibility buys a learning system. Whether fluctuation theorems yield usable bounds on learning dynamics far from equilibrium.
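The size of the Landauer floor, and the gap above it, can be put in numbers. The sketch below assumes room temperature means 300 K; the function names are hypothetical, and no measured energy figure is supplied because the text gives none.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, exact by SI definition since 2019

def landauer_floor(temperature_kelvin, bits_erased=1.0):
    """Minimum heat in joules dissipated by erasing `bits_erased` bits."""
    return bits_erased * BOLTZMANN_K * temperature_kelvin * math.log(2)

def gap_factor(measured_joules, bits_erased, temperature_kelvin):
    """How many times above the Landauer floor a measured energy cost sits."""
    return measured_joules / landauer_floor(temperature_kelvin, bits_erased)

# Assumption: room temperature taken as 300 K.
floor_300k = landauer_floor(300.0)
```

`floor_300k` comes out just under 3 × 10⁻²¹ joules. `gap_factor` needs a measured energy and a count of erased bits for the operation in question, both of which have to come from an actual experiment.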
Self-reference and higher-order structure
The ceiling. Which formal structures let a system model its own computation without paradox. What reflection costs. Whether the obstructions known from logic, fixed points and incompleteness, reappear as physical constraints in systems that reason about themselves.
Method
The working loop is short and it repeats.
Derive. Pencil-and-paper work under stated assumptions. The output is a model class or an inference procedure with its preconditions visible, never a mechanism with a story attached.
Simulate. Small-scale numerical experiments on field and wave dynamics. Their job is to break derivations cheaply and early, before anything is built on them.
Build. Whatever survives both is implemented as a reference system and run on real tasks. A result that cannot be implemented is recorded as incomplete, not announced as progress.
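For the Simulate step, a small-scale experiment on wave dynamics can be very short. The sketch below is a generic leapfrog finite-difference scheme for the one-dimensional wave equation, not the program's substrate: it assumes a homogeneous medium, fixed ends, and a Gaussian initial bump at rest, and every name and parameter value is hypothetical. It also makes the recurrent structure visible, since each new field depends only on the two previous fields.

```python
import math

def wave_step(u_prev, u_curr, courant_sq):
    """One leapfrog step of the 1D wave equation with fixed (zero) ends.

    u_next = 2*u_curr - u_prev + C^2 * discrete_laplacian(u_curr),
    where C = c*dt/dx is the Courant number.
    """
    n = len(u_curr)
    u_next = [0.0] * n  # end points stay at zero
    for i in range(1, n - 1):
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + courant_sq * laplacian
    return u_next

def simulate(n_points=101, n_steps=200, courant=0.5):
    """Propagate a Gaussian bump and return the final field and peak amplitude."""
    center = (n_points - 1) / 2
    u0 = [math.exp(-((i - center) / 5.0) ** 2) for i in range(n_points)]
    u0[0] = u0[-1] = 0.0
    # Zero initial velocity, approximated to first order by u_prev = u_curr.
    u_prev, u_curr = u0[:], u0[:]
    peak = 0.0
    for _ in range(n_steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, courant ** 2)
        peak = max(peak, max(abs(x) for x in u_curr))
    return u_curr, peak

final_field, peak = simulate()
```

The scheme is stable only for Courant numbers up to 1, so a run with a larger value is a cheap way to see a simulation break rather than a derivation.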
The sequence of tests
The program is judged against an ordered sequence. Each step can fail, and a failure invalidates everything after it. Progress is recorded in the lab notes. Nothing on this page claims that a step has been passed.
- T1. Define the substrate. A formal definition of the computational medium: its state space, its dynamics, and what counts as computation in it. Precise enough that the later steps can fail.
- T2. Recover what works. At least two established model families must drop out as limiting cases: associative memories from energy relaxation, diffusion from entropy flow. A foundation that cannot reproduce known successes is wrong.
- T3. Predict a divergence. The framework must disagree with current practice somewhere a numerical experiment can settle, before any larger system is built on it.
- T4. Run something derived. An inference or learning procedure obtained from the theory, implemented, and measured on a task that was not chosen to flatter it.
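One of the known behaviours T2 asks the framework to recover, associative memory as energy relaxation, can be stated in a few lines. The sketch below is the classical binary Hopfield network with Hebbian weights and asynchronous updates, shown as a target to reproduce and not as anything derived by the program. The two stored patterns and the corrupted probe are hypothetical.

```python
def hebbian_weights(patterns):
    """Symmetric Hebbian weight matrix with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Hopfield energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))

def relax(w, state, max_sweeps=10):
    """Asynchronous updates until no unit changes; records energy after each update."""
    s = list(state)
    energies = [energy(w, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
            energies.append(energy(w, s))
        if not changed:
            break
    return s, energies

# Hypothetical orthogonal patterns and a probe with one flipped bit.
stored = [[1, -1, 1, -1, 1, -1, 1, -1],
          [1, 1, 1, 1, -1, -1, -1, -1]]
probe = [-1, -1, 1, -1, 1, -1, 1, -1]

weights = hebbian_weights(stored)
recalled, energies = relax(weights, probe)
```

With symmetric weights and a zero diagonal, each asynchronous update can only lower the energy or leave it unchanged, so `energies` is non-increasing and the probe settles into the first stored pattern.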
Outputs
Lab notes
Dated working notes: positions, method decisions, readings, and negative results, published as they are written.
Technical notes
Formal write-ups released when a derivation or an experiment survives scrutiny. None are published yet. The first will appear when there is one worth reading.
Reference implementations
Code for any procedure the theory produces, released so that anyone can rerun the result instead of taking it on trust.