A thread on our new paper Thermodynamic Bayesian Inference
250 years later, Bayes’s theorem is still the gold standard for probabilistic reasoning. But for complicated models it’s too hard to implement exactly, so approximations are used. For example, the complexity of Bayesian
Enter thermodynamic computing. In this preprint, Thermodynamic Linear Algebra (), we show that a system of coupled oscillators in contact with a heat reservoir can be used to solve linear systems in an amount of time proportional to the number of variables.
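The idea can be imitated in simulation. A minimal numpy sketch (not the paper's hardware or protocol; the matrix, step size, and chain count are all made up for illustration): Langevin dynamics whose drift is the residual Ax - b settle into a Gaussian whose mean solves Ax = b, so averaging equilibrium samples solves the linear system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Numerical sketch only (not the paper's hardware; parameters invented):
# the Langevin dynamics
#   dx = -(A x - b) dt + sqrt(2) dW
# with symmetric positive definite A equilibrate to a Gaussian whose
# mean is the solution of A x = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

dt, n_steps, n_chains = 1e-3, 5_000, 20_000
X = np.zeros((n_chains, 2))
for _ in range(n_steps):
    X += (b - X @ A) * dt + np.sqrt(2 * dt) * rng.standard_normal(X.shape)

x_est = X.mean(axis=0)            # ≈ np.linalg.solve(A, b)
```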
Finding the point where these three planes intersect is an example of a problem which has been studied for literal millennia: solving a linear system of equations. Billions of dollars have been spent on developing new hardware to solve this and related problems faster.
A basic fact in linear algebra is that an N-dimensional vector space can have at most N linearly independent vectors. Is there a better statement of non-orthogonality when there are more vectors than dimensions? This is the best I could derive:
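For reference, one classical quantitative statement of unavoidable non-orthogonality (which may or may not coincide with the derivation shown) is the Welch bound:

```latex
\max_{i \neq j} \left|\langle v_i, v_j\rangle\right|^2 \;\geq\; \frac{M-N}{N(M-1)}
\qquad \text{for unit vectors } v_1, \dots, v_M \in \mathbb{C}^N,\ M > N .
```

In words: once M exceeds the dimension N, some pair of unit vectors must have overlap at least of order 1/N.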
Quantum parallelism—the idea that quantum computers gain an advantage by “trying multiple possibilities at once”—is popular but controversial. In fact, quantum mechanics is not even necessary to try multiple possibilities at once: we can achieve this with just thermodynamics.
Thermodynamic algorithms are of growing interest, and our results represent the first rigorously proven speedups for thermodynamic algorithms. The results also reveal a deep and wholly unexplored connection between thermodynamics and linear algebra.
Some good points made here. Some of them are not specific to any one company and more about thermo computing in general, so I’ll try to respond to those:
unpopular opinion:
first off this whole idea of using analog circuits to accelerate ai workloads aint exactly new. ibm intel and others have been experimenting with that shit for years tryna leverage the noise tolerance of neural nets to boost efficiency
heres the thing analog
In this work “Thermodynamic Matrix Exponentials and Thermodynamic Parallelism” (), we show that the exponential of a matrix can be found in O(d^2) time using a thermodynamic computer, an improvement over known analog and digital methods which are O(d^3).
Thermo computing will absolutely lead to advantages in efficiency but we need to be careful about making promises like “trillions of times less energy” without a clear justification
The "Brain"
@Extropic_AI
is developing is one where each thermodynamic neuron learns a complex probability distribution, encoding it in an energy potential
Allowing the fastest possible learning path, using trillions of times less energy and operating millions of times faster
As the cost required to train AI models is exploding, the prospect of using physics-based hardware to bring down this cost is enticing. Our new paper () proposes Thermodynamic Natural Gradient Descent (TNGD), a novel method to perform second-order
We have also experimentally demonstrated the inversion of a matrix on a real thermodynamic device with and without our error mitigation protocol; we find that the error mitigation method reduces the error by around 20%.
This improves over SOTA methods, which scale with the square of the number of variables. We also provide methods for finding the inverse of a matrix, solving the Lyapunov equation, and finding the determinant of a matrix, all using a single relatively simple device.
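As a digital reference point (an illustrative sketch with arbitrary matrices, not the thermodynamic method), the Lyapunov equation AX + XAᵀ = Q can be solved conventionally by vectorizing it into an ordinary linear system:

```python
import numpy as np

# Digital baseline sketch: A X + X A^T = Q becomes a plain linear system
# after flattening X.
def solve_lyapunov(A, Q):
    n = A.shape[0]
    I = np.eye(n)
    # With row-major flattening: vec(A X) = (A ⊗ I) vec(X) and
    # vec(X A^T) = (I ⊗ A) vec(X).
    M = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(M, Q.flatten()).reshape(n, n)

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
Q = np.eye(2)
X = solve_lyapunov(A, Q)
print(np.allclose(A @ X + X @ A.T, Q))   # True
```

This direct approach costs O(d^6) in the dimension d (solving a d²-by-d² system), which is why cheaper methods for Lyapunov equations are interesting.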
A key question in thermodynamic computing is whether thermodynamic algorithms can be “derandomized”, or replaced by deterministic algorithms with similar performance. Efforts to answer this question will greatly benefit from the work of Avi Wigderson, who contributed massively to
Last year Jinyoung Park and Huy Tuan Pham gave an elegant proof of the Kahn-Kalai conjecture. One special case of this result (which was previously known): draw N dots on a sheet of paper, and for each pair of dots, draw a line between them with probability p. If p>ln(N)/N, it is
These new thermodynamic algorithms are robust against noise, and some even depend on noise! The key to harnessing noise is the thermodynamic property of ergodicity, which holds that the time average of a fluctuating measurable quantity converges to its ensemble average.
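Ergodicity is easy to check numerically in a toy model (made-up parameters, not any specific device): for the overdamped dynamics dx = -ax dt + √2 dW, the ensemble variance is 1/a, and the time average of x² along one long noisy trajectory should converge to the same number.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy ergodicity check: time average of x^2 along a single fluctuating
# trajectory of dx = -a x dt + sqrt(2) dW vs. the ensemble variance 1/a.
a, dt, n_steps = 1.0, 1e-3, 1_000_000
noise = np.sqrt(2 * dt) * rng.standard_normal(n_steps)

x, second_moment = 0.0, 0.0
for xi in noise:
    x += -a * x * dt + xi             # Euler-Maruyama step
    second_moment += x * x
time_avg = second_moment / n_steps

print(time_avg)                       # time average ≈ ensemble average 1/a
```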
What makes a set non-measurable? What can the axiom of choice actually be used for? I never really got this in math class, but here is a nice construction that helped me understand it
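In case the link is lost, the standard example (assumed here to be the Vitali set, since the original reference isn't preserved) runs:

```latex
\text{Define } x \sim y \iff x - y \in \mathbb{Q} \text{ on } [0,1).
\text{ By the axiom of choice, pick one representative from each class; call the set } V. \\
\text{The translates } V_q = \{\, v + q \bmod 1 : v \in V \,\},\ q \in \mathbb{Q}\cap[0,1),
\text{ are pairwise disjoint and cover } [0,1). \\
\text{If } V \text{ were measurable with } \mu(V) = m, \text{ translation invariance and countable
additivity would give } 1 = \textstyle\sum_q m, \\
\text{which fails both for } m = 0 \text{ and for } m > 0. \text{ Hence } V \text{ is non-measurable.}
```

The axiom of choice is used exactly once: to pick the representatives.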
We have an exciting week ahead: Our work on
#ThermoComputing
will be featured at the IEEE Conference on Rebooting Computing () in San Diego.
I will present Thermo AI:
@MaxAifer
will present Thermo Linear Algebra
I'm really curious about the implications for thermodynamic linear algebra (). Maybe this method can be used to give an even larger speedup than the classical thermodynamic algorithms for linear algebra primitives.
Nathan Wiebe opens the simulation session at
#SQuInT2023
explaining an “Exponential quantum speedup in simulating coupled classical oscillators”, starting with a reference to Grover’s *other* paper (w/Sengupta), “From coupled pendulums to quantum search”
History of physics can roughly be marked by different functional forms that are used to model things (e.g. Taylor series in the 18th century, Fourier series in the 19th, differential operators in the 20th). Will deep neural networks play a similar role in 21st century physics?
Analog chips are a lot of fun to learn about. They’re more like musical instruments than computers, humming along to some weird song. What we’re building will be the Stradivarius of computing.
Promising work. It seems that formulating diffusion models as equilibrium systems allows for a natural description of phase transitions. Curious to learn more about the significance of the critical exponents in this context.
Happy to share:
"The statistical thermodynamics of generative diffusion models"
I describe diff. models in terms of Boltzmann distributions, order parameters and equations of state.
Phase transitions and critical scaling in the generative process!
Seems like people like the book club idea! The first book will be solid state physics (Ashcroft/Mermin), I'll set up a space for next Thursday to talk about chs. 1-4 (p. 1-84 in my edition). Like or DM if you want to join, I'll create a groupchat.
I'm gauging interest in a weekly reading group on Spaces. Books may include (in no particular order):
- Solid state physics (Ashcroft and Mermin)
- The art of electronics
- Nonequilibrium statistical physics (Zwanzig)
Interested?
The work also points to a parallel between thermodynamic computing and quantum computing, which can also accelerate solving linear systems (HHL algorithm).
@quantum_aram
The speed-up is due to thermodynamic parallelism, as the correlation function of the system mimics a collection of d copies of the device. Unlike other analog methods, which are plagued by noise, ours relies on noise, and can achieve arbitrary accuracy in the presence of noise.
The circle of fifths is used to organize the 12 notes commonly used in western music, and is an instance of the cyclic group of order 12 (C12). I find another useful representation, using the group factorization C12 = C3 x C4, expresses the scale in terms of minor and major
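The factorization can be checked mechanically. Since gcd(3, 4) = 1, the Chinese remainder theorem gives C12 ≅ C3 × C4; this sketch (numbering pitch classes 0-11 is my assumption, and any musical interpretation of the factors is not spelled out here) verifies that n ↦ (n mod 3, n mod 4) is a group isomorphism:

```python
# CRT sketch: n -> (n mod 3, n mod 4) is a bijection C12 -> C3 x C4 that
# turns addition of semitones into componentwise addition.
pairs = {n: (n % 3, n % 4) for n in range(12)}

assert len(set(pairs.values())) == 12          # bijection onto C3 x C4
for a in range(12):
    for b in range(12):
        s = (a + b) % 12                       # addition in C12
        assert pairs[s] == ((pairs[a][0] + pairs[b][0]) % 3,
                            (pairs[a][1] + pairs[b][1]) % 4)
```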
We have exciting news to share 🔥
The team at
@NormalComputing
has performed the first thermodynamic linear algebra experiment. This is an experimental follow-up to our theory paper that came out in August. The details are in this blog, see thread below
Not to be a doomer, but energy discipline will really matter in the coming years, so energy-efficient computing is important. If global energy consumption were 100 times what it is today, even with zero carbon emissions, waste heat would be the same scale as the greenhouse effect
So, I just inverted my first matrix using thermodynamic linear algebra by simulating the circuit in HDL21
I've written up a colab notebook which generates the circuit from the matrix that you'd like to invert and recovers the inverse from simulation
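For readers without the notebook, here is a plain-numpy sketch of why equilibrium fluctuations encode the inverse (not the HDL21 circuit; the matrix and step counts are invented): for symmetric positive definite A, the stationary covariance of dx = -Ax dt + √2 dW is A⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)

# Plain-numpy sketch (not the HDL21 notebook): the Langevin dynamics
# dx = -A x dt + sqrt(2) dW equilibrate to a Gaussian with covariance
# inv(A), so the sample covariance of equilibrium states estimates the
# matrix inverse.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

n_chains, n_steps, dt = 50_000, 4_000, 1e-3
X = np.zeros((n_chains, 2))
for _ in range(n_steps):
    X += -(X @ A) * dt + np.sqrt(2 * dt) * rng.standard_normal(X.shape)

cov = X.T @ X / n_chains          # ≈ np.linalg.inv(A)
```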
Yes, analog design is hard. It can be done through careful simulation and prototyping. Circuits with linear dynamics are easier than nonlinear ones like neural networks. At
@NormalComputing
we’ve come up with useful algorithms that only require linear dynamics
But, like other analog computers, thermodynamic computers suffer from errors caused by imprecise parameter values. Here () we give a method for mitigating such errors on a thermodynamic device.
To your point about sampling: a Thermo computer is not just a PRNG, because it draws random samples from a specific distribution, and replaces algorithms like SGHMC or Langevin algorithms. There are also Thermo algorithms with deterministic outputs like the TLA algorithms
The earliest computers were analog, such as the Antikythera mechanism, lost in the Mediterranean around 100 BC. Like Antikythera, analog computers have mostly become historical artifacts, replaced by faster and more precise digital devices.
Why do we need thermodynamics to design energy-efficient computers? Because in other physical theories energy is conserved so it cannot be spent. Efficiency must be understood in terms of the thermodynamic resource of free energy.
Great question. The short answer is that minimal amount of energy needed depends on how fast you want to invert the matrix. If you don’t mind waiting a long time for the result, you could theoretically make the required energy arbitrarily small. In other words there is a trade
@MaxAifer
Ah, so you have to put energy into the system. Do you theoretically know for simple examples like a matrix inversion how much energy must be dissipated in terms of heat? I’ve heard that for Carnot engines you can calculate the maximum efficiency
Thanks everyone who came to the space last night, it was fun. I’m planning on doing another one tonight starting around 10pm EST. I’m thinking we’ll choose one of the following articles as a starting point:
1. Statistical physics of self-replication
2.
Variability of analog components is another valid concern. Work is being done on methods for mitigating this and other sources of error; see e.g. our thermodynamic error mitigation protocol. This method has also been demonstrated experimentally.
Let k_n be the number of prime factors of n. For each n, take a step left if k_n is odd, or right if k_n is even. Your distance from where you started is roughly the same as if you had chosen each step by flipping a coin, if and only if the Riemann hypothesis is true.
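The walk is easy to play with numerically. A sketch, taking k_n to count prime factors with multiplicity, i.e. Ω(n) (an interpretive assumption, since the post doesn't specify):

```python
# Sieve Omega(n) up to N, then take the parity walk and measure its
# displacement against the sqrt(N) scale.
def omega_sieve(N):
    """Omega(n) (number of prime factors with multiplicity) for n = 0..N."""
    k = [0] * (N + 1)
    for p in range(2, N + 1):
        if k[p] == 0:                          # p is prime
            pe = p
            while pe <= N:                     # each prime power adds 1
                for m in range(pe, N + 1, pe):
                    k[m] += 1
                pe *= p
    return k

N = 10_000
k = omega_sieve(N)
position = sum(1 if k[n] % 2 == 0 else -1 for n in range(1, N + 1))
print(abs(position), int(N ** 0.5))   # displacement vs. the sqrt(N) scale
```

The sum computed here is the summatory Liouville function L(N); the Riemann hypothesis is equivalent to |L(N)| growing no faster than N^(1/2+ε).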
Download EE Times' Silicon 100, our prestigious annual compilation of electronics and semiconductor startups that are shaping the future.
#silicon100
#eetimes
There are some interesting questions around derandomization in thermodynamic computing! Given Wigderson’s recent Turing award, it seems like a good time to talk about it
An elliptic curve is a set of points (x,y) with y^2 = ax^3 + bx + c. Between any two points A and B on the curve, a straight line can be drawn, which (almost always) intersects the curve at a third point C. This relationship can be used to define an abelian group whose
The second law of thermodynamics and Landauer's erasure principle are formulated for noisy optical polarizers, providing fundamental new advances on the thermodynamic description of quantum communication devices.
@quthermo_comp
it’s true digital tech is very well optimized. That’s why we have a pretty clear picture of where its limits are in terms of speed and energy efficiency; because so much has been invested in reaching those limits. Energy efficiency can be improved by orders of magnitude using
Starting to think I may need to build a binary search to figure out which One Piece episode I was on using references to non-spoiler plot details—anyone working on this?
Coming from a totally theoretical background, there’s something really magical about working closely with experimentalists/engineers. It’s one thing to dream up these thought experiments, but somehow it’s always surprising when something actually works irl.
Unlike circuit QED, Thermo computing does not require us to have signal ranges on the order where the Heisenberg uncertainty principle becomes relevant (it also doesn't require very low temperatures in general). We can get a lot of mileage out of existing models for verification
Why do spirits like whisky taste more alcoholic at warmer temperatures? Chemists might have found the answer in the shapes formed by water and ethanol molecules at different temperatures and alcohol levels.
In this paper (), we show that a continuous-variable quantum mode can be put into a state whose Wigner function encodes an elliptic curve. This can be done using a cubic potential energy function realizable on near-term SNAIL devices.
My second paper ever! How much heat is dissipated in erasing quantum information stored in light? Many thanks to coauthors
@quthermo_comp
and
@NathanMMyers1
!
Inbound to U. Chicago. Seminar tomorrow (Wednesday) at 12 noon, "Thermodynamic Linear Algebra". I currently have some unscheduled time Thursday afternoon.
The matrix exponential is often defined by a power series, and appears (often unexpectedly) in various branches of math. Below are a few of its many equivalent expressions.
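Since the original image of expressions isn't reproduced here, a numerical sketch of three standard equivalent routes to exp(M), checked on a rotation generator:

```python
import numpy as np

# Three equivalent routes to exp(M), illustrated on a rotation generator.
theta = 1.0
M = np.array([[0.0, -theta],
              [theta, 0.0]])

# 1. Power series: exp(M) = sum_k M^k / k!
series, term = np.zeros((2, 2)), np.eye(2)
for k in range(30):
    series = series + term
    term = term @ M / (k + 1)

# 2. Euler-style limit: exp(M) = lim_n (I + M/n)^n
n = 1_000_000
limit = np.linalg.matrix_power(np.eye(2) + M / n, n)

# 3. Closed form for this particular generator: rotation by theta
closed = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
```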
The algorithm can be implemented on a simple electrical device that could be made from cheap off-the-shelf parts. To get the matrix exponential, we just let the circuit come to thermal equilibrium and then measure the correlation function.
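That readout can be imitated in simulation. A minimal sketch (plain numpy with invented matrix, time step, and chain count; not the actual device): at equilibrium, the two-time correlation of dx = -Ax dt + √2 dW factors as C(τ) = e^(-Aτ) C(0), so dividing out C(0) yields the matrix exponential.

```python
import numpy as np

rng = np.random.default_rng(1)

# Equilibrium two-time correlation readout: C(tau) = expm(-A*tau) @ C(0),
# so expm(-A*tau) ≈ C(tau) @ inv(C(0)).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
dt, burn_steps, tau_steps = 1e-3, 4_000, 500   # tau = 0.5
n_chains = 50_000

def step(X):
    return X + (-(X @ A)) * dt + np.sqrt(2 * dt) * rng.standard_normal(X.shape)

X = np.zeros((n_chains, 2))
for _ in range(burn_steps):       # relax to thermal equilibrium
    X = step(X)
X0 = X.copy()
for _ in range(tau_steps):        # evolve for an extra time tau
    X = step(X)

C0 = X0.T @ X0 / n_chains         # equal-time correlation, ≈ inv(A)
Ctau = X.T @ X0 / n_chains        # two-time correlation
expm_est = Ctau @ np.linalg.inv(C0)
```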
One thought on this: For a machine where one program can get swapped out for another (eg a computer), the length of a program tells us something about the complexity of the resulting behavior. This is because useful behaviors roughly get Shannon coded to short programs in a good
Surprising how difficult it is for some to grasp very basic notions of information theory.
Arguments like "8MB is actually more information than 8MB" seem to me like arguments that demonstrate the possibility of perpetual motion.
Is it easier to guess the end of a story from the beginning, or to guess the beginning from the end?
@stokhastik
and I are talking about this question and its relevance to AI and thermodynamics, feel free to join!
Thermodynamic computers can solve some problems faster than digital computers, for example in linear algebra (), perhaps signaling a "zombie comeback" of analog computing ().
According to
@stephen_wolfram
, the second law of thermodynamics is really a statement about the computational limits of humankind. Will thermodynamics and computational complexity theory eventually be unified into a single theory?
Looks like the most popular topic is the article “computational foundations for the second law of thermodynamics” by
@stephen_wolfram
so we can use that as the starting point for the space tonight
A question that came up in the last space I hosted was whether there is something like a central limit theorem for composition of randomly chosen group elements. For example, if you sample g_1, g_2, … g_N randomly from some distribution over a Lie group and then take the product
The classic resolution to Maxwell’s demon is that the computations the demon performs produce entropy, offsetting any decrease in entropy. Can the demon learn to do its job more efficiently? It may be worth studying a “neural Maxwell’s demon” that considers the thermal cost of learning.
This scheme eliminates first-order dependence of the error on imprecision of hardware components, meaning thermodynamic algorithms can be made insensitive to this source of error. This result is proven in our Proposition 1, and also borne out by numerics. These algorithms are
Error mitigation is central to performing useful quantum computations on near-term hardware (, ). Similarly, we believe that effective error mitigation is a key to unlocking high-performance thermodynamic computing. Therefore, this
Bridging principles between physics and AI will result in new ideas that work well. We present our work on neural CDEs and continuous-time (CT) U-Nets. Our ideas are inspired by
@PatrickKidger
's work. Find our paper here: and see 🧵for a quick summary.
The matrix exponential gives the solution to a linear dynamical system, and problems of this form appear in almost all quantitative sciences. These systems model phenomena such as the motion of a mass on a spring, population growth, or the evolution of a quantum computer’s state.
Our algorithm runs on hardware that samples a normal distribution with imprecise parameters. A statistical mixture of bad approximations to a target distribution yields a good approximation. The optimal mixture is found using linear interpolation on a lattice of neighbors.