
Research Profile: Thermodynamics of prediction

Professor Susanne Still discusses her work at the intersection of physics and computer science.


Hi Susanne. Your research was recently featured in Nature. That’s very exciting. In layman’s terms, what is your work on “thermodynamics of prediction” about?

This is theoretical work that shows the fundamental equivalence between thermodynamic inefficiency, measured by dissipation, and modeling inefficiency, measured by nonpredictive information (or “nostalgia”). The results highlight a profound connection between the effective use of information and efficient thermodynamic operation: any system constructed to keep memory about its environment and to operate with maximal energetic efficiency has to be predictive.
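Schematically, the result can be sketched as follows. The notation here is illustrative, not a verbatim quote from the paper: $s_t$ stands for the system’s memory state, $x_t$ for the environmental signal, and $\beta = 1/k_B T$.

```latex
% Sketch of the central identity, in illustrative notation: the average
% work dissipated while the signal changes from x_t to x_{t+1} equals
% the information the memory keeps about the current signal minus the
% information it carries about the upcoming one (the "nostalgia").
\beta \,\langle W_{\mathrm{diss}}[x_t \to x_{t+1}] \rangle
  \;=\; \underbrace{I[s_t;\, x_t]}_{\text{memory}}
  \;-\; \underbrace{I[s_t;\, x_{t+1}]}_{\text{predictive power}}
  \;=\; \text{nostalgia}.
```

Since the nostalgia is nonnegative, driving it to zero — keeping only memory that is predictive — is exactly what minimizes dissipation.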

The results hold far from thermodynamic equilibrium, and are thus applicable to a wide range of systems, in particular living systems, such as biomolecular machines, but also to artificial computing hardware, such as next generation nano computing devices. For example, we used the main result to provide an extension of Landauer’s principle, deriving a tighter bound on the minimum amount of heat that has to be generated when one bit of information is erased. This has direct consequences for the design of future nano computing devices. Also, we can exploit our discoveries about the physics of information processing directly in the design of novel machine learning algorithms.
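For reference, the classic Landauer bound states that erasing one bit of information must generate at least $k_B T \ln 2$ of heat. Schematically, the tightened version adds a nonnegative nostalgia term; the exact statement is in the paper, and the second line below is only a sketch of its form:

```latex
% Classic Landauer bound (well established):
Q \;\ge\; k_B T \ln 2
% Sketch of the tightened form: any nonpredictive information retained
% by the erasure protocol contributes additional unavoidable heat.
Q \;\ge\; k_B T \ln 2 \;+\; k_B T \cdot \text{nostalgia}
```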

The results also point towards interesting foundational issues. I am fundamentally interested in the physical principles that underlie the emergence of intelligent information processing. Life happens away from thermodynamic equilibrium. The implication that achieving thermodynamic efficiency away from equilibrium requires maximal predictive power at fixed model complexity, i.e., predictive inference, is an interesting starting point. If the efficient use of energy provides living systems with a competitive advantage, then predictive inference emerges as the information processing strategy of choice. Predictive inference is observed in biological systems on many levels, ranging from motor behavior to higher cognitive function, and may constitute a distinctive information processing strategy by which biology sets itself apart from the rest of the universe.
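“Maximal predictive power at fixed model complexity” can be written schematically as a constrained optimization. The notation ($x_{\mathrm{past}}$, $x_{\mathrm{future}}$, the complexity budget $R$) is our illustration, not a formula lifted from the papers:

```latex
% Illustrative sketch of predictive inference: choose the (stochastic)
% mapping from past observations to model states, p(s | x_past), that
% maximizes predictive power while model complexity stays within a
% fixed budget R.
\max_{p(s \mid x_{\mathrm{past}})} \; I[s;\, x_{\mathrm{future}}]
\quad \text{subject to} \quad I[s;\, x_{\mathrm{past}}] \;\le\; R
```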

Your research applies to a surprisingly diverse set of fields: finance, neuroscience, and machine learning, to name a few. How did that happen?

Yes, and the newest addition is a paper about volcanoes on one of Jupiter’s moons, Io, the most volcanically active object in our solar system. This may seem eclectic, but it has a common thread, namely my interest in learning and adaptation. While learning and adaptation are defining characteristics of living systems, increased knowledge about information processing in biology has inspired a whole generation of computer scientists to build machines that can learn: algorithms that extract structure from raw data and use it to make predictions. Naturally, some of these algorithms are useful in a large variety of areas.

Perhaps more importantly, work in contemporary statistics has contributed fundamental insights to the situation in which one has to make do with learning from limited data. This is relevant even when there is a lot of data, if the dimensionality of the problem is comparable to the size of the data set. Examples can be found in many areas, such as bioinformatics, systems biology, high energy physics, finance, meteorology and climate studies.

In this situation we cannot blindly apply classical statistical methods, but rather have to pay attention to the concept of complexity control: if we allow our models to become arbitrarily complex, then they can explain anything, but we lose the ability to separate the signal from the noise, as every fluctuation seems relevant. Obviously this is not a good idea, and it leads to what is commonly known as “over-fitting”. The complexity of the model thus has to be controlled. This is relevant for basically all complex systems. In the finance papers, we have shown that certain disturbing instabilities go away when these methods are applied properly, and we have given an intuitive financial derivation for the algorithm, together with the statistical argument.
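As a toy illustration of complexity control (the data and model below are invented for the example; this is not code from the finance papers), fitting polynomials to noisy data shows how an unconstrained model chases the noise, while a ridge penalty on the coefficients restores generalization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, invented for this illustration: a smooth signal plus noise.
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

def fit_poly(x, y, degree, ridge=0.0):
    """Least-squares polynomial fit; an optional ridge penalty on the
    coefficients acts as a simple form of complexity control."""
    X = np.vander(x, degree + 1)
    if ridge > 0.0:
        # Ridge via an augmented least-squares system:
        # minimize ||Xw - y||^2 + ridge * ||w||^2.
        X = np.vstack([X, np.sqrt(ridge) * np.eye(degree + 1)])
        y = np.concatenate([y, np.zeros(degree + 1)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def test_mse(coeffs, degree):
    """Mean squared error on held-out points from the noiseless signal."""
    return np.mean((np.vander(x_test, degree + 1) @ coeffs - y_test) ** 2)

for degree in (3, 12):
    plain = fit_poly(x_train, y_train, degree)
    ridged = fit_poly(x_train, y_train, degree, ridge=1e-3)
    print(f"degree {degree:2d}: test MSE {test_mse(plain, degree):.3f} "
          f"(unconstrained) vs {test_mse(ridged, degree):.3f} (ridge)")
```

The penalty term plays the role of complexity control described above: it trades a little flexibility for the ability to tell signal from noise.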

Obviously our brains are very good at these tasks; take, for example, object and speech recognition. A lot of inspiration for machine learning has thus come straight from neuroscience, such as neural networks. Beyond this biomimicry, I am interested in the physical foundations of information processing, hence my interest in computational neuroscience.

Your training is as a physicist. What are some of your favorite insights from physics that you believe will make an impact on computer science in the coming years?

First of all, it is important to appreciate that ideas originating from physics have been driving computer science from the very beginning of the discipline. Some of the pioneers had interdisciplinary backgrounds and interests, often in physics and mathematics. The first computers, for example, were built by physicists and mathematicians. The first digital computer was built by the physicist John Atanasoff. Alan Turing and the mathematician and physicist John von Neumann competed in designing the first “modern” computer. The physicist John Bardeen co-invented the transistor. Significant contributions to computer science came from information theory in the early days. Information theory was not only inspired by concepts from physics, such as entropy, but is also tightly intertwined with statistical mechanics and other areas of physics, including cosmology. To quote the Foundational Questions Institute: “The past century in fundamental physics has shown a steady progression away from thinking about physics, at its deepest level, as a description of material objects and their interactions, and towards physics as a description of the evolution of information about and in the physical world.” Physics and information processing are deeply connected, and exploring the precise nature of this connection is one of the great remaining challenges of our time.

If students wished to get involved with this research, what would you recommend that they do to prepare themselves?

Any interested student can come to my lectures, to the meetings of the Machine Learning Lab, or to see me personally. I am proud to have had many successful students from a variety of disciplines. Naturally, strong CS skills combined with a background in science (specifically physics, mathematics or theoretical chemistry) constitute the ideal preparation for this line of work. However, in my view, curiosity is the great driving force that fuels research. I am really happy to report success stories of students from disciplines with traditionally little or no mathematical background, such as Linguistics. Other highly successful students have come from Astronomy, Communication (CIS program), Economics, Geosciences, Mathematics, Physics, and, of course, Computer Science. Successes include students publishing papers on the basis of class projects in my courses, moving on to places such as Stanford and NASA for PhD or postdoctoral studies, and finding faculty positions after completing their PhD.

Any other thoughts?

Lots! 😉