I’ve been working on a much longer article on the fascinating subject of Kolmogorov Complexity and some observations I’ve made about its relationship to language and art, but that article keeps growing and I’ve been busy with other work. I actually have ideas for several different articles on KC and its relationship to a variety of more familiar subjects.
So here is a much shorter article on some odd properties of KC that seem to get less attention, and some speculation on how they may be explanatory in biology.
Kolmogorov Complexity
Kolmogorov Complexity is ultimately a measure of how compressible information is. The KC of a given string of bits is the length of the shortest possible computer program whose output is that bit sequence.
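To get a rough feel for this (true KC can’t be computed directly, as discussed below), a general-purpose compressor can serve as a crude stand-in: the compressed output, together with its decompressor, is effectively a short program that prints the original string, so its size gives an upper bound on the string’s complexity. Here’s a minimal sketch in Python; zlib is just a convenient choice, nothing about KC requires it.

```python
import os
import zlib

# Two strings of identical length: one highly structured, one random noise.
structured = b"abcdefgh" * 125_000   # 1,000,000 bytes of a repeating pattern
random_ish = os.urandom(1_000_000)   # 1,000,000 bytes from the OS entropy pool

# zlib gives only a crude upper bound on complexity: the compressed blob plus
# a decompressor is a "program" whose output is the original string.
print(len(zlib.compress(structured, 9)))  # a few kilobytes: far smaller than 1,000,000
print(len(zlib.compress(random_ish, 9)))  # close to (or slightly above) 1,000,000
```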
There are several factors that complicate things here. For one, the precise complexity of a given string is extremely dependent on the language used to write the program, and thus this measure of complexity is always relative to a specified language.
There is a type of recreational programming called “code golf” that illustrates this dependency well, where people compete to write the shortest program to solve a particular problem. Some languages are more expressive than others, some are better tuned to specific problems than others. Various “code golf languages” have been developed, which often turn popular code golf challenges into single-character operations. There are many such languages where a “hello world” program is simply an empty source file, thus having zero length.
There are also information-theoretic limits. N bits of information can only encode a maximum of 2^N possible strings, and no amount of fancy language trickery changes this. As a result, if you pick completely random strings of N bits, only a very tiny proportion of them can possibly be encoded by very short programs. Some may even turn out longer than N bits because of the overhead of the language (a program that simply prints a very long string needs some way to encode the print statement itself, or at least to differentiate between a string to be printed and a string literal to be used as an input for further computation). A simple counting argument shows that the average KC of such strings can never be meaningfully less than N.
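Here’s a toy sanity check on that counting argument (plain arithmetic, not tied to any particular language): there simply aren’t enough short programs to go around.

```python
N = 20  # consider all bit strings of length N

strings_of_length_n = 2 ** N                             # 1,048,576 strings to cover
programs_shorter_than_n = sum(2 ** k for k in range(N))  # 2^0 + ... + 2^(N-1) = 2^N - 1

# Even if every shorter program produced a distinct string, at least one
# N-bit string would be left with no shorter description at all.
print(strings_of_length_n, programs_shorter_than_n)

# Fewer than 2^(N-c) of the 2^N strings can have a description shorter
# than N - c bits, no matter how clever the language is.
for c in (1, 8, 16):
    print(c, 2 ** (N - c) / strings_of_length_n)  # 0.5, ~0.0039, ~0.0000153
```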
Then there is the problem of the generality of the language itself. For truly optimal data compression, a Turing-complete language is theoretically desirable. However, this is also where things get tricky. Determining whether an arbitrary program in a Turing-complete language has a particular kind of behavior, even something as simple as whether it ever stops running, is in general undecidable. Searching over a space of programs whose behavior is undecidable for one that is in some sense optimal inherits all of that difficulty, and so the true Kolmogorov Complexity of a string in a Turing-complete language is not merely hard to determine in practice - it is a fundamentally incomputable function.
In practice, however, understanding the behavior of programs is not nearly as bad as the theory suggests, mostly because only a subset of programs are actually impossible to understand, and humans tend to build things that have convenient properties and are meant to be understood. If your program has behavior that is actually impossible to verify, that’s a very good indicator that you have a bug somewhere and it is doing something it is not meant to do.
However, in the case of KC we don’t care about engineering a program to be understood, we care about minimizing code size at absolutely all costs, and therefore all the bizarre, random, badly-behaved programs that no human in their right mind would ever write remain as options, unable to be dismissed. Kolmogorov Complexity in its true form is not something that can be measured or studied directly; anything we can actually measure is only an approximation of this deeper, inaccessible form of complexity.
One way around this is to give up on Turing-completeness. This means there will be certain types of structure in strings that will now be inexpressible, but we may at least be able to constrain our search to programs that we can make certain practical guarantees about. This sacrifices optimality but makes the problem much more tractable. If we assume that most of our data probably has a fair bit of random noise anyway, for example in the case of compressing images or other media files that sample the real world, such bizarre structures may be exceedingly rare and well worth compromising on.
A problem I haven’t mentioned yet is that, if we wish to compress a string, we must have some algorithm for converting the string into a program that actually generates it. Constrained languages can make this search far more tractable than it would otherwise be.
Technically speaking, all the classical methods of encoding compressed data - Huffman coding and the like - are languages, just highly constrained ones that are often very far from being Turing-complete. Methods like run-length encoding, which encodes simple looping constructs, and dictionary coders, which can vaguely resemble function calls if you squint hard enough, are perhaps more obvious examples of this.
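As a toy illustration of that framing (my own sketch, not any standard codec’s actual format), here is run-length encoding written so that the encoded form reads like a tiny program: a list of bounded “repeat this symbol k times” loop instructions, executed by an interpreter that is nowhere near Turing-complete and always halts.

```python
def rle_encode(s: str) -> list[tuple[str, int]]:
    """Compile a string into a tiny 'program': a list of (symbol, count) loop instructions."""
    program: list[tuple[str, int]] = []
    for ch in s:
        if program and program[-1][0] == ch:
            program[-1] = (ch, program[-1][1] + 1)
        else:
            program.append((ch, 1))
    return program

def rle_run(program: list[tuple[str, int]]) -> str:
    """The 'interpreter': each instruction is a bounded loop, so it always halts."""
    return "".join(ch * count for ch, count in program)

text = "aaaaaaaaaabbbbbbcccccccccccccccc"
program = rle_encode(text)
print(program)                    # [('a', 10), ('b', 6), ('c', 16)]
assert rle_run(program) == text   # the program's output is the original string
```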
The Poorly Behaved Stuff
For many problems that have interested humans for the past century or so of computing, compression based on relatively well-behaved languages has been a common workhorse. But Kolmogorov Complexity promises that even greater treasures lie in the less well-behaved languages. Perhaps the problems we have chosen to focus on so far haven’t found a need for them, but as far as human endeavors go a century is a blink of an eye. The deep, distant future of computing will contain great wonders that the most observant of us can only see the faintest hints of today.
If we consider the inverse of compression and ask about “inflation algorithms”, it’s pretty easy to write very short programs that produce very long, complicated strings. Running certain cellular automata for very long, but finite periods of time, for example. An unimaginably more extreme example may simply be evaluating the Ackermann function on inputs greater than four.
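As a concrete sketch of such an “inflation algorithm” (Rule 30 is just one convenient cellular automaton to pick, and the width and step count here are arbitrary), a handful of lines can unfold a single seed bit into a long, complicated-looking string:

```python
# Rule 30 elementary cellular automaton: the entire "seed" is one 1 bit,
# yet the center column it unfolds into looks thoroughly disordered.
WIDTH, STEPS = 256, 10_000

row = [0] * WIDTH
row[WIDTH // 2] = 1

output_bits = []
for _ in range(STEPS):
    output_bits.append(row[WIDTH // 2])  # collect the center column
    row = [row[(i - 1) % WIDTH] ^ (row[i] | row[(i + 1) % WIDTH])  # Rule 30 update
           for i in range(WIDTH)]

print("".join(map(str, output_bits[:80])), "...")  # a long, messy-looking bit string
```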
In much the same way that a gigantic tree can grow from a very tiny seed, a very large, complex structure can unfold from a very small and simple seed program.
Kolmogorov Complexity is in a certain sense not only a measure of compressibility, but a measure of the amount of structure in a problem.
Assuming P!=NP, the solutions to most optimization and constraint problems are fundamentally intractable to compute, with the best known algorithms taking exponential time in the worst case. However, those solutions are in a certain sense highly optimal with respect to problem definitions that are exponentially simpler than the process required to find them.
This is a place where Kolmogorov Complexity and other forms of computational complexity diverge somewhat; KC doesn’t care about how long it takes to compute something, only what gets computed. Such solutions are effectively both highly random and highly structured simultaneously.
Computing even a locally optimal solution to many of these hard problems can require only a fairly small amount of code - just the parameters to define the function to optimize over, and some function that navigates the solution space and converges on a solution eventually.
If your problem is continuous, simple gradient descent can work pretty well. For discrete problems, something as simple and stupid as brute-force iteration over the state space until a valid solution is found technically works, though you may be waiting a long time. There are also a number of far more sophisticated methods that can search spaces much more intelligently without obviously involving any gradients. Of course, none of these methods - gradient descent included - make any real guarantees about how long it takes to converge on a solution. There can be problems that are well-behaved and converge quickly, and there can be others that take an extraordinarily long time.
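Here’s a minimal sketch of the brute-force end of that spectrum, using a made-up subset-sum instance (chosen only because it’s a classic NP-complete problem with a one-line validity check); the problem definition is tiny, and so is the solver, but nothing bounds how long the loop runs as the instance grows:

```python
from itertools import product

# A toy subset-sum instance: the problem definition is tiny...
numbers = [13, 34, 4, 12, 5, 2, 21, 8]
target = 30

# ...and so is the solver: brute-force iteration over all 2^n subsets until
# a valid one appears. No guarantee on how long this takes as n grows.
for mask in product((0, 1), repeat=len(numbers)):
    if sum(x for x, keep in zip(numbers, mask) if keep) == target:
        print(mask)  # e.g. a subset like {13, 12, 5} summing to 30
        break
```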
A Simple Example
Say we define some space that has a number of local maxima. If we throw a simple gradient ascent algorithm at it, then from just about any initial condition - all zeros, something random, whatever is most convenient to define - it will guide us to one of these peaks.
However, perhaps this is a problem where it is easy to accidentally end up in a very suboptimal local maximum. We don’t want to climb to the top of just any old hill in this space, we want the tallest, or at least one of the tallest.
Around each peak there is some space of points that will naturally guide a gradient-following algorithm to that peak. We can slice the space into these subspaces. Some of these spaces may be bigger than others, some may have very simple shapes and others much more complex ones. If we wish to use our gradient-following algorithm to locate a very specific peak, we only need to have the algorithm start somewhere in the space around that peak.
More precision in the location requires more information. Every additional bit allows us to narrow things to a window half the size. Perhaps the rest of the coordinates are all zeros, perhaps something random, but to get more specific, all else being equal, necessarily requires more bits.
However, overall, unless there is a gigantic range in the sizes of these separate regions and the one we aim to hit happens to be exceptionally small, the total amount of information required to specify which peak we aim to reach has far more to do with the total number of local maxima than with the actual scale of the solution space. Once we have defined the structure of the space, the amount of information required to specify which optimal solution we wish to land on may be very tiny, perhaps only a small handful of bits.
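Here’s a small sketch of that idea with a made-up one-dimensional landscape (the function, step size, and seeds are all arbitrary choices): the space of starting points is a 64-bit float, but selecting which of the few peaks we converge to only takes a couple of bits’ worth of seed information.

```python
import math

# A 1-D landscape with a handful of peaks of different heights.
def f(x: float) -> float:
    return math.sin(x) + 0.3 * math.sin(3.1 * x)

def ascend(x: float, step: float = 0.01, iters: int = 20_000) -> float:
    """Naive gradient ascent using a finite-difference slope estimate."""
    for _ in range(iters):
        slope = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6
        x += step * slope
    return x

# The seed only needs to land somewhere inside the right basin of attraction;
# a few bits of starting information pick which peak we converge to.
for seed in (-4.0, 0.0, 5.0):
    peak = ascend(seed)
    print(f"seed {seed:+.1f} -> peak near x = {peak:.2f}, height {f(peak):.3f}")
```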
Granted, given how long it may take to converge to a solution - perhaps we have a very large and complex space and gradient ascent takes a very long and winding path to the top - we may want more precision in our starting point so that we start a bit closer to the final solution. Kolmogorov Complexity at the end of the day is simply a theoretical ideal, with an infinite difficulty curve, and practical concerns always win out.
Some Wild Speculation on Biology
A chemical system will naturally trend toward equilibrium, and biological systems are ultimately just a digital language (DNA) that specifies and controls a very complex chemical system. Many of the problems that biological systems must solve to survive and compete for resources take the form of optimization or constraint-solving problems, and like the code golf languages that encode popular challenges with single characters (or even none at all), the chemical language underlying biology has optimization as one of its most primitive and fundamental tools.
Of course, biology is not completely without its concerns about the “computational” efficiency of reaching equilibria - it does care how long and how much energy it takes to find optimal solutions. It spends no shortage of resources catalyzing reactions with enzymes, accelerating protein folding with chaperones, and specifying solutions fairly directly with body plans and the like. But the point is that, given the tools biology has to work with, it appears that a tremendous amount of optimization should come for free as well.
This suggests that, while some things are relatively hardcoded, it is very plausible that a large amount of biology may simply be a set of evolved specifications of constraints, evolved specifications of tools for solving them, and a chemical system that naturally finds a very efficient way to solve the given constraints with the given tools. Of course, some problems may be particularly open-ended, with gigantic search spaces (optimal body plans for example), and in such cases it would be much faster to hardcode at least part of the solution to speed things up rather than waiting for some pink or green goo to search through absolutely every possibility.
The human body is incredibly complex, but it is specified by only approximately 750 MB of DNA. Even then, that DNA contains many very repetitive segments and may be well over 90% “junk” DNA whose contribution is questionable. There are only about 20,000 protein-coding genes. People are occasionally baffled by how much seems to be packed into so little. The number of different behaviors that people claim are hardcoded into our biology and our brains probably exceeds the number of genes actually involved in the nervous system by a very wide margin, perhaps even eclipsing the total number of genes altogether. Biology seems to produce a very large number of diverse behaviors from a surprisingly small amount of code.
Biology, viewed as a chemical computer, certainly does have many recognizable parts. DNA transcription closely resembles function calls, and there are a variety of proteins that act as repressors and promoters, allowing genes to be switched on and off by other genes and other chemical signals. On the other hand, biology is in many respects a very alien form of computing, one unconstrained by human understanding and one that has been around far longer than us. We should be very surprised if it is not exploiting strange phenomena that require a much deeper understanding of computing to appreciate. Why would it not be exploiting strange properties of Kolmogorov Complexity to squeeze more expressive power from its code? If so, a small mutation in DNA may explode into a much more impactful change in function than it would in familiar, human-written code, while also being almost guaranteed to land in some kind of local optimum once the chemistry plays out.
Inputs from the environment would also count as part of the specification, and stochastic properties of the system would need to be factored in as well. In KC terms, this is like having a large portion of the input program be missing, absent from the DNA, and simply gathered as an input from the environment. The DNA is not the sum total of all the data required to construct this specific creature, but only the part that is not collected from the environment.
This also reminds me a bit of Richard Dawkins’ concept of the extended phenotype.
If you read books from Dawkins’ crowd of evolutionary biologists, a common topic is how game theory equilibria may influence gene distributions in a population, and how natural selection is capable, after enough generations, of converging onto optimal solutions to tricky game theory problems. However, computing Nash equilibria of finite games is computationally hard (PPAD-complete in general, and NP-hard if you ask for equilibria with particular properties), which places it squarely in the class of search problems that a sufficiently expressive chemical system could, in principle, work on directly.
There are occasionally signs that evolution seems to work much faster than these models of natural selection suggest. However, from the perspective I’ve outlined here, much of that computation, rather than being spread across countless generations, could be carried out within a single organism: a small handful of genes simply describe the problem and let the chemistry within a cell solve it automatically. Natural selection would still play a very important role in learning which problems are most productive to solve, but a great deal of the difficulty would be offloaded onto the chemistry within an organism.
It’s also worth pointing out that constraint systems, which are expressive enough to describe any NP problem, have the property that additional constraints that do not add new degrees of freedom only narrow the space of possibilities. If you have some genes that describe only some of the constraints necessary to solve such a problem, you may still converge on a valid solution by luck, and you’ll at least end up with something that is locally optimal. Additional constraints, additional genes, simply narrow the exploration of the search space and improve the reliability of landing somewhere good.
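Here’s a toy sketch of that narrowing property (the clauses are random 3-SAT-style constraints I made up for illustration; nothing biological about them): as constraints are switched on one at a time, the set of valid solutions can only shrink, never grow.

```python
import random
from itertools import product

random.seed(0)
n_vars = 12

# 30 random 3-SAT-style clauses: each demands that at least one of three
# (variable index, required value) pairs holds.
clauses = [[(random.randrange(n_vars), random.choice((True, False))) for _ in range(3)]
           for _ in range(30)]

def satisfies(assignment, active_clauses):
    return all(any(assignment[i] == val for i, val in clause) for clause in active_clauses)

# Switch the constraints on a few at a time and count surviving assignments:
# each added constraint can only narrow the set of valid solutions.
assignments = list(product((False, True), repeat=n_vars))
for k in (0, 5, 10, 20, 30):
    survivors = sum(satisfies(a, clauses[:k]) for a in assignments)
    print(f"{k:2d} constraints -> {survivors:4d} of {len(assignments)} assignments still valid")
```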
So not only should such a chemical constraint-solving system be extremely natural to express given the tools biology has, and not only does it explain how biology can generate such a wide variety of effective behaviors well-adapted to its environment (its input) from a relatively small amount of code, and not only should it be able to solve problems within a single generation that would otherwise take many generations to converge on through natural selection alone, but it should also, in theory, be very easy to learn all of this bit by bit, by following something that resembles a fairly straightforward, if somewhat stochastic, gradient.
So it appears that every cell in your body is likely running a constraint-solving algorithm, searching for the best path forward with the tools and resources that natural selection has found for it, and the exponential costs of this optimization problem are dealt with by the billions of chemical reactions per cell per second each contributing a calculation, occasionally with a hint from a chaperone or enzyme.
All of this is of course my own speculation - I’m no biologist - but it lines up with a lot of observations and would explain quite a lot.