Introduction to Algorithmic Information Theory

 

Suppose we want to transmit a data string x (for simplicity consider a binary string made of 0 and 1). Of course the simplest choice is that of transmitting x itself. However this is not the only possibility. If there exist a computer C and a program y for that computer, such that C(y)=x , we can transmit y rather than x. If we consider C as fixed ( the transmitter and the receiver, use the same kind of computer) we can define the minimal descriptional complexity of x, given y, by the following expression:

min(|y|):C(y)=x.

The above definition is rather obvious and does not need to be further discussed.

Assume that we want to define the complexity of a string x, independently of the specific computer C. In this case we want to capture an intrinsic property of the string x, independent on all other parameters.. To obtain such an absolute measure, we need to make use of some prototype machine implementing C; at the same time however, this reference machine must be such that all possible machines (at least those now known ) can be simulated by such a reference device. The most general known machine is an abstract one, called Universal Turing Machine. We do not explain here what a  UTM is. Your personal computer is however a UTM. What is important here is understanding that there exists such reference machine.  Starting from this intuition, Kolmogorov was able to introduce an absolute measure of the complexity of a string as  the size (in bits) of the shortest program which, without additional data, can compute the string x and then stops [1]. Therefore we can write:

K(x) = min|p|:UTM(p)=x

The program is measured in bits. In his first version, Kolmogorov defined a simpler concept that can be called conditional complexity.

K(x|y) = min|p|:UTM(p,y)=x

where now we consider the shortest program that can produce x, given the other string y. It is evident that if y has nothing to do with the string x, the conditional complexity will be equivalent to the complexity of x, that is:

K(x/y)=<K(x)

Kolmogorov complexity is easily understood if we compare two typical strings:

p is an infinite string of apparently random digits, however such string can be actually evaluated by a UTM by a very short program (a string of bits. Its complexity is then very small). While the string can grow forever, its complexity grows at a much smaller rate.

On the contrary, a genuine random binary string is just described by itself (you have to transmit the whole string as there are no computer programs capable of reproducing it making use of a shorter program . As the string grows, its description grows as well. Then K(x) = O(|x|) (K(x) grows as fast as the length of x).

K(x) is in general not-computable, as we cannot be sure a program will halt when we are testing it. This is connected with the well known Godel Theorem.

To get some iidea about Godel's theorem, you can try to consider the "Berry Paradox":

``the first positive integer that cannot be specified in less than a billion words''.

For a discussion of the Berry paradox see here

We consider here the notion of randomness applied first to finite sets. Consider an element of the set D of all binary strings of a given length n. The cardinality #D of such set (the number of elements belonging to the set) is equal to  #D=2n. We need log2#D bits to encode the “name” of each of the members. This represents somehow the worst situation from the point of the information theory. It corresponds to the case where, to transmit a given element of D, you need to send a message like : "I mean the nth element of the set D"”. You have then to pass all the information about the set D (you can still transmit it making use of some UTM program (= K(D)). If however the receiver already knows that the string he/she will receive belongs to D, you do not need to transmit it. We can then write:

We can refer to this example to understand if low complexity strings are rare or abundant. We have 2n strings and we can evaluate their probability have a complexity m with m<n.  Such a probability can be written as:

for the simple reason that there exist just 2m different binary strings (therefore 2m possible UTM programs). So we can conclude that low complexity strings are quite rare. P(n,m) represents the probability that a string of length n has a complexity m.

It is interesting to underline that the above discussion clarifies how the complexity of a string can as well be evaluated by factorising it into two terms:

K(x)= K(x/D)+K(D)

Gell-Mann and Lloyd [2] have widely analysed this form of the complexity and have developed a very interesting point of view, which captures more precisely our intuitive concept of complexity of a string. Here you find a short introduction to their approach.

We have shown that strings such that K(x)<<l(x) are very rare. We can make use of this observation to infer some form of probability of a given string. A string can be produced by a large number of different programs. Anyhow, for the vast majority of strings, such programs will have a length near to the original string's length. Referring essentially to these observations, Solomonoff introduced the following probability evaluation for a given string [3]:

 

Here Ci are all the programs capable of producing x and l(Ci) are their lengths. P(x) is called the Algorithmic Probability of the finite string x. The sum is actually not-computable; this is due to the fact that it is impossible to verify in finite times, whether a particular string is a description of x or not. The best we can do is to approximate Solomonoff equation  by the largest lower bound of P(x) ,that can be demonstrated in a definite time T. This is usually done by finding a short code for x which gives terms summing to that bound. It is evident that, at least formally, K(x) represents the largest contribution to the above sum. So a reasonable estimate is 2^(-K(x)).

References

[1] Kolmogorov A. N., (1965), Three approaches to the quantitative definition of information, Prob. Inform. Transmission, 1, pp:4-7

[2] Gell-Mann M., Lloyd S., (1996), Information Measures, effective complexity, and total information, Complexity, pp:44-52

[3] Solomonoff R. J., (1964), A formal theory of inductive inference, I, II, Inform. Control, 7, 1, 22, pp:224-254

Next Page

Home Page