Suppose we want to transmit a data string x (for
simplicity consider a binary string made of 0 and 1). Of course the simplest
choice is that of transmitting x itself. However this is not the
only possibility. If there exist a computer C and a program y
for that computer, such that C(y)=x , we can transmit y
rather than x. If we consider C as fixed ( the
transmitter and the receiver, use the same kind of computer) we can define the
minimal descriptional complexity of x, given y, by the
following expression:
min(|y|):C(y)=x.
The above definition is rather obvious and does not need to be further
discussed.
Assume that we want to define the complexity of a string x,
independently of the specific computer C. In this case we want to
capture an intrinsic property of the string x, independent on all
other parameters.. To obtain such an absolute measure, we need to make use of
some prototype machine implementing C; at the same time however, this
reference machine must be such that all possible machines (at least those now
known ) can be simulated by such a reference device. The most general known
machine is an abstract one, called Universal Turing Machine. We do not explain
here what a UTM is. Your personal computer is however a UTM.
What is important here is understanding that there exists such reference
machine. Starting from this intuition, Kolmogorov was able to
introduce an absolute measure of the complexity of a string as the
size (in bits) of the shortest program which, without additional data, can
compute the string x and then stops [1]. Therefore we can write:
The program is measured in bits. In his first version, Kolmogorov
defined a simpler concept that can be called conditional complexity.
where now we consider the shortest program that can produce x,
given the other string y. It is evident that if y has
nothing to do with the string x, the conditional complexity will
be equivalent to the complexity of x, that is:
K(x/y)=<K(x)
Kolmogorov complexity is easily understood if we compare two typical
strings:
p is an infinite
string of apparently random digits, however such string can be actually
evaluated by a UTM by a very short program (a string of bits. Its
complexity is then very small). While the string can grow forever, its
complexity grows at a much smaller rate.
On the contrary, a genuine random binary string is just described by itself
(you have to transmit the whole string as there are no computer programs capable
of reproducing it making use of a shorter program . As the string grows, its
description grows as well. Then K(x) = O(|x|) (K(x)
grows as fast as the length of x).
K(x) is in general not-computable, as we cannot be sure a
program will halt when we are testing it. This is connected with the well known
Godel Theorem.
To get some iidea about Godel's theorem, you can try to consider the "Berry
Paradox":
``the first positive integer that cannot be specified in less than a
billion words''.
For a discussion of the Berry paradox see here
We consider here the notion of randomness applied first to finite sets.
Consider an element of the set D of all binary strings of a
given length n. The cardinality #D of such
set (the number of elements belonging to the set) is equal to #D=2n.
We need log2#D bits to encode the “name”
of each of the members. This represents somehow the worst situation from the
point of the information theory. It corresponds to the case where, to transmit
a given element of D, you need to send a message like :
"I mean the nth element of the set D"”. You have then
to pass all the information about the set D (you can still transmit
it making use of some UTM program (= K(D)). If however
the receiver already knows that the string he/she will receive belongs to
D, you do not need to transmit it. We can then write:
![]()
We can refer to this example to understand if low complexity strings are
rare or abundant. We have 2n strings and we can evaluate
their probability have a complexity m with m<n.
Such a probability can be written as:

for the simple reason that there exist just 2m
different binary strings (therefore 2m possible UTM
programs). So we can conclude that low complexity strings are quite rare.
P(n,m) represents the probability that a string of length n
has a complexity m.
It is interesting to underline that the above discussion clarifies how
the complexity of a string can as well be evaluated by factorising it into two
terms:
K(x)= K(x/D)+K(D)
Gell-Mann and Lloyd [2] have widely analysed this form of the complexity
and have developed a very interesting point of view, which captures more precisely
our intuitive concept of complexity of a string. Here you find a short introduction
to their approach.
We have shown that strings such that K(x)<<l(x) are
very rare. We can make use of this observation to infer some form of probability
of a given string. A string can be produced by a large number of different
programs. Anyhow, for the vast majority of strings, such programs will have
a length near to the original string's length. Referring essentially to these
observations, Solomonoff introduced the following probability evaluation for
a given string [3]:

Here Ci are all the programs capable of producing
x and l(Ci) are their lengths. P(x) is called the Algorithmic
Probability of the finite string x. The sum is actually not-computable;
this is due to the fact that it is impossible to verify in finite times, whether
a particular string is a description of x or not. The best we can do
is to approximate Solomonoff equation by the largest lower bound of
P(x) ,that can be demonstrated in a definite time T.
This is usually done by finding a short code for x which gives
terms summing to that bound. It is evident that, at least formally, K(x)
represents the largest contribution to the above sum. So a reasonable
estimate is 2^(-K(x)).
[3]
Solomonoff R. J., (1964), A formal theory of inductive inference, I, II,
Inform. Control, 7, 1, 22,