Memory dynamics and reversible computations

   

Computers disperse heat because of their physical nature. The technological evolution since the forties has greatly decreased such a dissipation, which was 0.01 joules per elementary operations then and it is of the order of 10^-17 joules per elementary operation now.

J. von Neumann concluded that a computer, operating at the absolute temperature T, dissipates at least KTln2 Joules ( K=Kelvin Constant, T = absolute temperature, ln2=natural logarithm of 2) per bit operation. Launder found that energy dissipation is associated with logically irreversible operations and that  von Neumann’s conclusions is correct just in that case.Bennett has further clarified the concept of reversible computing. For Launder a logical operation is logically reversible if its inputs can always be deduced from the outputs. So if you add 3 and 2, take a note of the result and cancel “3+2” you perform an irreversible logical operation. The computer of course dissipates heat, and this has nothing to do with the above observations. The dissipation we are talking about is in some sense the extra dissipation due to the nature of the global specific process the computer is running.

Consider a prototype computation

 

X,p->T->Y,Q  where Q is what we normally call GARBAGE

 

If Q is irreversibly deleted, we cannot invert the computation [Y,Q->T->X,p] and will have a irreversible cost of the computation given by KTln2 times the bits contained in Q. It’s evident that the amount of memory available to the computer plays a major role in allowingirreversible operations.

A computation is normally made of reversible and irreversible parts. The number of irreversibly erased bits times KTln2 is the irreversibility cost of the computation.

What seems interesting here with respect to the brain’s  investigations, is that irreversibility cost corresponds to the amount of information lost about how the computation took place or how a given final result was obtained.

 

If the computation cannot be inverted, we cannot reconstruct X. Actually if the program p is stored somewhere, then the computer can recall p itself, but still, in the absence of Q, X cannot be reconstructed. One could assume that p is such that there exists a program p-1, totally defined by p, with the property that:

 

IF X,p->T->Y then there exists a Turing Machine T-1: Y->T-1->X

We need to be more formal at this stage. Assume we are given a UTM, a program p and a data string x.

 

 

We assume there exists:

 

 

Now we must ask ourselves if the computation by T-1 is reversible and the UTM calculations are irreversible as well. The problem looks somehow diverging. This does not mean that there do not exist invertible Turing Machine. It simply suggests that the conditions are quite strict, and that it is realistic to assume that most of the going-on computations in this world have irreversible parts.

This is important, as the irreversible entropy cost becomes somehow a measure of the lost knowledge about how the system has reached its conclusions. Irreversible cancelling is therefore evidently associated with the knowledge about the programs run if stored programs cannot be accessed. Actually inaccessibility is not needed and can be substituted by less strict conditions. One must be careful.

Consider a finite ensemble of Turing Machines S. Such a set can also be considered as the ensemble of the simulation programs pi that must give to a UTM to simulate Ti. Assume further that each Turing machine is given a name qi. We can well imagine to know the set of the names, and to completely ignoring the programs run by them.

This seems a potent  limit of some special form of autobiography of the system. Keeping track of the story, and keeping track on why the story evolved in such a way are very different kinds of autobiography.

Take living being with a knowledge system, which is moving in a two-dimensional discrete space. A kind of autobiography (environmental descriptive biography) is the succession of its environmental states xi =(xi,yi). The system can have an internal state (si) and register its internal biography. If however, the algorithms run have an irreversible part, this measures a deficit of its capability of reconstructing its algorithmic story (that we can call Algorithmic Autobiography).

 

Environmental Biography (EB)

 

It is described by: ,and its dynamics is described by  the transition matrix P(i,j). It’s a Markov Process, and corresponds to “when I was there, after having made a given trajectory, I moved there with a given probability” In this case the set of environmental position data has to be stored somewhere.

 

Internal Biography (IB)

Corresponds to “after a given internal evolution, I moved to that state” . There is a minimal biographic competence.

 

EB+IB

 It’s again a Markov process in more dimensions. It corresponds approximately to “I was there and felt like that” or, maybe “I was there and had these ideas, then I moved there”“I was there and was in this state; so I moved there” To the question :”why did you move to the next place?” can be given just on the base of the available information. So the answer is  A(i, i+1)=(xi,si)->(xi+1,si+1). The linguistic form can be whatever one prefers, but it can contain just the available information.If we order (not in time) internal states and environmental position, we can build a Markov Matrix and consider long or short stories. The Markov Matrix is the only interpretation tool for stories. This means that the “explanation” of the agent behaviour can come just from various orders of Markov Matrices. The agent has stored just the information needed to pose its decision rules as a problem.  Again the memory corresponds to an ordered data set.

 

Algorithmic Biography

The algorithmic biography is identified with the capability of inverting the computations that lead the agent from the beginning of the trajectory to its final point.

In this case there will be nothing mysterious or unknown in such a trajectory. This is a quite theoretical concept that needs further discussions. In any case, reversible parts of the global program can be identified with some form of memory accessible without storing of data.

Assume that the computation takes place by just one Turing Machine TG.

 

The time evolution of the computation looks like:

 

x1, s1->T->x2,s2+q1->……..

 

So the set of [qi ] must be stored in a database to make the calculation fully reversible. The idea of not throw away [qi ] in order to make calculation reversible has been proposed by Bennett. Of course not all the parts of a computation need to be reversible. So one can investigate to possibility of making a computation reversible just in particular parts of it.

Some points need to be clarified as, in some cases the interpretation of the results by Launder and Bennett can be complex. The first point is that the erasing of purely random data strings is a thermodynamically reversible process; un this case in facts the entropy increase of the environment is compensated by an equivalent entropy decrease of data. From the point of view of the brain it seems very appealing to consider the idea of cancelling all random strings.

This fact by itself poses some interesting problems. Assume to start with a string x, where the superposition between a non-random component and a random one is evident. On this basis, we build a program p, capable of decomposing x into xd (deterministic) and xr (random) , such that

    

x=xd+xr

 

The process is formally the following :

 

            <x,p>->T->xd

            x-xd=xr

 

If xr is cancelled, it is impossible to compute x back. On the other hand, if xd is actually random, the cost of erasing it is null.

 

Computing schemes can therefore be schematised in the following way.

 

1)      INPUT STORING :Computing(i): store input, evaluate, pass output

2)      OUTPUT STORING: Computing(i), evaluate, store output, pass output

3)      ZERO STORING: Computing(i), evaluate, pass output

 

In a long series of calculations, schemes (1) and (2) are essentially equivalent.

As a matter of fact, the storing of intermediate results is essential, as noticed by Bennett.

But, whenever the garbage is a random string, the above statement implies that the cancellation is at no thermodynamic cost. So anytime the result of a computation is the extraction of a random noise, the erasure of it has no thermodynamic cost.

In terms of complexity theory, if K(x)=l(x), then x can be erased without any thermodynamic  irreversible effect. If we refer to program 2 however, we see that as X is not a random string, its cancellation has a thermodynamic cost.

One would be lead to the conclusion that, if two strings A and B have to be cancelled, it’s thermodynamically convenient cancel the “more random one” as this would imply a lower thermodynamic loss.

A measure of randomness is evidently. If K(x)=l(x) then this quantity is infinite.

 

 

 

So, if the guess is correct, the best choice could be that of cancelling high complexity strings.

Which could be read as “if the string is almost random, cancel it”

 

Unfortunately absolute evaluation of K(x) cannot be transformed into a definite algorithm.

For the moment p is rather undefined.

 

Assume that p is such that it gives an estimate of effective complexity. So its output is an estimate  of  KE(X).

We have:

 

K(x)=< l(X)

KE(X)=< K(X)

 

 

So if we put a limit on the second quantity, we know that the first is even larger. Maybe some normalisation is needed. Anyhow the limit of  KE(X) is represented by its dependence on the specific language. As it intuitively coincides with the capability of the system to identify regularities in the string, which identify a class, whose algorithmic definition is KE(X) itself, the language needs to be defined.

Another point, underlined by the definition of logical depth, is how much the system takes to evaluate it.

 

We can try the following

How can T evaluate a reasonable approximation? Do we need further analysis?

 

x->T->y

Assume it is given n time steps to produce its output and then halt. Assume it does not stop at time step n. This means that its utility within n time intervals is zero with respect to the input string. One could represent it in terms of accepting or not x. In some sense T does not recognize x in n time steps, meaning by that the impossibility of outputting a result and then stop. Because of the halting problem, this cannot be defined before. Anyhow the utility of T for a living system implies that the output (whatever it is) must be delivered in a finite number of time steps.

So we come to the conclusion that the only reasonable possibility is that of defining a limit in the time of computing.

 

Given a Turing Machine T, one can identify two classes of input finite strings of a given length l.

 

On and O>n

 

meaning that T does not halt within n time steps.

 

We can say that O>n represents the ensemble of the l-strings that cannot be n-interpreted by a given T. This is evidently a statement about the relationship between a string and a Turing Machine. So, more properly, we can write O>n (l,T). It can also be considered as the set of strings that are too hard to produce an output and the halting of T. We can normally fix l, and cancel it since now.

 

One could define a language over a finite alphabet A, A* (for example a subset of all the possible words of length l), such that T makes at most t(n) steps before it prints the output and halt. If we refer to a binary alphabet, the number of strings of a given length l is 2l, and, as usual, their K is in most cases of the order of l.

It must be noticed that for low values of l, the concept of complexity fades away in some sense. If one considers the first t attempts of identifying true random strings, one can see it very clearly.

The concept of Oracle machine is evidently call into play.

Let us remind that a Turing Machine with on oracle TO, is a normal T, except that, after it has computed a string x, it can enter in a special state (oracular) and verify if the computed string belongs or not to a given language A*. It gives a positive answer “1” if this is the case, “0” in the opposite case.

The TO can in principle enter the oracular state at any step and, in particular before starting, and after having ended its task. So a TO can verify if its input belongs to A* and start the calculation just if its input belongs to A* itself.

So the above machine can be seen as a TO with a limitation in time for what regards its oracular computation. In its oracular state it must accepts the elements of A* in less than t(n) steps.

Referring specifically to the mind of a living being, which needs to take decision in a finite time, the class O>n(T) represents the set of incomprehensible environmental situations, that cannot be usefully elaborated by T.

So input strings which are rejected by the TO machine when in its oracular state are operationally to be trashed. There seem to be no doubts about this conclusions, when survival, and therefore timing is an important parameter. The tentative analogy between memory storing and trashing dynamics, and computational reversibility inevitably take us to the relationship between languages and randomness. It must be pointed out that languages here refer to the representation of the environment by some see(S) function, that is “images” of the environment obtained by some evolutionary process of the monitoring system.

 

 

Next Page

 

 

Back to Interests