Fractal Classification

 

 

The concept of control implies the existence of at least two subsystem: the controller C1, and controlled one:  C2.  For implementing a control process, there must exists some form f interaction between C1 and C2. This interaction is not symmetric: the controller C1 can affect the state of C2.

In the classical control theory it is assumed that the controlled system cannot affect the controller. We assume that C1 operates on the basis of some monitoring system M1 that produces an “image” S1 of the environment. S1 can generate, via C1, a control message m12(S1) to C2. Then C2 is possibly modified by such a message.

If we imagine a temporal change of the external environment U1, U2, U3,………UN, it will give rise to a succession of images S11, S12, S13,…….S1N.

C2 will “see” a succession of messages m12(S1i) from C1.

The average information per symbol for C1 will be:

 

                                                                                                                                                   (1)

 

For C2, if no other inputs than m12(S1i) are received, the corresponding entropy will be:

 

                                                                                                                                          (2)

 

where the sum is taken on all possible value of the messages.

Even if we assume that to each state of affairs corresponds a different message, C2 will receive the same amount of information C1 receives.

Assume C1 is a classifier, meaning that the S1i are transformed by C1 into a finite (say 2) classes. The information C2 will get will be the statistics on such two classes.

 

Assume now that C2 has its own monitoring system M2 and that, for the moment, M1=M2 in such a way that C1 and C2 receive exactly the same input strings. The only difference is that C2 receives C1 messages as well.

The C2 module has then access to the same input of C1 plus its classification results. Let us suppose that the classification is 0 or 1.

Then C2 receives one bit more.  We have:

 

   for each value of the classes  c1 of C1.                                            (3)

 

From this point of view, the characteristics of the probability distribution within the C1 classes define the entropy. Then:

 

                                                                                                          (4)

 

The following relation holds:

 

                                                                                                                                                (5)

 

So the excess of entropy that affects H2 is measured by the “classification entropy” CH represented by the second term on the rhs of (5).

As the classification entropy is given by:

 

                                                                                                                     (6)

 

H2 is in any case greater than H1 except for null values of CH.

From the above analysis we conclude that, to take some advantage out of C1 classification, C2 cannot simply receive, together with message from M2 (remember we have assumed here M1=M2), the results of C1 classification. So the control does not is capable of creating any advantage for the controlled module by simply “passing” its classification results.  It must therefore affect the controlled module in some different way. Equation (4) is an interesting alternative as far as C2 is “capable” of switching its structure in such a way that it (or its monitoring module) “sees” just the data string S such that they belong to the class identified by the message received by C1. In this case C2 should in principle carry out a further, more detailed classification of the strings for each class identified by C1. It has to analyse the structure of (3) for each C1 classification class. Actually it is asked to simulate two modules by switching to different states depending on the C1 classification message.

We then consider the evolution of the classification tree implemented by following this structure.

 

 

preliminary fig.1

 

 

Here it is assumed that each module has the same classification power and, to make things simple, that each of them implements a binary classification. The first characteristic that one can identify is that all modules are to be connected with the environment.  One can observe also that, to be able to implement a N-classes analysis of the input data, you need N-1 modules for the binary classification structure. This seems a very non-economic structure for knowledge refining.

 

We consider in the following a completely different structure with the following characteristics:

 

1)     New modules are made available in time with a given frequency depending on an external mechanism

2)     The “occupation” of a new available module depends on some form of competition between existing modules

3)     Each modules compete making use of its more frequent classification signal

4)     The module which has the highest frequency has the largest probability of occupying the new module

 

At a given instant T modules exist and each of them launch a signal belonging to a set of I terms.

The probability that a given signal j is launched by  the ith module is:

 

                                                                                                                                                            (7)

 

W(xij) is the set of S that correspond to the j class of the module i.

The rhs of (7) depends on both the cardinality of W(xij) and the values of P(S) on it.

No hierarchical layered order is assumed between the modules.

Equation (7) implies that the access to other modules, in order to further refine the original classification, is a purely statistical process, where those classifications, characterised by higher values of the coefficient (7), tend to grow into more complex branching.

To make clear the above concept, let us consider a simple case.

Assume that we have, at time=1, a mother module1  implementing concepts  1a and 1b.

Assume further that:

 

P(1a)=0.8 , P(1b)=0.2

 

With the highest probability 1a will occupy the next free module. Therefore  1a develops into 1aa and 1ab by the action of the occupied module.

This will put into competition 3 classification messages:

 

[1aa, P(1aa)];  [ 1ab,P(1ab)]; [ 1b, P(1b)]

 

with their respective probabilities. Of course other developments are statistically possible with lower probabilities.

The above mechanism has several peculiar implications. A classification refinement has a better probability of giving rise to further development as far as the probabilities of the two classes it produces are characterised by different probabilities as defined by (7). In facts a concept 1a giving rise to a symmetric (in terms of probabilities) tree, will rapidly waste the original probability by successive bifurcations:

 

[1a, P(1a)] -> [1aa, P(1a)/2]; [1ab, P(1a)/2]; [1b, (1-P(1a)]

 

It is evident that, at some order of symmetric branching, 1a concept will be statistically stopped by 1b.

 

At a given time t, t+1 modules will exist. If each module is implementing a binary classification,  2(t+1) binary classifications will also be competing for the next module. We call “boundary of the classification system” B(t), the set of the classifications competing for the next module.

The number of elements of  B(t) is t+1 ad grows linearly with time.  Therefore expression (7) needs to be normalised:

 

 

                                                                                                                                                         (8)

 

In the above model each unsaturated class, appeared at time t, competes for new modules for each successive time. A consequence of this assumption is that, due to the increasing spreading of the growth probability on a linearly growing B(t), also very poor “old” classifications will tend to occupy some new module at  some time in the future.

This is a quite unsatisfying aspect, as the final result of such mechanism is the evolution of poor old classifications just because of the global refining of the classification system.

To bypass this problem, we can simply extend the interpretation of the boundary function P. More precisely we can introduce a “blocking probability” for each element of B(t), inversely proportional to the P value of the element.

To make the model “more general” one can insert some tuning parameters. We can sum up the above discussion by the following hypotheses:

 

The probability of a classification, belonging to B(t), to occupy the next module  is given by:

 

 

                                                                                                                                         (9)

 

 

The “death” probability for the same elements is given by:

 

 

                                                                                                                                             (10)

 

Here g, d, g, d are positive tuning parameters to be discussed below.

Alternatively one could introduce an exponential decrease of competing signals.

In figure 2 and 3 a schematic representation of a possible evolution process is represented. Figure2 shows the evolution of modules occupation, while Figure 3 shows the resulting classification.

 

 

Figure 2: Modules appear in time. They are so ordered. The full arrows represent the occupation as it happens in time. Dashed arrows represent the message coming from the unique monitoring module.

 

 

 

Figure 3: The classification resulting from the evolution of Figure 2. Red arrows represent the blocked classifications at time equal 8. B(t=8) is given by [C2, C11, C122, C12122, C12112, C121211, C121212, C121111, C121112]

 

 

 

If we assume that all new modules are operating on the same input module M, in such a way that all the strings available are those belonging to the same set of binary strings with a given fixed length N, just 2N strings will be the object of the evolving classification. Therefore at T=2N no further classifications will be possible as each string will belong to its own class. This is a theoretic limit, as the introduction of a death probability of classes will in general block the process of evolutionary classification before it reaches the above limit.

This is the obvious consequence of our assumptions on the monitoring module M. The above model is based on the hypothesis that, while the knowledge system evolves, M does not participate to the evolution process and it passes the same information to all the modules appearing successively in the knowledge classification system.  Evolution of M would imply that most recent modules of the knowledge classification system would receive from the external environment more detailed information. A major problem is well represented by the following question: “how evolution could affect the growth of M? “ The available signals are the further classification requests of already developed classifying modules. Each of them, following equation (7), emits a signal that is proportional to the relative importance of its support data set in terms of probability. Each class

 

Back to Interests

 

Home Page