Monday, September 14, 2015

Generating knowledge



This post is about how to make an AI that can generate knowledge for OpenCog, NARS, SOAR, or another AGI engine. If a human is given some unknown data to analyse, he/she will find some regularities. If the data is a sequence of states in some game, and sufficient time is given, a human will induce its rules. If a human is given a few examples of transforming one complex data structure into another, he/she will find the rule. All these tasks should be possible in IQSolver.

IQSolver will generate knowledge in the form of equations involving mathematical operators, relations (<, >=, etc.), boolean operators, and quantifiers. First-order logic should be enough, although maybe I just haven't found an appropriate example that would falsify this conjecture :). The process of designing this inductive engine is itself inductive: from invented examples I induce what is necessary, so feedback and counterexamples would be appreciated :)

TESTING CONJECTURES


IQSolver generates conjectures about data. Some of them are justified by one observation (entry), others by many. This is similar to the truth value in PLN. For example, in the task of continuing the sequence 1, 2 - the conjecture that the difference between elements is 1 - is justified by one example. And there are other conjectures, for example that the quotient is 2. On the other hand, in the sequence 1, 2, 3, 4, 5, 6, the difference-is-1 conjecture is justified by 5 observations, and all the others are falsified.
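Counting how many observations support each pairwise conjecture might look like this (a sketch in Python, not IQSolver's actual code; the function and conjecture names are my own):

```python
# Sketch: count supporting observations for a conjecture over a sequence.

def support(seq, conjecture):
    """Return (evaluable, true) counts for a pairwise conjecture."""
    pairs = list(zip(seq, seq[1:]))
    hits = sum(1 for prev, cur in pairs if conjecture(prev, cur))
    return len(pairs), hits

diff_is_1 = lambda prev, cur: cur - prev == 1
quotient_is_2 = lambda prev, cur: prev != 0 and cur / prev == 2

print(support([1, 2], diff_is_1))        # (1, 1) - one supporting observation
print(support([1, 2], quotient_is_2))    # (1, 1) - also supported, ambiguous
print(support([1, 2, 3, 4, 5, 6], diff_is_1))      # (5, 5)
print(support([1, 2, 3, 4, 5, 6], quotient_is_2))  # (5, 1) - falsified
```

On the short sequence both conjectures are equally justified; only more data separates them.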

Some conjectures are true for all objects on which they can be evaluated. These are the most valuable, as they allow unknown values to be predicted or verified. However, when a conjecture is true only for some elements, it is not automatically rejected; instead a detailed search is done. For example, when analyzing chess movements, the AI may conclude that a piece always moves onto a square that is empty or occupied by a piece of the other side (building a concept of side and assigning piece symbols to it is another issue). However, this conjecture does not hold for castling, promotion, or en passant. Thus, the AI should make a detailed analysis of these cases but not reject the general rule.

So what should be done if q is not always true? Maybe for conjecture q there is another conjecture p such that p => q always holds? If so, a new complex conjecture will be created stating that p => q. But this conjecture is not complete, as it implies nothing when p is false. Perhaps there is another conjecture r such that ~p => r. Then a new conjecture would be (p => q) AND (~p => r).

A set of implications may be created. For example, if you analyse the movements of chess pieces, you cannot make one rule for all of them. But it is possible to enumerate rules for each piece:
IF piece_kind == "rook" THEN x1 == x2 or y1 == y2
AND IF piece_kind == "bishop" THEN x1-y1 == x2-y2 or x1+y1 == x2+y2
AND IF piece_kind == "knight" etc.
Of course, a simpler conjecture with fewer elements carrying the same information is better. But if a simple conjecture cannot be found, it is valuable to have a complex conjecture consisting of many implications, such that exactly one premise is always true.
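The implication set above can be sketched in Python (a toy illustration; the coordinate names x1, y1, x2, y2 and the dictionary layout are my own, not IQSolver's format):

```python
# One rule per piece kind; since exactly one premise (piece_kind == ...)
# is true per move, the conjunction of implications reduces to a lookup.

RULES = {
    "rook":   lambda x1, y1, x2, y2: x1 == x2 or y1 == y2,
    "bishop": lambda x1, y1, x2, y2: x1 - y1 == x2 - y2 or x1 + y1 == x2 + y2,
    "knight": lambda x1, y1, x2, y2: {abs(x1 - x2), abs(y1 - y2)} == {1, 2},
}

def move_conforms(piece_kind, x1, y1, x2, y2):
    return RULES[piece_kind](x1, y1, x2, y2)

print(move_conforms("rook", 0, 0, 0, 7))    # True
print(move_conforms("bishop", 2, 0, 5, 3))  # True
print(move_conforms("knight", 1, 0, 2, 2))  # True
print(move_conforms("rook", 0, 0, 3, 4))    # False
```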

CONJECTURE FEATURES

To summarise, the following numbers are kept for each conjecture:
  1. How many observations (elements in the series) it was tested on
  2. How many times it could be evaluated (for example, you cannot evaluate the link prev on the first element, as it has no previous element; you cannot evaluate piece_kind == "rook" if there is no piece_kind field on an object)
  3. How many times it was true
  4. If a conjecture has the form of an implication or a series of implications - how many times any premise was true
  5. If a conjecture is a series of implications - how many times more than one premise was true. This indicator is rather negative, because it means the classification is not disjoint
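A minimal sketch of this bookkeeping, assuming a simple counter object (the field names are my own):

```python
# Counters 1-5 from the list above, updated one observation at a time.
from dataclasses import dataclass

@dataclass
class ConjectureStats:
    tested: int = 0          # 1. observations it was tested on
    evaluable: int = 0       # 2. times it could actually be evaluated
    true_count: int = 0      # 3. times it was true
    some_premise: int = 0    # 4. times any premise was true
    many_premises: int = 0   # 5. times >1 premise was true (non-disjoint)

    def record(self, premises_true=None, value=None):
        """Update counters for one observation; value=None means the
        conjecture could not be evaluated on this element."""
        self.tested += 1
        if value is None:
            return
        self.evaluable += 1
        if value:
            self.true_count += 1
        if premises_true is not None:
            if premises_true >= 1:
                self.some_premise += 1
            if premises_true > 1:
                self.many_premises += 1

stats = ConjectureStats()
stats.record(value=None)                  # prev link missing on first element
stats.record(premises_true=1, value=True)
stats.record(premises_true=2, value=True) # overlapping premises: bad sign
print(stats)
```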


GENERATING SIMPLE CONJECTURES

Just to summarize what I wrote in the previous post: simple conjectures are generated while iterating over the elements of a series. For each element, a number of formulas involving that element, related elements, and sub-elements are generated. Formulas above some complexity level are not generated; formulas that proved useful earlier have their complexity cost much reduced.
If the value of a formula equals the value of another formula evaluated on this or other elements, a new simple equality conjecture is created. If the value of a formula evaluated on consecutive elements matches some earlier-experienced sequence, a conjecture is also generated.
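A toy sketch of the idea, assuming a tiny hand-picked formula pool (real formula generation would enumerate operators up to a complexity bound; all names here are my own illustration):

```python
# Pair up candidate formulas whose values were never observed unequal;
# each surviving pair is a simple equality conjecture.
import itertools

def candidate_formulas():
    return {
        "x": lambda row, prev: row["x"],
        "t*2+1": lambda row, prev: row["t"] * 2 + 1,
        "f": lambda row, prev: row["f"],
        "prev.f": lambda row, prev: prev["f"] if prev else None,
    }

def equality_conjectures(series):
    formulas = candidate_formulas()
    surviving = set(itertools.combinations(formulas, 2))
    for i, row in enumerate(series):
        prev = series[i - 1] if i > 0 else None
        values = {name: fn(row, prev) for name, fn in formulas.items()}
        # keep a pair only if it is not evaluable here or the values agree
        surviving = {(a, b) for a, b in surviving
                     if values[a] is None or values[b] is None
                     or values[a] == values[b]}
    return surviving

series = [{"t": 0, "x": 1, "f": 5}, {"t": 2, "x": 5, "f": 5},
          {"t": 3, "x": 7, "f": 5}]
print(sorted(equality_conjectures(series)))
# [('f', 'prev.f'), ('x', 't*2+1')]
```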

COVERING CONJECTURES

Some conjectures may always be more informative than others. For example, p <=> q holds more information than p => q. The conjecture p <=> q is created when both p => q and p <= q are found. The base conjectures p => q and p <= q are then covered: they won't be used in generating knowledge for an external system nor shown as conclusions to the user. However, it may happen when analyzing more data (as will be shown later in the example) that p <=> q is true in fewer cases than p => q, and then both of them are shown. The former is better because it is more informative; the latter, because it is more often true.
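The covering decision can be sketched as a comparison of true-counts (a simplification; the function and its arguments are my own illustration):

```python
# The equivalence covers an implication only while it is true in at least
# as many cases; otherwise the implication resurfaces alongside it.

def select_visible(impl_pq_true, impl_qp_true, equiv_true):
    shown = ["p <=> q"]
    if impl_pq_true > equiv_true:
        shown.append("p => q")
    if impl_qp_true > equiv_true:
        shown.append("p <= q")
    return shown

print(select_visible(5, 5, 5))     # ['p <=> q'] - implications stay covered
print(select_visible(12, 10, 10))  # ['p <=> q', 'p => q'] - both shown
```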

__EXAMPLE__

CONSTANT SPEED

Take this sequence of 5 maps given by this table (for example, the first element is the map [t:0, x:1, f:5]):
t  x  f
0  1  5
2  5  5
3  7  5
3.5  8  5
5  11  5

Many conjectures will be generated, among them:
x = t*2+1
f = 5
f = prev.f
but also
d(x) / d(t) = 2 
d(x) / d(t) = d(prev.x) / d(prev.t) 

where d(x) is an operator meaning x - prev.x (defining it as x - next.x, as in the previous post, would ultimately lead to the same results but would be less intuitive; I have also given up marking links with the -> operator).
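These conjectures can be checked mechanically against the five rows above (a sketch; the data layout is my own):

```python
# Verify x = t*2+1, f = prev.f and d(x)/d(t) = 2 on the sample rows.

rows = [{"t": 0, "x": 1, "f": 5}, {"t": 2, "x": 5, "f": 5},
        {"t": 3, "x": 7, "f": 5}, {"t": 3.5, "x": 8, "f": 5},
        {"t": 5, "x": 11, "f": 5}]

def d(field, cur, prev):
    return cur[field] - prev[field]   # d(x) means x - prev.x

for prev, cur in zip(rows, rows[1:]):
    assert cur["x"] == cur["t"] * 2 + 1                  # x = t*2+1
    assert cur["f"] == prev["f"]                         # f = prev.f
    assert d("x", cur, prev) / d("t", cur, prev) == 2    # d(x)/d(t) = 2
print("all conjectures hold on the first 5 rows")
```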

As you may have guessed, this is a sampling of the constant-velocity movement of an object: t is time, x is position, and d(x) / d(t) is speed.

CHANGING SPEED

Now new data become available:

t  x  f
6  12  4
7  12  3
9  10  2
10  9  2
11  8  2

The conjectures x = t*2+1 and d(x) / d(t) = 2 are false for the new data. So a new conjecture will be generated: x = t*2+1 <=> d(x) / d(t) = 2, which is true for all elements.

The conjectures f = prev.f and d(x) / d(t) = d(prev.x) / d(prev.t) are both false for the first 3 of the new records but true for the last 2. So a new conjecture is created:
f = prev.f <=> d(x) / d(t) = d(prev.x) / d(prev.t)
As you may guess, f stands for fuel, which is used when speed changes. To us this sounds sensible - to change velocity in outer space, a spaceship needs fuel. IQSolver would not interpret it this way, but it would find these rules and send them to the knowledge engine.
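The new equivalence can be checked on the ten rows seen so far (a sketch; tuple layout is my own):

```python
# Check f = prev.f <=> d(x)/d(t) = d(prev.x)/d(prev.t) on rows (t, x, f).

rows = [(0, 1, 5), (2, 5, 5), (3, 7, 5), (3.5, 8, 5), (5, 11, 5),
        (6, 12, 4), (7, 12, 3), (9, 10, 2), (10, 9, 2), (11, 8, 2)]

def speed(i):   # d(x)/d(t) at element i
    (t0, x0, _), (t1, x1, _) = rows[i - 1], rows[i]
    return (x1 - x0) / (t1 - t0)

# The equivalence needs two previous elements, so it is evaluable from i = 2.
for i in range(2, len(rows)):
    fuel_kept = rows[i][2] == rows[i - 1][2]    # f = prev.f
    speed_kept = speed(i) == speed(i - 1)       # d(x)/d(t) = d(prev.x)/d(prev.t)
    assert fuel_kept == speed_kept
print("f = prev.f <=> constant speed holds on all 10 rows")
```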

REFUELING

Views about the world change with experience, and the same happens to IQSolver. The next portion of data is:

t  x  f
16  3  2
19  0  2
20  0  1
21  0  2
30  0  11

In the last 2 entries
d(x) / d(t) = d(prev.x) / d(prev.t) but not f = prev.f 

So the equivalence
f = prev.f <=> d(x) / d(t) = d(prev.x) / d(prev.t)
is no longer true. However, the previously covered conjecture now emerges to the surface:
f = prev.f => d(x) / d(t) = d(prev.x) / d(prev.t)

But the AI wants to find equivalence rules. So it tries to find a rule for the elements for which not (f = prev.f) but d(x) / d(t) = d(prev.x) / d(prev.t). This requires another submodule, capable of finding a conjecture which is true for some elements and false for the others. This is similar to a classification problem. After the search, the conjectures x == 0 and d(f)/d(t) = 1 are generated (note that d(f) = 1 alone would fail on the last entry, where fuel rises by 9 over 9 time units).

The rule
f = prev.f or x == 0 <=> d(x) / d(t) = d(prev.x) / d(prev.t)
is not always true (for example, it fails for the third new entry [t:20, x:0, f:1]); however
f = prev.f or x == 0 <= d(x) / d(t) = d(prev.x) / d(prev.t)
is always true, and the material equivalence
f = prev.f or d(f)/d(t) = 1 <=> d(x) / d(t) = d(prev.x) / d(prev.t)
is true.
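The final rule can be verified over all 15 entries (a sketch; note I take the refueling clause as d(f)/d(t) = 1, which also covers the last entry, where fuel rises by 9 over 9 time units):

```python
# Check (f = prev.f or d(f)/d(t) = 1) <=> constant speed on all data.

rows = [(0, 1, 5), (2, 5, 5), (3, 7, 5), (3.5, 8, 5), (5, 11, 5),
        (6, 12, 4), (7, 12, 3), (9, 10, 2), (10, 9, 2), (11, 8, 2),
        (16, 3, 2), (19, 0, 2), (20, 0, 1), (21, 0, 2), (30, 0, 11)]

def delta(i, field):   # d(field) at element i; field index: 0=t, 1=x, 2=f
    return rows[i][field] - rows[i - 1][field]

def speed(i):
    return delta(i, 1) / delta(i, 0)

for i in range(2, len(rows)):
    lhs = rows[i][2] == rows[i - 1][2] or delta(i, 2) / delta(i, 0) == 1
    rhs = speed(i) == speed(i - 1)
    assert lhs == rhs
print("final equivalence holds on all 15 entries")
```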

SUMMARY 

In this post I have presented in detail how very grounded, low-level knowledge may be generated from data about which the AI can make no assumptions. Therefore, I think such a module would be useful for many other AI systems.
