Sunday, September 6, 2015

Design of the new solver

The inductive reasoning system described below is supposed to performed some functions of MOSES or CGC while being able to perform tasks these systems cannot do.
I doesn't use neural networks, genetic algorithms nor hill-climbing. It is meant to simulate the way humans induce rules.
It should universally find rules in input complex data. It is supposed to at least find any rules an average adult human would find. Therefore it should not make any limiting assumption (for example that order of elements in map matter or that it doesn't matter). It should be able to analyse data of different kinds: JSON structures, databases, DOM tree of HTML page, therefore wrappers need to be created which would return for each elements its value, components and connected elements.

I will present how it may be used to solve IQ test task and predict movement of objects. Enhancements required for learning board game and virtual worlds will be described later.

__HUMAN THINKING__

Suppose you are given this IQ test:

Example #1

[1, [3, 3]] -> [10, 7]
[2, [3, 4, -4]] -> [12, 1]
[0, [2, 3, 1, 1]] -> [6, 5]
[2, [0, 5]] -> ???
How would you solve it? What would your mind's algorithm look like?

Take the first row:
[1, [3, 3]] -> [10, 7]
Are some numbers equal? Yes, [3,3]. So maybe the first 2 elements in the subsequence are always equal? Looking down into the next entry... no it is not. Analyse the subsequence in consecutive entries. Its length is 2, 3, 4, 2 - it looks like 2,3,4,2,3,4,2,3,4 sequence but it may be coincidence. We don't know if position of entries matter - is it a sequence or set. Lets calculate other aggregate function on the subsequence - for example sum ;
[3,3] -> 9  
[3, 4, -4] -> 3
[2, 3, 1, 1] -> 7
Now compare these calculated values with the other, create expressions involving them. When comparing with the second element on the right side you may find out that :
sum([3, 3]) - 7 = 2 // [10, 7][1] == 7
sum([3, 4, -4])  - 1 = 2
sum([2, 3, 1, 1]) - 5 = 2

So
sum(key[1]) - value[1] = 2 
So
key[1] = sum(value[1]) - 2

Finally you would find that the rule is the following
[a, [b1, ..., bn]] -> [a+(b1*b2), b1+...+bn-2] 

The algorithm is described below.

__SERIES__

Induction is performed in series. Series is a sequence of elements, which are expected of being similar. AI finds the rules which apply to all or some regular subset of elements in series. Series analysis should cover all available data. Sub-series analysis will be also performed on sequences and maps which are contained in other elements.

Example of series are:
  1. Sequence of numbers in numeric IQ tests
  2. Sequence of pairs (state of board, next move) in a board game
  3. All elements on a particular board.
  4. Elements on the board in particular row, column, etc.
  5. Tables in database, rows of table and cells in row
Example #2 Table of records
t x y
1.0 2.0 4.0
1.5 3.0 5.5
3.0 6.0 10.0
4.5 9.0 14.5
This is data from movement of object without gravity with irregular time sampling (I left the numbers pretty regular to make reading the example easier)

Later we will analyse example #3 - movement of object with gravity and some wind (I assumed wind works like gravity but in different direction)
t x y
0.0 0.0 1.00
1.0 1.900 3.00
1.5 2.775 3.25
3.0 5.100 1.00
4.5 6.975 -5.75


__ELEMENT ANALYSIS__


CREATING ELEMENTS LIST


Each element in series is analysed. It is called master element. A list of nearby related elements is created. In details:
  1. For master element its components and components of components, etc. is returned.
  2. The same is made for elements related to the master elements.
  3. For each non-scalar element - experienced expressions (see Experience part below) are evaluated and if they are sensible in this context they are added to the list
  4. Interpretations are generated - to be explained another time. 
Complex elements and the master element itself is also located on the list. The generated list contains for each element
  1. a path leading to it from the master element (something like XPath language)
  2. complexity of the path (similar to Kolmogorov complexity)
  3. value of element
The size of list is limited: for each element analyses only n-least complex related elements are returned or elements up to given complexity limit.

Example #2

Analysed master element (the 1st row of input table) is the map: [x:1.0 y:2.0 z:4.0].  The list generated from it contains:
PATH  COMPLEXITY  VALUE
[this ,0,  [t:1.0 x:2.0 t:4.0] ] // the master element itself
[this.t, 1, 1.0] // operator "." is used to navigate to components
[.x, 1, 2.0] // "this" keyword is optional
[.y, 1, 4.0]
[this->next, 3, [t:1.5, x:3.0, y:5.5] ]// operator "->" is used to navigate to another element through a relation, thus "\next" is just the next row
[this->next.t, 4, 1.5]
[->next.x, 4, 3.0] 
[->next.y, 4, 5.5]

Example #1 

The first master element is
[1, [3, 3]] -> [10, 7]
so the list contains:
[this, 0, [1, [3, 3]] -> [10, 7]]
[.key, 1, [1, [3, 3]]] // "this" keyword is optional
[.value, 1, [10, 7]] 
[.key[0], 2, 1] 
[.key[1], 2,  [3, 3]] 
[.key[1][0], 3,  3]  
[.key[1][1], 3,  3] 
[.value[0], 2, 10] 
[.value[1], 2, 7] 


GENERATING EXPRESSIONS

Expression involving existing elements in list are generated and added to the list. Expression may be unary (not, sin), binary (a-b), tetrary (?). They may involve aggregate functions. Complexity of expression is generally sum of complexity of sub-expression plus complexity of the term (operator/function). It must be within certain limits to be added (such limit is necessary because for n basic elements c1*n + c2*n*n would be created where c1 is number of unary operators and c2 - number of binary operators) This process is repeated until all new expressions exceed complexity limit (perhaps limit should be greater for expression with more unique variables count)
Complexity reflects how non-obvious and non-symmetric is the expression for human mind. Wages should be configured or obtained through Experience (to be explained another time). The complexity cost rates in this example are the following:
  1. operator "." - 1 for each unique label of sub-element
  2. operator "->" - 3 for each unique label of next-element 1 for non-unique
  3. arithmetic operator - 1 for each

Example #1

(only the important elements are listed):
[sum(.key[1]), 2,  9]  // sum([3, 3])
[.value[1], 1,  7]
[sum(.key[1]) - .value[1], 4,  2]
The last expression evaluating on other master elements also return value 2

Example #2

Just some more elements. :
[t - ->next.t, 5, -0.5]
[x - ->next.x, 5, -1.0]
[(x - ->next.x)/(t- ->next.t), 8, 2.0]
We can see that the last element is speed on x as axis, but as for now AI doesn't know yet it is any special. Complexity 8 is pretty high but as there aren't many data - it should be reached. Perhaps there should be operator delta - delta(a) = a - ->next.a (or a - ->prev.a) with complexity 1 thus reducing complexity of the above term to 3 . Calculating difference with the analogical term in the next record is so common both in IQ tests and physics that it could be a bit privileged.

FINDING CONJECTURES

Now the conjectures are generated. The list is analysed: if two elements share the same value - their expression is taken to create identity conjecture. It states that for each master element in the series expressions #1 and #2 have the same value.

The newly filled list of elements is added to the multiset of values of all expressions evaluated during this series analyses. If they share the same value - a new identity conjecture is generated. The new conjecture is considered especially valuable if the expression is the same, just evaluated on the different master elements.

Example #2

Already generated expressions are evaluated for row#2:
[t - ->next.t, 5, -1.5]
[x - ->next.x, 5, -3.0]
[(x - ->next.x)/(t- ->next.t), 8, 2.0]

So in the multiset for key 2.0 there will be at least these 2 elements.
[row#1, (x - ->next.x)/(t- ->next.t), 8, 2.0]
[row#2, (x - ->next.x)/(t- ->next.t), 8, 2.0]
A new identity conjecture is created (x - ->next.x)/(t- ->next.t) == (->next.x - ->next->next.x)/(->next.t- ->next->next.t) It states that the speed in x direction is constant. Similar conjecture will by created for y

Example #1
Because sum(.key[1]) - .value[1] returns the same number for each master element - an equality conjecture for this and next master element is created.

TESTING CONJECTURES

Each conjecture is checked against other master elements. If it applies to all of them - it means that AI found a rule, great. If conjecture is only true for one master element and for other it is false- it is ignored. If it is true only for some - a detailed analyses is performed. Maybe it is true for each k master elements? Or is true always when another conjecture is true?


Example #1

Identity conjecture is true for entry#1 and entry#2. Evaluating on entry#3 leads to conclusion that in the last entry value[1] = 3
Perhaps another conjecture should be created stating that value of this expression is 2. If from the middle of the sequence value of sum(.key[1]) - .value[1] changed  to 3 then the first conjecture will be false only once while the second is false for half of records.

Example #2

The constant speed conjecture is true for the rows #1 and #2. For the other it cannot be evaluated.

EXPERIENCE

Gathering experience and applying conclusions from one task to another is a multi-aspect topic:
1. Initial complexity costs should be adjusted by how often particular part of expression was used in true conjecture.
2. Values of useful expression should be gathered in known sequences. If values of some new expression evaluated on consecutive master elements are found in some known sequence (for example Fibonacci sequence) - then a new conjecture is generated
3. If an expression is proved useful in one task then it is considered to have much less complexity when applying in another task

Example #3

Pretty complex expression  (x - ->next.x)/(t- ->next.t) was part of main conjecture in example #2. Therefore, during processing of master expression in example #3 for row #1, this expression is added with complexity 2 instead of original 8 to both elements this and this->next. Difference between them divided by time difference express acceleration.

__REMARKS__

1. Expressions are considered related if they are equal, but this should be generalized. One value may be sub-string of another. One may be a category for another or they may share the same category.
2. Language for expression should contain relations and for-all, there-exists quantifiers. Expressions like "sum all elements in sequence such there there exists value in another sequence greater than the given element" should be possible to write down. How is AI supposed to find such a rule is a different issue.

No comments:

Post a Comment