Data Structures (Cont)

Association Lists

We can think of association lists as database tables (with a single-field key, though this restriction can be relaxed)

peter

5551616

kim

5559282

nicole

5559282

can be represented as a list as

[ p(peter, 5551616),

p(kim, 5559282),

p(nicole, 5559282)].

We can use the "member" predicate we defined last week to look up elements of the list

member(X, [X|_]).

member(X, [_|Ys]) :-

member(X,Ys).

For example

member(p(kim,Ph),

[ p(peter, 5551616),

p(kim, 5559282),

p(nicole, 5559282)]).

will return

Ph = 5559282

In order to look more like a database, we define some "helper" predicates:

%lookup(Alist, Key, Info) if p(Key, Info) occurs

%in Alist

lookup(Alist, Key, Info) :-

member(p(Key, Info), Alist).

%empty_alist(Alist) if Alist is empty

empty_alist([]).

%addkey(Alist0, Key, Info, Alist) if Alist is

%Alist0 with p(Key, Info) added on the front

addkey(Alist0, Key, Info, Alist) :-

Alist = [p(Key, Info)| Alist0].

%delkey(Alist0,Key,Alist) if Alist is Alist0

%with the info corresponding to Key deleted

delkey([],_,[]). %Deletion from empty list

%gives the empty list back

delkey([p(Key,_)|Alist], Key, Alist).

delkey([p(Key0, I0)|A0, Key, [p(Key, I0)|A]) :-

delkey(A0, Key, A).

%modkey(Alist0, Key, Info, Alist) if Alist is

%Alist0 with the information for Key updated to

%Info

modkey(Alist0, Key, Info, Alist) :-

delkey(Alist0, Key, Alist1),

addkey(Alist1, Key, Info, Alist).

 

Dependency Graphs

We can in turn use association lists to represent dependency graphs:

[ p(foundations, []),

p(int_walls, [foundations]),

p(chimney, [foundations]),

p(ext_walls, [foundations]),

p(roof, [ext_walls]),

p(windows, [ext_walls]),

p(tiles, [chimney, roof]),

p(doors, [int_walls])].

One of the things we might then wish to discover is what items must precede tiling:

predecessors(N, Alist, Pre) :-

lookup(Alist, N, NImPre),

list_predecessors(NImPre, Alist, ListPre),

list_append([NImPre|ListPre],Pre).

list_predecessors([],_,[]).

list_predecessors([N|Ns],Alist,[NPre,NPres]) :-

predecessors(N, Alist, NPre),

list_predecessors(Ns, Alist, NPres).

list_append([],[]).

list_append([L|Ls],All) :-

list_append(Ls, AppendedLs),

append(L, AppendedLs, All).

Unfortunately, this will give us repetitions: if House is the list above,

predecessors(tiles, House, Pre).

will give the answer

Pre=[chimney,roof,foundations,ext_walls,

foundations]

We can avoid this by keeping track of which members of the predecessors list we have already seen:

predecessors(N, Alist, Pre0, Pre) :-

lookup(Alist, N, NImPre),

cumul_predecessors(NImPre,Alist,Pre0,Pre).

cumul_predecessors([], _, Pre, Pre).

cumul_predecessors([N|Ns], Alist, Pre0, Pre) :-

member(N, Pre0),

cumul_predecessors(Ns, Alist, Pre0, Pre).

cumul_predecessors([N|Ns], Alist, Pre0, Pre) :-

not_member(N, Pre0),

predecessors(N, Alist, [N|Pre0], Pre1),

cumul_predecessors(Ns, Alist, Pre1, Pre).

Now we can do

predecessors(tiles, House,[], Pre).

to get the answer

Pre=[ext_walls,roof,foundations,chimney]

Binary Search Trees

CLP systems can represent a wide range of dynamic data structures, of which lists are only one example

Another important example is the binary search tree:

An important assumption is made about storage of information in binary search trees: the information is ordered in some way (for example, alphabetical order), and the information is stored in such a way that the root node of any subtree falls between all the information on the left branch, and all the information on the right branch

For example suppose we have a predicate less_than which orders names in alphabetical order (we could write a general rule which would do this, but for the example, we'll just use some simple facts):

less_than(harald, kim).

less_than(harald, peter).

less_than(peter, kim).

%traverse(Tree,List) if List represents the

%information resulting from an in-order

%traversal of Tree

traverse(null, []).

traverse(node(T1, I, T2), L) :-

traverse(T1, L1),

traverse(T2, L2),

append(L1, [I|L2]).

 

%find(Tree, Item) if Item is in Tree

find(node(_TL, Item, _TR, Item).

find(node(TL, I, _TR), Item) :-

less_than(Item, I),

find(TL, Item).

find(node(_TL, I, TR), Item) :-

less_than(I, Item),

find(TR, Item).

%insert(Tree, Item, New_Tree) inserts Item into

%Tree in such a way as to preserve the ordering

%(Tree is unchanged if Item is already there)

insert(null, Item, node(null, Item, null)).

insert(node(TL,Item,TR),Item,node(TL,Item, TR).

insert(node(TL,Item,TR),Item,node(NTL,Item,TR):-

less_than(Item, I),

insert(TL, Item, NTL).

insert(node(TL,Item,TR),Item,node(TL,Item,NTR):-

less_than(I, Item),

insert(TR, Item, NTR).

Binary search trees have many uses; for example, if the stored item consists of a key and some information, they can be used to implement association lists in an alternative manner:

%less_than(Item1, Item2) for two association items

%if their keys are so related

less_than(p(K1, _), p(K2, _)) :-

less_than(K1, K2).

lookup(Alist, Key, Info) :-

find(Alist, p(Key, Info)).

empty_alist(null).

addkey(Key, Info, Alist0, Alist) :-

insert(Alist0, p(Key, Info), Alist).

The advantage is that the search time is logarithmic, rather than linear, in the number of items in the association list

The cost is that there must be an order relation available for the keys

Hierarchical Modelling

We have effectively covered this in assignment 6

Tree Layout

We will omit this topic from the course

 

Controlling Search

Recall that CLP systems search their derivation tree from the top left hand side

If there is an infinite branch to the left of the desired goal, the search won't terminate

Even a large but finite branch to the left of our goal is undesirable (though it sometimes cannot be avoided)

So far, we have largely ignored the order in which we write our constraints (on the basis that the order of constraints does not affect the solutions)

But the order of constraints does very dramatically affect the search strategy (or to put it another way, rotates the tree so we search it in a different order)

We will be looking at ways to

in order to improve search efficiency

Some of these improvements are generic - they will speed up the search for any goal; others are specific - they will speed up some goals at the cost of others, so we will need to know ahead of time how the constraints are to be used

We will also look at techniques by which the system may be told to ignore whole branches of the search tree

The latter techniques rely on the assumption that we can know ahead of time that the goal will not lie in the part of the search tree we have pruned

That implies a risk: we might be wrong (so a correctly constrained problem might give wrong answers)

In particular, we might not know ahead of time all the different ways in which users might use the constraints (which affects where the goal will lie)

Estimating Efficiency

A mode of usage for a predicate p is a description of the way in which the arguments of calls to p will be constrained at the time when calls to p are evaluated

For example, consider

sumlist([],0).

sumlist([N|L], N+S) :-

sumlist(L,S).

with the mode where the first argument is a list of fixed length (ie the list is sufficiently instantiated that the number of elements in it is known), but the second argument is free

The goal sumlist([1],S) fits this mode, as does the goal L=[1,2], S>Z, sumlist(L,S)

The goals sumlist(L,2) and S>3, sumlist(L,S), L=[1,2] do not satisfy the mode

In this mode, the derivation tree for sumlist has 5|L|+6 nodes, where |L| is the length of the list L

So for this mode, the program is acceptable (linear time algorithms are usually acceptable; in general, we are prepared to accept polynomial algorithms, but exponential or higher algorithms are unacceptable)

On the other hand, for the mode where the second argument is known but the first argument is free, the size of the derivation tree is infinite, indicating that the program is not appropriate for this usage

It is usual to include the intended mode of usage in the comment heading a program:

%sumlist(L,S) if S is the sum of the fixed

%size list L

sumlist([],0).

sumlist([N|L], N+S) :-

sumlist(L,S).

 

Controlling Search

Suppose we wish to write a predicate to sum the numbers from 1 to N (for the sake of the example, assume multiplication is not available, ie sum(N,(N*(N-1)/2))) is not an acceptable program)

sum(N, S+N) :-

sum(N-1, S).

sum(0,0).

Even with simple goals such as sum(1,1), this goes into an infinite loop using the first clause - the second clause is never checked, because the first clause never actually fails

A better version reorders the rules:

sum(0,0).

sum(N, S+N) :-

sum(N-1, S).

This leads to a couple of useful slogans:

Do simple things first

Fail as early as possible

 

 

But now suppose we try an incorrect goal, eg sum(1,0); again, we get an infinite loop, instead of a simple "no" - it arises because the second clause can still recurse through all possible negative numbers

Obviously we just need to add an extra constraint to the rule, requiring N to be positive (if N is negative, then sum(N,S) should fail because no such S exists):

sum(0,0).

sum(N, S+N) :-

sum(N-1, S),

N>=1.

Unfortunately, this still loops infinitely: the constraint needs to come before the recursion to avoid looping

sum(0,0).

sum(N, S+N) :-

N >= 1,

sum(N-1, S).

 

Literal Ordering

What we're talking about here is the final reordering above

Primitive Constraints

The rule is to place primitive constraints at the first place in the rule body where they might cause failure

Putting them any later might allow unnecessary search before the constraint is evaluated

Putting them earlier just adds to the number of constraints which the system has to continually re-check

 

User-Defined Constraints

The handling of user-defined constraints relies on the concept of determinism:

Determinism

A (simplified) derivation tree is deterministic if it is finite and each node has at most one non-failed child (that is, every side track fails immediately, and there is either no success node, or the path to the success node is direct)

For example our "sumlist" is deterministic for the mode in which the first argument is a fixed list

The first three versions of "sum" are not deterministic for the mode in which the first argument is fixed; the final version is deterministic for this mode

Determinism and Literal Ordering

Deterministic predicates

so should be placed as early as possible

However the mode of a goal affects its determinism; but the position of a goal in a clause may affect its determinism (ie moving a goal forward may change a deterministic goal to a non-deterministic one)

So the general rule is to move deterministic goals as far forward as is possible without losing their determinism:

 

Examples

father(jim, edward).

father(jim,maggy).

father(edward,peter).

father(edward,helen).

father(edward,kitty).

father(bill,fi).

mother(maggy,fi).

mother(fi,lillian).

grandfather(Z,X) :-

father(Z,Y),

father(Y,X).

grandfather(Z,X) :-

father(Z,Y),

mother(Y,X).

If we intend to use this to find the grandfathers of given people (ie the second argument is fixed), the first literal in each of these clauses is non-deterministic and the second is deterministic

We would be better off with a reordering:

grandfather(Z,X) :-

father(Y,X),

father(Z,Y).

grandfather(Z,X) :-

mother(Y,X),

father(Z,Y).

Note that in the re-ordered rules, the second literals have become deterministic - but we cannot move them back again, because then they become non-deterministic

 

Redundant Constraints

A redundant constraint is one which can be removed from a goal without changing any answers

Answer Redundancy

An answer-redundant constraint is one which is redundant with respect to the rest of the literals in the body of a rule

The addition of N>=1 to 'sum' was an example

For another, suppose we wish to extend 'sum' to the mode where the sum is fixed, but the number to be summed is unknown

The current version will work for values of S for which there is such an N, but will loop infinitely for other values of S

This can be fixed by further answer redundancy:

sum(0,0).

sum(N, S+N) :-

N >= 1,

S >= 0,

sum(N-1, S).

The new literal is answer-redundant (all successful computations must generate a positive S), but dramatically improves the termination characteristics of sum

 

Solver Redundancy

A solver-redundant constraint is one which is redundant with respect to the current state of the constraint store

The point here is that the constraint system may reach a point where the current constraints are unsatisfiable, but the solver is unable to detect this

For example, the CLPR system is unable to detect the unsatisfiability of nonlinear constraint

If we run the program

fac(0,1).

fac(N, N*F) :-

N>=1,

fac(N-1, F).

with the goal fac(N,7), this will run forever because the system does not have the information that factorials are positive

We might think to add an answer-redundant literal as with sum to get

fac(0,1).

fac(N, N*F) :-

N>=1,

F>=1,

fac(N-1, F).

But the program still loops; after four loops, the system can infer the constraint

F''''>=1, N>=4, N*(N-1*(N-2*(N-3)))*F''''=7

which is unsatisfiable, but because it is nonlinear, CLPR cannot detect the unsatisfiability

Further transforming the program to

fac(0,1).

fac(N, FN) :-

FN=F*N,

N>=1,

F>=1,

N<=FN,

fac(N-1, F).

we get constraints including N<=FN, FN=7, and (after some steps) N>=8, permitting CLPR to fail the goal