C as nat(X) => nat(s(X))
D as nat(X) => nat(s(s(X)))
C implies D, but C does not theta-subsume D
There is an interesting relationship between C and D: D is just C resolved with itself twice
Muggleton (1992) showed that this is a general situation: if C implies D, and if D is not a tautology, then there is a clause E, which can be got by resolving C with itself zero or more times (ie it might be C itself), and E theta-subsumes D.
A number of attempts have been made to use this for an algorithm for generalisation, by searching for an appropriate C
However there are too many sources of non-determinism for it to be a practical approach
This clause has the property that any clause C which implies D also theta-subsumes D'
Again, there is a practicality problem: D' is often large, and for some clauses may be infinite
(B & H) => E
or alternatively
(B & ~E) => ~H
Progol attempts to find , the "most specific clause" corresponding to B and E
It may be defined semi-constructively:
~ is the conjunction of all clauses that are true in all models of (B & ~E)
It has the important property that all candidates for H theta-subsume some sub-saturant of
Unfortunately, is often infinite
Progol instead attempts to construct i, where i specifies the maximum "depth" (in Golem's terms) of variables in the clauses
Progol limits the search by only looking at theta-subsumptions of i itself, rather than of sub-saturants
Progol also imposes mode declarations similar to those of Golem
However progol's mode declarations further limit the range of possible values for variables by :
* requiring a type discipline: the type of each variable is declared
* permitting the user to specify the number of alternatives which may be tried in attempting to instantiate the particular atom
(B & H) => E
in general, there may be more than one such solution, so progol seeks the H with the greatest Occam compression
However, like Golem it uses the reduced version of the clause, not the original clause itself
Progol uses an A*-like search algorithm; in particular, it is an admissible algorithm (ie if the best H is reachable, progol will find it)
Unfortunately, the refinement operator used is not complete: there are cases in which H exists, but is not reachable (but this is not surprising: it has recently been shown that any refinement operator satisfying reasonable requirements must be incomplete)
Many of the systems described so far descend from a tradition of learning logic programs
Thus to reduce the search space, they focus on heuristics that lead to program efficiency, on the assumption that learning would not be attempted if it were not assumed that an efficient program was available.
FOIL, on the other hand, is far more influenced by the traditions of attribute based learning, and emphasises the finding of true descriptions of the World, of logic programs
FOIL differs from many of the other systems in being essentially non-interactive (it does not assume an oracle able to answer queries)
It normally requires negative as well as positive examples
(FOIL will optionally use the Closed World Assumption, ie if a possible example is not specified as a positive example of the concept, it is assumed to be a negative example)
FOIL also differs from most of the previously considered systems in using a strongly typed language to reduce search space
* Find a rule that covers some of the data in the dataset
* Store the rule and remove the covered data from the dataset
* Re-run the algorithm on the reduced dataset
FOIL's inner loop, which generates individual rules, is a general-to-specific algorithm
In attempting to learn a rule of the form Hyp => Class, FOIL begins with Hyp empty (ie with the rule that says everything is in Class), and gradually specialises Hyp
It does so by conjoining literals one by one to Hyp, using a greedy (ie non-backtracking hillclimbing) search algorithm.
The next literal conjoined to the right of the hypothesis will be that which produces the highest information gain relative to the current training set
The current training set is that subset of the data which satisfies the current hypothesis
However, FOIL fudges this metric slightly, using two heuristics which can help it to learn in situations where information gains are low
FOIL encourages this by giving a small positive bonus for literals which introduce new variables
Later versions of FOIL incorporate a bias toward determinate literals, by allowing the user to specify an information gain threshold
If at any stage in its search, FOIL is unable to find a literal that produces information gain greater than the threshold, all determinate literals are added to the clause
To ensure that FOIL does not enter an infinite loop of adding new determinate literals at each step, it incorporates Golem's idea of a maximum allowable determinacy depth (recall the i of ij-determinacy), which FOIL sets at 5 (by default)
Note that this pruning justifies the over-optimistic use of determinacy in the learning phase (where all determinate literals are added) since any irrelevant determinate literals will be removed in the pruning phase
Mobal is described as 'an environment for incremental modelling'
Perhaps it is easiest to think of it as an environment for acquiring knowledge for expert systems
As such, Mobal provides many integrated tools for acquiring and regularising knowledge
RDT is the tool provided for acquiring relational knowledge from data
Thus RDT has a more pragmatic approach than previously discussed systems, even FOIL
To ensure this can be done, background knowledge is constrained to consist of generative rules, as with Golem
Rule models are restricted second order formulae: they consist of rules in which the predicate symbols are permitted to be variables
The first order language searched is the set of subclauses of substition instances of the rule models
Hypotheses that have already been either accepted as valid or pruned as too special are remembered
New candidate hypotheses are first checked to see if they are specialisations of remembered hypotheses (in which case there is no point in testing further)
The user may specify simple arithmetic functions of such variables as the number of instances correctly predicted by the hypothesis, the number incorrectly predicted by the hypothesis to be members of the class, the number incorrectly predicted not to be members of the class, etc
The formula so specified will be used as the heuristic to guide search
For example, if we apply a substition that takes Y to X, to a formula p(X,Y), we get p(X,X), which is more specific than the original
On the other hand, if we apply a substitution that takes P2 to P1 to the formula P1(X,Y) & P2(X,Y), we get the formula P1(X,Y), which is more general than the original
With a model such as
P(Y) & R(X,Y) => c(X)
It makes little sense to instantiate P first, since the variable Y in the hypothesis is totally unrelated to the variable X in the conclusion, so the rule will have no predictivity
A literal is directly connected to another literal if they share a variable
Predicate variables are substituted in the order determined by these connection chains:
A predicate variable may only be substituted if it is directly connected to a literal which already has an instantiated predicate symbol
Grendel is built on top of FOIL, and is principally concerned with investigating the declarative specification of inductive bias, both language and search bias
Instead of a rule model, Grendel provides an Antecedent Description Grammar (ADG)
An ADG is essentially a context free grammar in which the symbols are logical literals (ADGs are closely related to prolog Definite Clause Grammars, and indeed may be expressed in higher-order logic programming languages as higher-order DCGs)
The basic idea is that the only hypotheses considered are those which may be generated by the ADG
In this form, Grendel provides only language, not search, bias - but Cohen was able to demonstrate that virtually any language bias found in the literature could be expressed in this form
body(illegal(A,B,C,D,E,F)) -> rels(A,B,C,D,E,F).
rels(A,B,C,D,E,F) -> [].
rels(A,B,C,D,E,F) -> rel(A,B,C,D,E,F), rels(A,B,C,D,E,F).
rel(A,B,C,D,E,F) -> pred(X,Y)
where member(X,[A,B,C,D,E,F]), member(Y,[A,B,C,D,E,F]).
pred(X,Y) -> [X = Y].
pred(X,Y) -> [not X = Y].
pred(X,Y) -> [adj(X,Y)].
pred(X,Y) -> [not adj(X,Y)].
pred(X,Y) -> [less_than(X,Y)].
pred(X,Y) -> [not less_than(X,Y)].
A grammar rule is deferred if it is prefixed with an exclamation mark
For example, Grendel might be informed to prefer positive literals by replacing the productions containing negative literals in our previous example with:
goal_formula(illegal(A,B,C,D,E,F)).
body(illegal(A,B,C,D,E,F)) -> rels(A,B,C,D,E,F).
rels(A,B,C,D,E,F) -> [].
rels(A,B,C,D,E,F) -> rel(A,B,C,D,E,F), rels(A,B,C,D,E,F).
rel(A,B,C,D,E,F) -> pred(X,Y)
where member(X,[A,B,C,D,E,F]), member(Y,[A,B,C,D,E,F]).
pred(X,Y) -> [X = Y].
!pred(X,Y) -> [not X = Y].
pred(X,Y) -> [adj(X,Y)].
!pred(X,Y) -> [not adj(X,Y)].
pred(X,Y) -> [less_than(X,Y)].
!pred(X,Y) -> [not less_than(X,Y)].
Only when Grendel fails to find a literal with positive information gain is the deferred portion of the grammar used
Once a literal with positive information gain has been found, the algorithm reverts to using only the non-deferred portion of the grammar
This mechanism is extremely simple, and more complex ones are readily devised
Nevertheless, Cohen has demonstrated that because the interaction between ADGs and deferral provides a high level of complexity, virtually any previously used search bias can be expressed in this form
Relational learning tools are now practical and useful for many real-World problems
However their computational cost is very high, particularly if the dataset is large
Only FOIL can really claim good handling of noisy data
Use an attribute-based learner if the problem can be transformed into an attribute problem