-
Notifications
You must be signed in to change notification settings - Fork 6
/
alphatap.tex
1042 lines (897 loc) · 39.1 KB
/
alphatap.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\chapter{Applications II: \alphatap}\label{alphatapchapter}
In this chapter we examine a second application of nominal logic
programming, a declarative theorem prover for first-order classical
logic. We call this prover \alphatap, since it is based on the
\leantapsp\cite{beckert95leantap} prover and written in
\alphakanren. Our prover is a relation, without mode restrictions;
given a logic variable as the theorem to be proved, \alphatapsp
\textit{generates} valid theorems.
\leantapsp is a lean tableau-based theorem prover for first-order
logic due to \citet{beckert95leantap}. Written in
Prolog, it is extremely concise and is capable of a high rate of
inference. \leantapsp uses Prolog's cut (\texttt{!}) in three of its
five clauses in order to avoid nondeterminism, and uses
\mbox{\texttt{copy\_term/2}} to make copies of universally quantified
formulas. Although Beckert and Posegga take advantage of Prolog's
unification and backtracking features, their use of the impure cut and
\mbox{\texttt{copy\_term/2}} makes \leantapsp non-declarative.
% : reordering goals within the prover may cause divergence.
%% new definition of nondeclarative?
In this chapter we translate \leantapsp from Prolog to impure
miniKanren, using \scheme|match-a| to mimic Prolog's cut, and
\scheme|copy-termo| to mimic \mbox{\texttt{copy\_term/2}}. We then show how
to eliminate these impure operators from our translation. To eliminate the
use of \scheme|match-a|, we introduce a tagging scheme that makes our
formulas unambiguous. To eliminate the use of \scheme|copy-termo|, we
use substitution instead of copying terms. Universally quantified
formulas are used as templates, rather than instantiated directly;
instead of representing universally quantified variables with logic
variables, we use the noms of nominal logic. We then use nominal
unification to write a substitution relation that replaces quantified
variables with logic variables, leaving the original template
untouched.
The resulting declarative theorem prover is interesting for two
reasons. First, because of the technique used to arrive at its
definition: we use declarative substitution rather than
\scheme|copy-termo|. To our knowledge, there is no method for
copying arbitrary terms declaratively. Our solution is not completely
general but is useful when a term is used as a template for copying,
as in the case of \leantap. Second, because of the flexibility of the
prover itself: \alphatapsp is capable of instantiating non-ground
theorems during the proof process, and accepts non-ground
\textit{proofs}, as well. Whereas \leantapsp is fully automated and
either succeeds or fails to prove a given theorem, \alphatapsp can
accept guidance from the user in the form of a partially-instantiated
proof, regardless of whether the theorem is ground.
We present an implementation of \alphatapsp in
section~\ref{implementation} , demonstrating our technique for
eliminating cut and \mbox{\texttt{copy\_term/2}} from \leantap. Our
implementation demonstrates our contributions: first, it illustrates a
method for eliminating common impure operators, and demonstrates the
use of nominal logic for representing formulas in first-order logic;
second, it shows that the tableau process can be represented as a
relation between formulas and their tableaux; and third, it
demonstrates the flexibility of relational provers to mimic the full
spectrum of theorem provers, from fully automated to fully dependent
on the user.
This chapter is organized as follows. In section~\ref{tableau} we
describe the concept of tableau theorem proving. In
section~\ref{alphatap} we motivate our declarative prover by examining
its declarative properties and the proofs it returns. In
section~\ref{implementation} we present the implementation of
\alphatap, and in section~\ref{performance} we briefly examine
\alphatap's performance. Familiarity with tableau theorem proving
would be helpful; for more on this topic, see the references given in
section~\ref{tableau}. In addition, a reading knowledge of Prolog
would be useful, but is not necessary; for readers unfamiliar with
Prolog, carefully following the miniKanren and \alphakanrensp code
should be sufficient for understanding all the ideas in this chapter.
\section{Tableau Theorem Proving}\label{tableau}
We begin with an introduction to tableau theorem proving and its
implementation in \leantap.
Tableau is a method of proving first-order theorems that works by
refuting the theorem's negation. In our description we assume basic
knowledge of first-order logic; for coverage of this subject and a
more complete description of tableau proving, see
\citet{fitting1996fol}. For simplicity, we consider only
formulas in Skolemized \textit{negation normal form} (NNF).
Converting a formula to this form requires removing existential
quantifiers through Skolemization, reducing logical connectives so
that only $\wedge$, $\vee$, and $\neg$ remain, and pushing negations
inward until they are applied only to literals---see section~3 of
\citet{beckert95leantap} for details.
To form a tableau, a compound formula is expanded into branches
recursively until no compound formulas remain. The leaves of this
tree structure are referred to as \textit{literals}. \leantapsp forms
and expands the tableau according to the following rules. When the
prover encounters a conjunction $x \wedge y$, it expands both $x$ and
$y$ on the same branch. When the prover encounters a disjunction $x
\vee y$, it splits the tableau and expands $x$ and $y$ on separate
branches. Once a formula has been fully expanded into a tableau, it
can be proved unsatisfiable if on each branch of the tableau there
exist two complementary literals $a$ and $\neg a$ (each branch is
\textit{closed}). In the case of propositional logic, syntactic
comparison is sufficient to find complementary literals; in
first-order logic, sound unification must be used. A closed tableau
represents a proof that the original formula is unsatisfiable.
The addition of universal quantifiers makes the expansion process more
complicated. To prove a universally quantified formula \mbox{$\forall x. M$},
\leantapsp generates a logic variable $v$ and expands $M$,
replacing all occurrences of $x$ with $v$ (i.e., it expands $M^{\prime}$ where
$M^{\prime} = M[v/x]$). If \leantapsp is unable to close the current branch
after this expansion, it has the option of generating another logic
variable and expanding the original formula again. When the prover
expands the universally quantified formula \mbox{$\forall x. F(x) \wedge ( \neg F({\sf a})
\vee \neg F({\sf b}) )$}, for example, \mbox{$\forall x. F(x)$}
must be expanded twice, since $x$ cannot be instantiated to both
\textsf{a} and \textsf{b}.
\section{Introducing \alphatap}\label{alphatap}
We begin by presenting some examples of \alphatap's abilities, both in
proving ground theorems and in generating theorems. We also explore
the proofs generated by \alphatap, and show how passing
partially-instantiated proofs to the prover can greatly improve its
performance.
\subsection{Running Forwards}\label{forwards}
Both \leantapsp and \alphatapsp can prove ground theorems; in
addition, \alphatap\ produces a proof. This proof is a list
representing the steps taken to build a closed tableau for the
theorem; \citet{paulson99generic} has shown that translation to
a more standard format is possible. Since a closed tableau represents
an unsatisfiable formula, such a list of steps proves that the
negation of the formula is valid. If the list of steps is ground, the
proof search becomes deterministic, and \alphatapsp acts as a proof
checker.
\leantapsp encodes first-order formulas using Prolog terms. For
example, the term \mbox{\texttt{(p(b),all(X,(-p(X);p(s(X)))))}}
represents \mbox{$p($\textsf{b}$) \wedge \forall x . \neg p(x) \vee
p(s(x))$}. In our prover, we represent formulas using Scheme lists
with extra tags:
%, and in our final version we adopt a more extensive tagging
%scheme. The \schemeresult|forall| binder is represented by
%\alphakanren's \scheme|tie|, and variables are represented by noms.
%Our example formula is represented by the ground list:
\schemedisplayspace
\begin{schemeresponse}
(and-tag (pos (app p (app b))) (forall (tie anom (or-tag (neg (app p (var-tag anom)))
(pos (app p (app s (var-tag anom))))))))
\end{schemeresponse}
% The Prolog query \mbox{\texttt{prove(Fml,[],[],[],VarLim)}} succeeds
% if the formula \texttt{Fml} is unsatisfiable. Similarly, the
% \alphakanrensp goal \mbox{\scheme|(proveo fml '() '() '() proof)|}
% succeeds if \scheme|fml| can be shown to be unsatisfiable via the
% proof \scheme|proof|.
Consider Pelletier Problem 18~\cite{pelletier1986sfp}: \mbox{$\exists
y. \forall x. F(y) \Rightarrow F(x)$}. To prove this theorem in
\alphatap, we transform it into the following \textit{negation} of the
NNF:
\schemedisplayspace
\begin{schemeresponse}
(forall (tie anom (and-tag (pos (app f (var-tag anom))) (neg (app f (app g1 (var-tag anom)))))))
\end{schemeresponse}
\noindent where \schemeresult|`(app ,g1 (var-tag anom))| represents the
application of a Skolem function to the universally quantified
variable $a$. Passing this formula to the prover, we obtain the proof
\schemeresult|`(univ conj savefml savefml univ conj close)|. This proof
lists the steps the prover (presented in section~\ref{matcha}) follows to close
the tableau. Because both conjuncts of the formula contain the nom
$a$, we must expand the universally quantified formula more than once.
Partially instantiating the proof helps \alphatapsp prove theorems
with similar subparts. We can create a non-ground proof that describes
in general how to prove the subparts and have \alphatapsp fill in the
trivial differences. This can speed up the search for a proof
considerably. By inspecting the negated NNF of Pelletier Problem~21,
for example, we can see that there are at least two portions of the
theorem that will have the same proof. By specifying the structure of
the first part of the proof and constraining the identical portions by
using the same logic variable to represent both, we can give the
prover some guidance without specifying the whole proof. We pass the
following non-ground proof to \alphatap:
\schemedisplayspace
\vspace{-2pt}
\begin{centering}
\begin{schemeresponse}
(conj univ split (conj savefml savefml conj split Xvar Xvar)
(conj savefml savefml conj split (close) (savefml split Yvar Yvar)))
\end{schemeresponse}
\end{centering}
\vspace{-2pt}
\noindent On our test machine, our prover solves the original problem
with no help in 68 milliseconds (ms); given the knowledge that the
later parts of the proof will be duplicated, the prover takes only 27
ms. This technique also yields improvement when applied to Pelletier
Problem 43: inspecting the negated NNF of the formula, we see two
parts that look nearly identical. The first part of the negated
NNF---the part representing the theorem itself---has the following
form:
\schemedisplayspace
\vspace{-2pt}
\begin{centering}
\begin{schemeresponse}
(and-tag (or-tag (and-tag (neg (app Q (app g4) (app g3)))
(pos (app Q (app g3) (app g4))))
(and-tag (pos (app Q (app g4) (app g3)))
(neg (app Q (app g3) (app g4))))) ...)
\end{schemeresponse}
\end{centering}
\vspace{-2pt}
\noindent Since we suspect that the same proof might suffice for both
branches of the theorem, we give the prover the partially-instantiated
proof \mbox{\schemeresult|`(conj split Xvar Xvar)|}. Given just this
small amount of help, \alphatapsp proves the theorem in 720 ms,
compared to 1.5 seconds when the prover has no help at all. While
situations in which large parts of a proof are identical are rare,
this technique also allows us to handle situations in which different
parts of a proof are merely similar by instantiating as much or as
little of the proof as necessary.
\subsection{Running Backwards}\label{backwards}
% \begin{figure}[H]
% \begin{centering}
% \begin{tabular}{| r | c | c | c | c |}
% \hline
% Problem & \thinspace \leantap \thinspace\footnotemark[4] \thinspace &
% Translation\footnotemark[3] & \thinspace \alphatap\footnotemark[3]
% \thinspace & \thinspace \alphatap$\!_G$\footnotemark[4]$^,$\footnotemark[6] \\
% \hline
% 1 & ? &
% \hline
% \end{tabular}
% \caption{\alphatap's Performance on Pelletier's Problems\protect\footnotemark[2]
% \label{fig:performance}}
% \end{centering}
% \end{figure}
%\vspace{-6pt}
% Testing our prover on
% several of Pelletier's 75 problems~\cite{pelletier1986sfp} shows that
% \alphatapsp is about three to five times slower than our translation
% of \leantap. The translation solves problem 32, for example, in about
% one second, while \alphatapsp takes about three
% seconds; problem 26 takes our translation of \leantapsp about 13
% seconds, while \alphatapsp needs 36 seconds.
Unlike \leantap, \alphatapsp can generate valid theorems. Some
interpretation of the results is required since the theorems generated
are negated formulas in NNF.\footnote{The full implementation of
\alphatapsp includes a simple declarative translator from negated
NNF to a positive form.} In the example
\smallskip
\scheme|(run1 (q) (exist (x) (proveo q '() '() '() x)))|
\hspace{0.1cm}$\Rightarrow$
\schemeresult|`((and-tag (pos (app _.0)) (neg (app _.0))))|
\smallskip
\noindent
the reified logic variable \schemeresult|_.0| represents any
first-order formula $p$, and the entire answer represents the formula
$p \wedge \neg p$. Negating this formula yields the original theorem:
$\neg p \vee p$, or the law of excluded middle. We can also generate
more complicated theorems; here we use the ``generate and test'' idiom
to find the first theorem matching the negated NNF of the inference
rule {\it modus ponens}:
\schemedisplayspace
\begin{schemedisplay}
(run1 (q)
(exist (x)
(proveo x '() '() '() q)
(== `(and-tag (and-tag (or-tag (neg (app a)) (pos (app b))) (pos (app a))) (neg (app b)))
x)))
\end{schemedisplay}
\vspace{-.1cm}
\noindent $\Rightarrow$ \schemeresult|`((conj conj split (savefml close) (savefml savefml close)))|
\smallskip
\noindent This process takes about 5.1 seconds; {\it modus ponens} is the
173rd theorem to be generated, and the prover also generates a proof
of its validity. When this proof is given to \alphatap, {\it modus ponens}
is the sixth theorem generated, and the process takes only 20 ms.
Thus the declarative nature of \alphatapsp is useful both for
generating theorems and for producing proofs. Due to this flexibility,
\alphatapsp could become the core of a larger proof system. Automated
theorem provers like \leantapsp are limited in the complexity of the
problems they can solve, but given the ability to accept assistance
from the user, more problems become tractable.
%can solve more difficult problems.
%\footnotetext[7]{\alphatap$\!_G$ uses the unique name and preprocessor
% approach described in section 4.2.}
As an example, consider Pelletier Problem 47: Schubert's Steamroller.
This problem is difficult for tableau-based provers like \leantapsp
and \alphatap, and neither can solve it
automatically~\cite{beckert95leantap}. Given some help, however,
\alphatapsp can prove the Steamroller. Our approach is to prove a
series of smaller lemmas that act as stepping stones toward the final
theorem; as each lemma is proved, it is added as an assumption in
proving the remaining ones. The proof process is automated---the user
need only specify which lemmas to prove and in what order. Using this
strategy, \alphatapsp proves the Steamroller in about five seconds;
the proof requires twenty lemmas.
\alphatapsp thus offers an interesting compromise between large proof
assistants and smaller automated provers. It achieves some of the
capabilities of a larger system while maintaining the lean deduction
philosophy introduced by \leantap. Like an automated prover, it is
capable of proving simple theorems without user guidance. Confronted
with a more complex theorem, however, the user can provide a
partially-instantiated proof; \alphatapsp can then check the proof and
fill in the trivial parts the user has left out. Because \alphatapsp
is declarative, the user may even leave required axioms out of the
theorem to be proved and have the system derive them. This flexibility
comes at no extra cost to the user---the prover remains both concise
and reasonably efficient.
%% New
The flexibility of \alphatapsp means that it could be made interactive
through the addition of a read-eval-print loop and a simple proof
translator between \alphatap's proofs and a more human-readable
format. Since the proof given to \alphatapsp may be partially
instantiated, such an interface would allow the user to conveniently
guide \alphatapsp in proving complex problems. With the addition of
equality and the ability to perform single beta steps, this
flexibility would become more interesting---in addition to reasoning
about programs and proving properties about them, \alphatapsp would
instantiate non-ground programs during the proof process.
\section{Implementation}\label{implementation}
We now present the implementation of \alphatap. We begin with a
translation of \leantapsp from Prolog into \alphakanren. We then show
how to eliminate the translation's impure features through a
combination of substitution and tagging.
\leantapsp implements both expansion and closing of the tableau. When
the prover encounters a conjunction, it uses its argument
\texttt{UnExp} as a stack (Figure~\ref{fig:translation}): \leantapsp
expands the first conjunct, pushing the second onto the stack for
later expansion. If the first conjunct cannot be refuted, the second
is popped off the stack and expansion begins again. When a
disjunction is encountered, the split in the tableau is reflected by
two recursive calls. When a universal quantifier is encountered, the
quantified variable is replaced by a new logic variable, and the
formula is expanded. The \texttt{FreeV} argument is used to avoid
replacing the free variables of the formula. \leantapsp keeps a list
of the literals it has encountered on the current branch of the
tableau in the argument \texttt{Lits}. When a literal is encountered,
\leantapsp attempts to unify its negation with each literal in
\texttt{Lits}; if any unification succeeds, the branch is closed.
Otherwise, the current literal is added to \texttt{Lits} and expansion
continues with a formula from \texttt{UnExp}.
\subsection{Translation to \alphakanren}\label{translation}
While \alphakanrensp is similar to Prolog with the addition of nominal
unification, \alphakanrensp uses a variant of interleaving
depth-first search~\cite{backtracking}, so the order of
\scheme|conde| or \scheme|match-e| clauses in \alphakanrensp is irrelevant. Because of
Prolog's depth-first search, \leantapsp must use \texttt{VarLim} to
limit its search depth; in \alphakanren, \texttt{VarLim} is not
necessary, and thus we omit it.
In Figure~\ref{fig:translation} we present mK\leantap, our translation
of \leantapsp into \alphakanren; we label two clauses (\onet, \twot),
since we will modify these clauses later. To express Prolog's cuts,
our definition uses \scheme|match-a|. The final two clauses of
\leantapsp do not contain Prolog cuts; in mK\leantap, they are
combined into a single clause containing a \scheme|conde|. In place
of \leantap\thinspace's recursive call to \texttt{prove} to check the
membership of \texttt{Lit} in \texttt{Lits}, we call \scheme|membero|,
which performs a membership check using sound unification.\footnote{We define \scheme|membero| in Figure~\ref{fig:ending}; \scheme|membero| \emph{must} use sound unification, and cannot use \scheme|==-no-check|.} % Prolog's \texttt{copy\_term/2} is
% not built into \alphakanren; this addition is available as part of the
% mK\leantapsp source code.
%\begin{figure}[ht]
\begin{figure}[H]
%\vspace{-.3in}
\begin{tabular}{l l}
&
\begin{minipage}{2.3in}
\begin{schemedisplay}
(define proveo
(lambda (fml unexp lits freev)
(match-a fml
\end{schemedisplay}
\end{minipage} \\
\begin{minipage}{2.3in}
\begin{verbatim}
prove((E1,E2),UnExp,Lits,
FreeV,VarLim) :- !,
prove(E1,[E2|UnExp],Lits,
FreeV,VarLim).
\end{verbatim}
\end{minipage}
&
\begin{minipage}{2in}
\begin{schemedisplay}
(`(and-tag ,e1 ,e2)
(proveo e1 `(,e2 . ,unexp) lits freev))
\end{schemedisplay}
\end{minipage}
\\
\begin{minipage}{2in}
\begin{verbatim}
prove((E1;E2),UnExp,Lits,
FreeV,VarLim) :- !,
prove(E1,UnExp,Lits,FreeV,VarLim),
prove(E2,UnExp,Lits,FreeV,Varlim).
\end{verbatim}
\end{minipage}
&
\begin{minipage}{2in}
\vspace{1mm}
\begin{schemedisplay}
(`(or-tag ,e1 ,e2)
(proveo e1 unexp lits freev)
(proveo e2 unexp lits freev))
\end{schemedisplay}
\vspace{1mm}
\end{minipage}
\\
\begin{minipage}{2in}
\begin{verbatim}
prove(all(X,Fml),UnExp,Lits,
FreeV,VarLim) :- !,
\+ length(FreeV,VarLim),
copy_term((X,Fml,FreeV),
(X1,Fml1,FreeV)),
append(UnExp,[all(X,Fml)],UnExp1),
prove(Fml1,UnExp1,Lits,
[X1|FreeV],VarLim).
\end{verbatim}
\end{minipage}
&
\begin{minipage}{2in}
\begin{schemedisplay}
$\onet$(`(forall ,x ,body)
(exist (x1 body1 unexp1)
(copy-termo `(,x ,body ,freev)
`(,x1 ,body1 ,freev))
(appendo unexp `(,fml) unexp1)
(proveo body1 unexp1 lits
`(,x1 . ,freev))))
\end{schemedisplay}
\end{minipage}
\\
\begin{minipage}{2in}
\begin{verbatim}
prove(Lit,_,[L|Lits],_,_) :-
(Lit = -Neg; -Lit = Neg) ->
(unify(Neg,L);
prove(Lit,[],Lits,_,_)).
\end{verbatim}
\end{minipage}
&
\begin{minipage}{2in}
\begin{schemedisplay}
$\twot$(fml
(conde
((match-a `(,fml ,neg)
(`((not ,neg) ,neg))
(`(,fml (not ,fml))))
(membero neg lits))
\end{schemedisplay}
\end{minipage}
\\
\begin{minipage}{2in}
\begin{verbatim}
prove(Lit,[Next|UnExp],Lits,
FreeV,VarLim) :-
prove(Next,UnExp,[Lit|Lits],
FreeV,VarLim).
\end{verbatim}
\end{minipage}
&
\begin{minipage}{2in}
\begin{schemedisplay}
((exist (next unexp1)
(== `(,next . ,unexp1) unexp)
(proveo next unexp1 `(,fml . ,lits)
freev))))))))
\end{schemedisplay}
\end{minipage}
\\
\end{tabular}
\caption{\leantapsp and mK\leantap\thinspace: a translation from Prolog to \alphakanren
\label{fig:translation}}
%\vspace{-.3in}
\end{figure}
\subsection{Eliminating \copytermo}\label{copytermo}
\enlargethispage{1\baselineskip} %
Since \scheme|copy-termo| is an impure operator, its use makes
\scheme|proveo| non-declarative: reordering the goals in the prover
can result in different behavior. For example, moving the call to
\scheme|copy-termo| after the call to \scheme|proveo| causes the
prover to diverge when given any universally quantified formula. To
make our prover declarative, we must eliminate the use of
\scheme|copy-termo|.
Tagging the logic variables that represent universally quantified
variables allows the use of a declarative technique that creates two
pristine copies of the original term: one copy may be expanded and the
other saved for later copying. Unfortunately, this copying examines
the entire body of each quantified formula and instantiates the
original term to a potentially invalid formula.
Another approach is to represent quantified variables with symbols or
strings. When a new instantiation is needed, a new variable name can
be generated, and the new name can be substituted for the old without
affecting the original formula. This solution does not destroy the
prover's input, but it is difficult to ensure that the provided data
is in the correct form declaratively: if the formula to be proved is
non-ground, then the prover must generate unique names. If the
formula \textit{does} contain these names, however, the prover must
\textit{not} generate new ones. This problem can be solved with a
declarative preprocessor that expects a logical formula
\textit{without} names and puts them in place. If the preprocessor is
passed a non-ground formula, it instantiates the formula to the
correct form. %We have implemented this strategy in a Prolog prover we
%call \alphatap$\!_G$;
The requirement of a preprocessor, however,
means the prover itself is not declarative.
We use nominal logic to solve the \scheme|copy-termo| problem.
Nominal logic is a good fit for this problem, as it is designed to
handle the complexities of dealing with names and binders
declaratively.
%Using
%noms to represent universally quantified variables and the
%\scheme|tie| operator to represent the $\forall$ binder allows us to
%avoid the use of logic variables to represent quantified variables.
Since noms represent unique names, we achieve the benefits of the
symbol or string approach without the use of a preprocessor. We can
generate unique names each time we encounter a universally quantified
formula, and use nominal unification to perform the renaming of the
quantified variable. If the original formula is uninstantiated, our
newly-generated name is unique and is put in place correctly; we no
longer need a preprocessor to perform this function.
Using the tools of nominal logic, we can modify mK\leantapsp to
represent universally quantified variables using noms and to perform
substitution instead of copying. When the prover reaches a literal,
however, it must replace each nom with a logic variable, so that
unification may successfully compare literals. To accomplish this, we
associate a logic variable with each unique nom, and replace every nom
with its associated variable before comparing literals. These
variables are generated each time the prover expands a quantified
formula.
To implement this strategy, we change our representation of formulas
slightly. Instead of representing $\forall x. F(x)$ as
\mbox{\schemeresult|`(forall Xvar (f Xvar))|}, we use a nom wrapped in
a \scheme|var-tag| tag to represent a variable reference, and the
term constructor \scheme|tie| to represent the $\forall$ binder:
\mbox{\schemeresult|`(forall (tie anom (f (var-tag anom))))|}, where $a$ is
a nom. The \scheme|var-tag| tag allows us to distinguish noms
representing variables from other formulas. We now write a relation
\scheme|subst-lito| to perform substitution of logic variables for
tagged noms in a literal, and we modify the literal case of
\scheme|proveo| to use it. We also replace the clause handling
\schemeresult|forall| formulas and define \scheme|lookupo|. The two
clauses of \scheme|lookupo| overlap, but since each mapping in the
environment is from a unique nom to a logic variable, a particular nom
will never appear twice.
We present the changes needed to eliminate \scheme|copy-termo| from
mK\leantapsp in Figure~\ref{fig:changes}. Instead of copying the body
of each universally quantified formula, we generate a logic variable
\scheme|x| and add an association between the nom representing the
quantified variable and \scheme|x| to the current environment. When we
prepare to close a branch of the tableau, we call \scheme|subst-lito|,
replacing the noms in the current literal with their associated logic
variables.
\begin{figure}[H]
\noindent \begin{tabular}{l l}
\begin{minipage}{2.5in}
\small
\begin{schemedisplay}
$\onet$(`(forall (tie-tag ,@a ,body))
(exist (x unexp1)
(appendo unexp `(,fml) unexp1)
(proveo body unexp1 lits
`((,a . ,x) . ,env))))
$\twot$(fml
(exist (lit)
(subst-lito fml env lit)
(conde
((match-a `(,lit ,neg)
(`((not ,neg) ,neg))
(`(,lit (not ,lit))))
(membero neg lits))
((exist (next unexp1)
(== `(,next . ,unexp1) unexp)
(proveo next unexp1 `(,lit . ,lits)
env))))))
\end{schemedisplay}
\vspace{.1cm}
\end{minipage}
&
\begin{minipage}{1.2in}
\small
%\schemeinput{code/lookupo}
\begin{schemedisplay}
(define lookupo
(lambda (a env out)
(match-e env
(`((,a . ,out) . ,rest))
(`(,first . ,rest)
(lookupo a rest out)))))
\end{schemedisplay}
\begin{schemedisplay}
(define subst-lito
(lambda (fml env out)
(match-a `(,fml ,out)
(`((var-tag ,a) ,out)
(lookupo a env out))
(`((,e1 . ,e2) (,r1 . ,r2))
(subst-lito e1 env r1)
(subst-lito e2 env r2))
(`(,fml ,fml)))))
\end{schemedisplay}
\end{minipage}
\end{tabular}
\caption{Changes to mK\leantapsp to eliminate \protect\scheme|copy-termo|
\label{fig:changes}}
%\vspace{-.2in}
\end{figure}
The original \mbox{\texttt{copy\_term/2}} approach used by \leantapsp and
mK\leantapsp avoids replacing free variables by copying the list
\scheme|`(,x ,body ,freev)|. The copied version is unified with the list
\scheme|`(x1 body1 ,freev)|, so that \textit{only} the variable
\scheme|x| will be replaced by a new logic variable---the free
variables will be copied, but those copies will be unified with the
original variables afterwards. Since our substitution strategy does
not affect free variables, the \scheme|freev| argument is no longer
needed, and so we have eliminated it.
\subsection{Eliminating \matchasymbol}\label{matcha}
Both \scheme|proveo| and \scheme|subst-lito| use \scheme|match-a|
because the clauses that recognize literals overlap with the other
clauses. To solve this problem, we have designed a tagging scheme that
ensures that the clauses of our substitution and \scheme|proveo|
relations do not overlap. To this end, we tag both positive and
negative literals, applications, and variables. Constants are
represented by applications of zero arguments. Our prover thus accepts
formulas of the following form:
% \begin{center}
% \begin{tabular}{lcl}
% $<$Fml$>$ & $\rightarrow$ & $($\textsf{or} $<$Fml$>$ $<$Fml$>)$
% % \\ & $|$ &
% $|$ $($\textsf{and} $<$Fml$>$ $<$Fml$>)$
% \\ & $|$ &
% $($\textsf{forall} $<$nom$>$ $<$Fml$>)$
% % \\ & $|$ &
% $|$ $($\textsf{lit} $<$Lit$>)$
% \\
% $<$Lit$>$ & $\rightarrow$ & $($\textsf{pos} $<$Term$>)$
% % \\ & $|$ &
% $|$ $($\textsf{neg} $<$Term$>)$
% \\
% $<$Term$>$ & $\rightarrow$ & $($\textsf{sym} $<$symbol$>)$
% % \\ & $|$ &
% $|$ $($\textsf{var} $<$nom$>)$
% % \\ & $|$ &
% $|$ $($\textsf{app} $<$symbol$>$ $<$Term$>$*$)$ \\
% \end{tabular}
% \end{center}
% \begin{center}
% \begin{tabular}{lcl}
% Fml & $\rightarrow$ & $($\textsf{and} Fml Fml$)$
% % \\ & $|$ &
% $|$ $($\textsf{or} Fml Fml$)$
% % \\ & $|$ &
% $|$ $($\textsf{forall} $($\scheme|tie| nom Fml$))$
% % \\ & $|$ &
% $|$ Lit
% \\
% Lit & $\rightarrow$ & $($\textsf{pos} Term$)$
% % \\ & $|$ &
% $|$ $($\textsf{neg} Term$)$
% \\
% Term & $\rightarrow$ & %$($\textsf{sym} symbol$)$
% % \\ & $|$ &
% %$|$
% $($\textsf{var} nom$)$
% % \\ & $|$ &
% $|$ $($\textsf{app} symbol Term*$)$ \\
% \end{tabular}
% \end{center}
%\vspace{-.2cm}
\begin{center}
\begin{tabular}{lcl}
\textit{Fml} & $\rightarrow$ & $($\textsf{and} \textit{Fml} \textit{Fml}$)$
$|$ $($\textsf{or} \textit{Fml} \textit{Fml}$)$
$|$ $($\textsf{forall} $($\scheme|tie| \textit{Nom} \textit{Fml}$))$
$|$ \textit{Lit}
\\
\textit{Lit} & $\rightarrow$ & $($\textsf{pos} \textit{Term}$)$
$|$ $($\textsf{neg} \textit{Term}$)$
\\
\textit{Term} & $\rightarrow$ &
$($\textsf{var} \textit{Nom}$)$
$|$ $($\textsf{app} \textit{Symbol} \textit{Term}*$)$ \\
\end{tabular}
\end{center}
%\vspace{-.2cm}
This scheme has been chosen carefully to allow unification to compare
literals. In particular, the tags on variables \textit{must} be
discarded before literals are compared. Consider the two non-ground
literals \mbox{\schemeresult|`(not (f Xvar))|} and
\mbox{\schemeresult|`(f (p Yvar))|}. These literals are complementary:
the negation of one unifies with the other, associating $x$ with
\mbox{\schemeresult|`(p Yvar)|}. When we apply our tagging scheme,
however, these literals become \mbox{\schemeresult|`(neg (app f (var-tag Xvar)))|} and \mbox{\schemeresult|`(pos (app f (app p (var-tag Yvar))))|}, respectively, and are no longer complementary: their
subexpressions \mbox{\schemeresult|`(var-tag Xvar)|} and
\mbox{\schemeresult|`(app p (var-tag Yvar))|} do not unify. To avoid this
problem, our substitution relation discards the \textsf{var} tag when
it replaces noms with logic variables.
\begin{figure}[H]
%\vspace{-.2in}
\hspace{-.1in}
\begin{tabular}{l l}
\begin{minipage}{1.8in}
%\schemeinput{code/alphatapleft}
\begin{schemedisplay}
(define proveo
(lambda (fml unexp lits env proof)
(match-e `(,fml ,proof)
(`((and-tag ,e1 ,e2) (conj . ,prf))
(proveo e1 `(,e2 . ,unexp)
lits env prf))
(`((or-tag ,e1 ,e2) (split ,prf1 ,prf2))
(proveo e1 unexp lits env prf1)
(proveo e2 unexp lits env prf2))
(`((forall (tie-tag ,@a ,body)) (univ . ,prf))
(exist (x unexp1)
(appendo unexp `(,fml) unexp1)
(proveo body unexp1 lits
`((,a . ,x) . ,env) prf)))
(`(,fml ,proof)
(exist (lit)
(subst-lito fml env lit)
(conde
((== `(close) proof)
(match-e `(,lit ,neg)
(`((pos ,tm) (neg ,tm)))
(`((neg ,tm) (pos ,tm))))
(membero neg lits))
((exist (next unexp1 prf)
(== `(,next . ,unexp1) unexp)
(== `(savefml . ,prf) proof)
(proveo next unexp1 `(,lit . ,lits)
env prf)))))))))
\end{schemedisplay}
%\vspace{1.3cm}
\end{minipage}
&
\hspace{-0.3in}
\begin{minipage}{1.8in}
%\schemeinput{code/alphatapright}
\begin{schemedisplay}
(define appendo
(lambda-e (ls s out)
(`(() ,s ,s))
(`((,a . ,d) ,s (,a . ,r))
(appendo d s r))))
(define subst-lito
(lambda-e (fml env out)
(`((pos ,l) ,env (pos ,r))
(subst-termo l env r))
(`((neg ,l) ,env (neg ,r))
(subst-termo l env r))))
(define subst-termo
(lambda-e (fml env out)
(`((var-tag ,a) ,env ,out)
(lookupo a env out))
(`((app ,f . ,d) ,env (app ,f . ,r))
(subst-term* d env r))))
(define subst-term*
(lambda-e (tm* env out)
(`(() __ ()))
(`((,e1 . ,e2) ,env (,r1 . ,r2))
(subst-termo e1 env r1)
(subst-term* e2 env r2))))
(define membero
(lambda (x ls)
(exist (a d)
(== `(,a . ,d) ls)
(conde
((== a x))
((membero x d))))))
\end{schemedisplay}
%\vspace{1.0cm}
\end{minipage}
\end{tabular}
\caption{Final definition of \alphatap
\label{fig:ending}}
%\vspace{-.3in}
\end{figure}
Given our new tagging scheme, we can easily rewrite our substitution
relation without the use of \scheme|match-a|. We simply follow the
production rules of the grammar, defining a relation to recognize
each.
Finally, we modify \scheme|proveo| to take advantage of the same tags.
We also add a \scheme|proof| argument to \scheme|proveo|. We call
this version of the prover \alphatap, and present its definition in
Figure~\ref{fig:ending}. It is declarative, since we have eliminated
the use of \scheme|copy-termo| and every use of \scheme|match-a|. In
addition to being a sound and complete theorem prover for first-order
logic, \alphatapsp can now generate valid first-order theorems.
\section{Performance}\label{performance}
\enlargethispage{1\baselineskip} %
Like the original \leantap, \alphatapsp can prove many theorems in
first-order logic. Because it is declarative, \alphatapsp is generally
slower at proving ground theorems than mK\leantap, which is slower
than the original \leantap. Figure~\ref{fig:performance} presents a
summary of \alphatap's performance on the first 46 of Pelletier's 75
problems~\cite{pelletier1986sfp}, showing it to be roughly twice as
slow as mK\leantap.
These performance numbers suggest that while there is a penalty to be
paid for declarativeness, it is not so severe as to cripple the
prover. The advantage mK\leantapsp enjoys over the original \leantapsp
in Problem 34 is due to \alphakanren's interleaving search strategy;
as the result for mK\leantapsp shows, the original \leantapsp is faster
than \alphatapsp for any given search strategy.
Many automated provers now use the TPTP problem
library~\cite{stucliffe1994tpl} to assess performance. Even though it
is faster than \alphatap, \leantapsp solves few of the TPTP
problems. The Pelletier Problems, on the other hand, fall into the
class of theorems \leantapsp was designed to prove, and so we feel
they provide a better set of tests for the comparison between
\leantapsp and \alphatap.
\begin{figure}[h]
%\vspace{-.2in}
\begin{centering}
\begin{tabular}{l l}
\hspace{-.1in}
\begin{minipage}{2.7in}
\begin{tabular}{| r | c | c | c | } %c |
\hline
\thinspace \thinspace \# & \thinspace \leantap \thinspace &
mK\leantap \thinspace & \thinspace \alphatap
\thinspace %& \thinspace \alphatap$\!_G$\footnotemark[5]$^,$\footnotemark[7]
\\
\hline
1 & 0.1 & 0.7 & 2.0 \\
2 & 0.0 & 0.1 & 0.3 \\
3 & 0.0 & 0.2 & 0.5 \\
4 & 0.0 & 1.0 & 1.7 \\
5 & 0.1 & 1.2 & 2.5 \\
6 & 0.0 & 0.1 & 0.2 \\
7 & 0.0 & 0.1 & 0.2 \\
8 & 0.0 & 0.3 & 0.8 \\
9 & 0.1 & 4.3 & 9.7 \\
10 & 0.3 & 5.5 & 10.2 \\
11 & 0.0 & 0.3 & 0.6 \\
12 & 0.6 & 17.7 & 31.9 \\
13 & 0.1 & 3.7 & 8.2 \\
14 & 0.1 & 4.2 & 9.7 \\
15 & 0.0 & 0.8 & 1.9 \\
16 & 0.0 & 0.2 & 0.6 \\
17 & 1.1 & 9.2 & 18.1 \\
18 & 0.1 & 0.5 & 1.2 \\
19 & 0.3 & 15.1 & 33.5 \\
20 & 0.5 & 8.1 & 12.7 \\
21 & 0.4 & 22.1 & 38.7 \\
22 & 0.1 & 3.4 & 6.4 \\
23 & 0.1 & 2.5 & 5.4 \\
\hline
\end{tabular}
\end{minipage}
&
\begin{minipage}{2.5in}
\begin{tabular}{| r | c | c | c |} %c |
\hline
\# & \thinspace \leantap \thinspace &
mK\leantap \thinspace & \thinspace \alphatap
\thinspace %& \thinspace \alphatap$\!_G$\footnotemark[5]$^,$\footnotemark[7]
\\
\hline
24 & 1.7 & 31.9 & 60.3 \\
25 & 0.2 & 7.5 & 14.1 \\
26 & 0.8 & 130.9 & 187.5 \\
27 & 2.3 & 40.4 & 79.3 \\
28 & 0.3 & 19.1 & 29.6 \\
29 & 0.1 & 27.9 & 57.0 \\
30 & 0.1 & 4.2 & 9.6 \\
31 & 0.3 & 13.2 & 23.1 \\
32 & 0.2 & 23.9 & 42.4 \\
33 & 0.1 & 15.9 & 39.2 \\
34 & 199129.0 & 7272.9 & 8493.5 \\
35 & 0.1 & 0.5 & 1.1 \\
36 & 0.2 & 6.7 & 12.4 \\
37 & 0.8 & 123.3 & 169.2 \\
38 & 8.9 & 4228.8 & 8363.8 \\
39 & 0.0 & 1.1 & 2.8 \\
40 & 0.2 & 8.1 & 19.2 \\