Proof and Computation

In my last post, I mentioned that I wanted to write about automated theorem proving. As it turns out, finding quality blog posts about this subject is very difficult due to it being a relatively new and confusing field. I did however come across a community blog of scientists, Less Wrong, that gave me a good starting place for thinking about the problem. There is a general method using first order logic that would allow us to construct a proof for any provable proposition and the author Jonah Sinick mentions this, but there are also downsides noted.

“It’s easy to give an algorithm that generates a proof of a mathematical theorem that’s provable: choose a formal language with definitions and axioms, and for successive values of n, enumerate all sequences of mathematical deductions of length n, halting if the final line of a sequence is the statement of the desired theorem. But the running time of this algorithm is exponential in the length of the proof, and the algorithm is infeasible to implement except for theorems with very short proofs.”

It’s certainly interesting that a general algorithm exists for provable propositions but the algorithm is awful. All we can guarantee in this algorithm is that a deduction will be returned in some finite amount of time (the universe may die before it can be returned). I suppose a kind of ultimate question is if there is some algorithm that necessarily runs in less time than the enumeration technique and produces the same proof? I think the fact that the author raises the obvious connection between theorem proving and general artificial intelligence is interesting, since it seems likely that new advances in “deep-learning” techniques will create better algorithms in the future. Furthermore, there is the question of what currently is provable in a general sense. The blog post mentions that only trivial theorems seem provable. Sinick writes, “I don’t know of any computer programs that have been able to prove theorems outside of the class ‘very routine and not requiring any ideas,’ without human assistance (and without being heavily specialized to an individual theorem).”

Sinick then mentions an example of theorems that if provable would hugely increase his confidence in the prospect of a true general AI. To him this could mean deriving the Sylow subgroup theorems. Since the Sylow subgroup theorems are constructed from the group axioms, we necessarily need to prove the Sylow subgroup theorems from the group axioms. I’ve had experience constructing these proofs myself, and the bizarre thing is that often some idea not in the axioms or derived from the axioms must be adapted to prove the theorems. These ideas themselves can be thought of as giving the justification for a deduction. Modus ponens or logical inference is an idea that sits behind some deduction in first order logic for instance or even the law of contradiction.

An efficient method for finding a correct deduction must then be able to illuminate the ideas that sit behind deductions from axioms. One would naturally think that the time to find a deduction would be based, at least partially, on the “complexity” of the set of ideas, which would be difficult to define concretely since two sets of separate ideas may be able to prove the same theorem. Perhaps something like Kolmogorov complexity (something that measures the complexity of a given string – for instance the string 0000 is less complex than 1214) would be a good starting place to look at complexity as a legitimate mathematical construct.

Ultimately, I think this problem has been very interesting to think about. The notion of a trivial set of theorems is interesting as well as the complexity of a set of ideas to required guarantee a deduction. Perhaps next time we can look at ways of defining the complexity of a set of axioms and an associated theorem T.

Edit: It seems that Chaitin’s incompleteness theorem will guarantee that some string is unprovably complex past some level of complexity. Likely, this will be a huge problem.

Provability and Truth

I find myself very disappointed with my initial formulation of mathematical intelligence in my first blog post, or rather I should say I am unsatisfied with it. Essentially, the premise that there should be some set of ideas that can be used to construct a desired proof seems obvious in all but the most exceptional cases – that is the existence of some fuzzy set like L[T,A] is to be taken granted for all provably true propositions and doesn’t require any unnecessary mathematical formulation. If we wanted to add a little bit of rigor we can at least say that L[T,A] can be populated with elements of a deduction chain based on modus ponens which ends with the theorem T. This idea of a deduction chain can be formalized within first-order logic, but it is beyond the scope of this blog to discuss it in detail. However, the concept of a deduction can lead to a more well-defined L[T,A]. Define L[T,A] as follows – a set of propositions B is included in L[T,A] if and only if B is a deduction from A to T. An actual example would be useful. Suppose we wanted to prove the proposition that Bob is an animal, and suppose we had axioms that Bob is a cat and that a cat is an animal. Quite clearly there is a deduction – bob is a cat and all cats are animals, therefore bob is a cat, so we include all of those propositions in L[T,A].

Now, this seems useless unless there is something to be gained from analyzing L[T,A]. I should instead focus on the properties of L[T,A]. Take the following question. Suppose B and C are valid deductions from A to T and with B not equal to C. Is there necessarily some deduction D such that D is not equal to B or C? Put another way, if we have two different deductions, can we find another not equal to the first two? Thought it doesn’t matter for our purposes, the answer returns in the affirmative if there is some common deductive element of B and C different from the axioms themselves and the desired theorem. Ultimately, the fact that L[T,A] has something like definite structural properties is interesting. A more interesting question is as follows: when does L[T,A] contain nothing? Well clearly, if T doesn’t follow from A or if T is false, then L[T,A] is empty. However, what if T does follow from A, and L[T,A] is empty? That is, T is true but undeducible or unprovable.

One would wisely consider whether such a situation is possible, but as Kurt Godel proved in the first part of the twentieth century, it most certainly is. Godel’s brilliant insight was to realize and show mathematically that the notions of provability and truth are not the same by showing that there are true yet unprovable statements. Perhaps more radically, Godel showed that truth was not only “that which can be justified.”  The resulting theorems were called Godel’s incompleteness theorems, and the first of them can be stated as follows. “Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete such that there are statements of the language of F which can neither be proved nor disproved in F.”

Godel himself uses a language example that hits at the heart of the matter. Take the following proposition T, “T is not provable in F” and note that the statement T makes a claim about itself (that is, it is self-referential). Now if T is true, then by definition T is not provable in F. Under this assumption, T is not provable, but it is certainly true on the basis that we assumed T is true, and we deduce that T is true but unprovable. If T is not true, then it must be false and we have that T is not not provable in F, or T is provable in F. If T is provable in F, then T is true in F and hence we arrive at a contradiction. That is T cannot both be true and false. Hence, the proposition “T is not provable in F” is true. This discovery was monumental – not only did it have profound mathematical implications, it also had profound philosophical implications by challenging the type of rationalism that seeks explanations for all things.

As for the figurative holy grail, it would be most useful to find a method for illuminating deductions. In my next post, I will look at automated theorem proving and the methods currently being used to find deductions autonomously.

What Makes a Mathematician

I recently came across an article by the highly respected Cal Tech mathematician Terence Tao titled, “Does one have to be a genius to do maths?”

Professor Tao, a man oddly enough with a reputed I.Q. of 230, raises the question of whether or not mathematicians can make contributions to their field regardless of inherit intelligence (within reason of course), and though he does essentially make the case for more or less productive mathematicians, he answers his own question with “an emphatic no.” But, I must say that I feel slightly let down by his answer, although I do agree with him. Its obviously a good thing that one doesn’t have to be a genius in order to be a working mathematician, but surely there must be some qualities that would make someone intellectually well-suited for mathematical research in the first place. Tao lists a few at the beginning of his piece, but these are hardly unique to mathematics – the ability to work hard would be advantageous in almost any number of fields for instance. The interesting question to me then seems to be what qualities would uniquely make someone a good mathematician. Tao doesn’t seem to answer this in any meaningful way,  which is a shame since his perspective  would be incredibly useful. It’s certainly always possible that Tao doesn’t understand why he himself is a good mathematician, but I’m sure he could at least offer a few interesting insights – if not for the budding mathematician then perhaps for the artificial intelligence and cognitive scientists.

The question isn’t merely philosophical which is a plus, since one could use the information to devise more intelligent machines or better teaching methods. I suppose I could take on the task of setting the limits on what constitutes mathematical ability, but such a task would quickly become too painful to carry on. Setting the sights a little lower and keeping in mind Paul Erdos’s comment that “a mathematician is a machine for turning coffee into theorems,” we could try to understand what makes someone good at deducing theorems. Now, the general procedure in deductive mathematics is to start with some rules (axioms) and then to derive necessary truths (theorems) on the basis of those axioms. However, the axioms themselves are not locked in a vacuum, and plenty of other rules are used in helping someone deduce theorems.  Take for instance the axioms defining the real numbers. If one wishes to prove that the square root of 2 is irrational, one must use logical rules that are not contained in the axioms in order to devise a proof. The ultimate question concerning deductive mathematics can then be phrased in the following informal way (perhaps thinking about the problem in this way will yield further insight). Given a set of theorems T and a set of axioms A, how does the set of necessary logical propositions (an admittedly informal notion) L[T:A] look? Well for a starter, it probably contains more theorems than are just contained in T. If T is just a single theorem, then the set L[T:A] is just all logic necessary to prove T from A. So where does this leave us? Well, it leaves us with a good starting point for defining general deductive intelligence and more loosely mathematical intelligence. That is, a general deductive intelligence is able to construct L[T:A]. Someone who is good at mathematics likewise is able to understand at least part of the structure of L[T:A]. Furthermore If one can figure out an efficient method for performing searches that “find” elements of L[T:A], then automatic theorem proving would be trivial because all of the justification for the proof is contained in L[T:A].

In the end, the search for the defining parameters of mathematical intelligence is hard to undertake for a reason. Quite clearly, a mathematician is required to have to certain baseline abilities that make his job possible. However, the current understanding of these abilities is limited due to difficult nature of the problem. One can imagine that there is some structure associated to a given set of axioms and a set of theorems which will represent the logical justification for those theorems. The best mathematicians are then those who best justify.