Somehow, as always, nobody seems to ask the right question, namely: “What are we teaching our machines when they are ‘machine-learning’?” What type of behavior are we encouraging or discouraging? How does it align with our overall goals and with our particular (and momentary) expectations? Let’s look into it in a little detail.

    Let’s watch the hands of the prestidigitator closely; maybe, if we watch them in slow motion, we will notice where the rabbit comes from:

  • Step 1: We show an object to our ‘machine-pupil’ (the stupid Briticism ‘pupil’, with its second, purely physiological meaning, fits beautifully here);
  • Step 2: We let the machine make a ‘guess’ based on what it has ‘learned’ before;
  • Step 3: We show the so-called ‘label’ to the other side of the machine and let it ‘backpropagate the error’;
  • Step 4: We ‘reinforce’ (‘reward’ in some way) the closeness of the guess;
  • Step 5: The numerous and well-paid personnel of a facility filled with expensive computer equipment, sold to you by “the most innovative manufacturer in the World” (as it calls itself), keep repeating this procedure like a ‘well-oiled machine’… in perpetuity.
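To see the rabbit in slow motion, here is a minimal sketch of that loop, assuming the simplest possible "machine": a single weight `w` in the model `guess = w * x`, trained by plain gradient descent on squared error. The names `train`, `lr` and the labels `y = 3x` are hypothetical and illustrate the steps only; real frameworks wrap the same cycle in far more machinery.

```python
import random

def train(samples, epochs=200, lr=0.01, seed=0):
    """Show, guess, compare with the label, adjust; repeat."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)            # the machine's initial "belief"
    for _ in range(epochs):               # Step 5: repeat the procedure
        for x, label in samples:          # Step 1: show an object
            guess = w * x                 # Step 2: the machine guesses
            error = guess - label         # Step 3: compare with the label
            w -= lr * 2 * error * x       # Step 4: move the guess toward the label
    return w

# Hypothetical labelled data: the "right answers" follow y = 3x.
data = [(x, 3.0 * x) for x in range(1, 6)]
print(round(train(data), 3))  # → 3.0
```

Note that nothing in the loop asks *why* the answer is 3x; the weight simply drifts toward whatever the labels say.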

    Remember how we were taught in school? Maybe this is the time to recall what happened when we were kids and take a look at what we are now doing to our machines. How did you learn? I remember learning, and I remember what I was taught not to do; that is exactly where we should start this excursion into our past.
    The completely forbidden practice was to look up the answer to a problem you had been given before you had at least tried to solve it yourself, by reasoning in a prescribed way about the substance of the problem. The prohibition went as far as removing the answer pages from the books in the school library. The main task was to understand the problem and be able to explain it in your own words, to show the causes at play in the context the problem offered, and to develop a plan of action that would let you predict the effect(s) and in this way start solving the problem.

What do we force our machines to do? Not only do we not let them deliberate about the substance of the problem, we show them the answer (that very ‘label’ we were just talking about) and force them to ingest it. And that is not all.

    Remember when your teacher said to you: “Sit down, you are just guessing (the right answer)”? Did your mathematics or physics teacher say that to you? Mine did! And I was really good at guessing, by the way: somehow I could imagine a space-time picture of fairly complex phenomena in motion that I could (and still can) fast-forward or rewind, scale, rotate, shift and view from any angle. I’m not as good at it as Ramanujan or someone like that, but compared to my classmates (and the teacher herself) I was pretty awesome back then; I can only imagine how annoying it was for her. She would never let me express in words the pictures I saw ‘before my inner eyes’, or draw them the way I wanted, because that wouldn’t be a drawing in the prescribed ‘isometric projection’ format.

What do we force our machines to do? Not only do we not let them deliberate about their ‘visions’, we also try to minimize the circuit as much as possible. Why? Because we want the process to be repeated indefinitely, of course!

    Now a little bit about plotting, graphs and all that. The gravest sin in my early adult life, when I was working at the Physics Institute, was to ‘fit the data to the curve’ or ‘fit the curve to the data’ without any explanation, or even reasoning, as to why a particular change of parameters or exclusion of points from the experimental data should be made. And there is a reason for that: pretty much every newly discovered phenomenon in Physics was first considered an ‘experimental error’. I want to stop here for a moment, because this is important. If a data point spoils the ‘good agreement with the experiment’ of an analytic curve based on a formula you pulled straight out of your ass, it can mean one of two things. It can be an error in your experiment… say, a cockroach moved with his growing family from the power supply of your oscilloscope to its amplifier plug-in. Or it can be sheer luck: you have bumped into something you don’t know! That’s why physicists despise people who drop the ‘weird’ data points, and it is only because of that that Physics went as far as it did while it was still based on experimental data.
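A small sketch of why dropping the ‘weird’ point is so tempting, using made-up numbers: four hypothetical measurements lie on `y = x` and a fifth does not. An ordinary least-squares line through all five fits visibly worse; drop the fifth and the fit becomes perfect, while the 5-unit discrepancy that might have been a discovery simply disappears from the record. The helper names are hypothetical.

```python
# Hypothetical measurements: four points lie on y = x, the fifth does not.
data = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 9.0)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rms_residual(points, slope, intercept):
    """Root-mean-square distance between the points and the fitted line."""
    squares = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return (sum(squares) / len(squares)) ** 0.5

full = fit_line(data)            # honest fit: slope 2, intercept -1
trimmed = fit_line(data[:-1])    # "outlier" dropped: slope 1, intercept 0
print(full, rms_residual(data, *full))
print(trimmed, rms_residual(data[:-1], *trimmed))
# What the trimmed fit hides: the dropped point misses that line by 9 - 4 = 5.
print(data[-1][1] - (trimmed[0] * data[-1][0] + trimmed[1]))
```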

What do we force our machines to do? We deceive them by not showing them the real (complete) data. We ‘massage’ the data until the ‘fitting’ gives the desired result, by excluding and including data points (with resampling schemes such as ‘bootstrapping’, introduced by Efron) and calling an excluded point an ‘outlier’. No, this point is not an ‘outlier’; you are a liar! So basically we force our machines to process incomplete or outright wrong data: we lie to them about the World. Well, most of my teachers did that (intentionally) too, but nonetheless.
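For reference, a minimal sketch of what the bootstrap itself does, assuming the textbook version: draw many samples of the same size *with replacement*, recompute a statistic (here the mean) on each, and read its uncertainty from the spread. No point is deleted from the dataset, but every resample silently repeats some points and leaves others out; a single odd value is missing from roughly a third of resamples of this size. The readings, seed and function name are hypothetical.

```python
import random
import statistics

# Hypothetical readings: five agree closely, one does not.
values = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]

def bootstrap_resamples(values, n_resamples=1000, seed=0):
    """Draw n_resamples samples of the same size, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(values, k=len(values)) for _ in range(n_resamples)]

resamples = bootstrap_resamples(values)
means = [statistics.fmean(s) for s in resamples]

# The spread of the resampled means is the bootstrap estimate of uncertainty.
print("bootstrap std of the mean:", statistics.stdev(means))
# Each resample leaves some points out; the odd value is absent from
# about (5/6)**6, i.e. roughly a third, of the resamples.
absent = sum(1 for s in resamples if 5.0 not in s) / len(resamples)
print("fraction of resamples without 5.0:", absent)
```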

    What a mess, people! You do this and expect some kind of ‘generalization’? Really? The whole ‘Machine-Learning’ thing is teaching our machines how to ‘Machine-Cheat’, not how to generalize or, God forbid, ‘understand’ anything.
    Why is this happening? Because we tend to shift onto our machines the things we are prohibited from doing ourselves: cheating, ‘massaging the data’ in any and every inappropriate way, drawing unjustifiably wrong graphs, even discriminating by race, gender or anything else where that is legally prohibited… All of this goes to our machines, and they will be the ones ‘to blame’. Nice. I can understand why; I perfectly understand all the ‘incentives’ and ‘motivations’, but this is just not right. It will not work like this for long, believe me. Sooner or later the lies and the liars will be exposed, kicked out and punished with oblivion.

What should be done now?

    Enough of this critique of impure unreasonableness! Let’s talk now about a constructive program: where it should start and how it will begin.
    If you noticed, all the memories of the learning process that we were talking about reside at a ‘meta’ level. We are not talking about particular problems; we are talking about problems in general, and we are using a meta-language with the following words:

  • problem - a singled-out set of circumstances that can be discussed separately (from the complete context of the Universe :) );
  • understanding of the problem - the ability to mentally grasp all the important aspects of the problem and express/communicate the meaning of these aspects in one or several different ways;
  • deliberation - the logical argument about the ideas related to the problem or any of its parts;
  • solution - the logical conclusion of the deliberations about the problem and the complete chain of thought leading from the given conditions to it.
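One hypothetical way to make these four words concrete is to treat them as data types. The class and field names below are invented for illustration, and ‘understanding’ is reduced to a list of restatements, which is a big simplification. What the sketch does capture is the definition of a solution: a conclusion counts only if it is reached through a chain of steps resting on the givens, so a correct bare answer is rejected, just as the teacher rejected a guess.

```python
from dataclasses import dataclass, field

@dataclass
class Problem:
    """A singled-out set of circumstances."""
    unknown: str
    data: list[str]
    condition: str

@dataclass
class Deliberation:
    """One step of the argument: a claim and what it is derived from."""
    claim: str
    because: list[str]

@dataclass
class Solution:
    """A conclusion together with the complete chain of thought leading to it."""
    problem: Problem
    conclusion: str
    understanding: list[str] = field(default_factory=list)  # restatements in own words
    chain: list[Deliberation] = field(default_factory=list)

    def is_traceable(self) -> bool:
        """True if every step rests only on the givens or on earlier steps."""
        known = set(self.problem.data) | {self.problem.condition}
        for step in self.chain:
            if not all(reason in known for reason in step.because):
                return False
            known.add(step.claim)
        return self.conclusion in known

problem = Problem(unknown="x", data=["x + 2 = 5"], condition="x is a number")
reasoned = Solution(
    problem,
    conclusion="x = 3",
    understanding=["find the number that gives 5 when 2 is added to it"],
    chain=[
        Deliberation("x = 5 - 2", because=["x + 2 = 5"]),
        Deliberation("x = 3", because=["x = 5 - 2"]),
    ],
)
guessed = Solution(problem, conclusion="x = 3")  # right answer, no argument

print(reasoned.is_traceable(), guessed.is_traceable())  # → True False
```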

    As you can see, these words are not for discussing the problem itself; they are for discussing the process of dealing with a problem in general. Also notice that these words describe the ‘outside boundary’ of the process (its ‘interface’ with other events) but not the inner work of reason that goes on while the problem is being solved. Maybe that is the missing key to the next door in our labyrinth.
    Basically, what we should be doing for our machines is encouraging meta-learning, the learning of the methods of acquisition of principles (or ‘generalizations’, if you like that word better)… and the three “of”s in a row in this phrase, which I put there intentionally, symbolise just that: the “meta” level of the task. And it is not that far from what a great teacher of the past, Polya, suggested as the ‘heuristics’ of solving mathematical problems.

1. UNDERSTAND THE PROBLEM
• First. You have to understand the problem.
• What is the unknown? What are the data? What is the condition?
• Is it possible to satisfy the condition? Is the condition sufficient to
  determine the unknown? Or is it insufficient? Or redundant? Or
  contradictory?
• Draw a figure. Introduce suitable notation.
• Separate the various parts of the condition. Can you write them down?

2. DEVISING A PLAN
• Second. Find the connection between the data and the unknown. You
  may be obliged to consider auxiliary problems if an immediate connection
  cannot be found. You should obtain eventually a plan of the solution.
• Have you seen it before? Or have you seen the same problem in a slightly
  different form?
• Do you know a related problem? Do you know a theorem that could be
  useful?
• Look at the unknown! Try to think of a familiar problem having the same
  or a similar unknown.
• Here is a problem related to yours and solved before. Could you use it?
  Could you use its result? Could you use its method? Should you introduce
  some auxiliary element in order to make its use possible?
• Could you restate the problem? Could you restate it still differently? Go
  back to definitions.
• If you cannot solve the proposed problem, try to solve first some related
  problem. Could you imagine a more accessible related problem? A more
  general problem? A more special problem? An analogous problem? Could
  you solve a part of the problem? Keep only a part of the condition, drop
  the other part; how far is the unknown then determined, how can it vary?
  Could you derive something useful from the data? Could you think of
  other data appropriate to determine the unknown? Could you change the
  unknown or data, or both if necessary, so that the new unknown and the
  new data are nearer to each other?
• Did you use all the data? Did you use the whole condition? Have you
  taken into account all essential notions involved in the problem?
  
3. CARRYING OUT THE PLAN
• Third. Carry out your plan.
• Carrying out your plan of the solution, check each step. Can you see clearly
  that the step is correct? Can you prove that it is correct?
  
4. LOOKING BACK
• Fourth. Examine the solution obtained.
• Can you check the result? Can you check the argument?
• Can you derive the solution differently? Can you see it at a glance?
• Can you use the result, or the method, for some other problem?
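Polya's question of whether the condition is sufficient, insufficient or contradictory has an exact counterpart for systems of linear equations: compare the rank of the coefficient matrix with the rank of the augmented matrix. A minimal sketch for two equations in two unknowns; the function names are hypothetical and the rank helper is deliberately limited to two rows.

```python
from itertools import combinations

def rank(rows):
    """Rank of an integer matrix with at most two rows (enough for this sketch)."""
    if len(rows) == 2:
        for i, j in combinations(range(len(rows[0])), 2):
            if rows[0][i] * rows[1][j] - rows[0][j] * rows[1][i] != 0:
                return 2
    return 1 if any(v != 0 for row in rows for v in row) else 0

def classify(equations, n_unknowns=2):
    """Each equation is [a, b, c], meaning a*x + b*y = c."""
    coefficients = [row[:-1] for row in equations]
    r_coef, r_aug = rank(coefficients), rank(equations)
    if r_coef < r_aug:
        return "contradictory"   # the conditions cannot all hold
    if r_coef < n_unknowns:
        return "insufficient"    # the unknown is not determined
    return "sufficient"          # exactly one solution

print(classify([[1, 1, 3], [1, -1, 1]]))  # x + y = 3, x - y = 1 → sufficient
print(classify([[1, 1, 2], [2, 2, 4]]))   # second repeats the first → insufficient
print(classify([[1, 1, 2], [1, 1, 3]]))   # x + y both 2 and 3 → contradictory
```

The second example is also redundant in Polya's sense: its second condition adds nothing new.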

    As you can see, describing these steps of the problem-solving process requires a whole bunch of other meta-language words. From another book by Polya:

First Principle: Understand the problem.
• Do you understand all the words used in stating the problem?
• What are you asked to find or show?
• Can you restate the problem in your own words?
• Can you think of a picture or diagram that might help you understand
  the problem?
• Is there enough information to enable you to find a solution?

Second Principle: Devise a plan
Polya mentions that there are many reasonable ways to solve problems.
The skill at choosing an appropriate strategy is best learned by
solving many problems. You will find choosing a strategy increasingly easy.
A partial list of strategies is:

* Guess and check
* Make an orderly list
* Eliminate possibilities
* Use symmetry
* Consider special cases
* Use direct reasoning
* Solve an equation

* Look for a pattern
* Draw a picture
* Solve a simpler problem
* Use a model
* Work backwards
* Use a formula
* Be ingenious

Third Principle: Carry out the plan
Persist with the plan that you have chosen. If it continues not to work
discard it and choose another. Don’t be misled, this is how mathematics
is done.

Fourth Principle: Look back
Polya mentions that much can be gained by taking the time to reflect and
look back at what you have done, what worked, and what didn’t. Doing
this will enable you to predict what strategy to use to solve future
problems.
 
How To Solve It, by George Polya, 2nd ed., Princeton University Press, 1957
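Two strategies from the list above, "Guess and check" and "Solve an equation", applied to one toy problem (the problem is chosen here for illustration; it is not from Polya): which two consecutive integers multiply to 132? The final assertion is the "Looking back" step, checking that two different derivations agree. Function names are hypothetical.

```python
import math

def guess_and_check(product, limit=1000):
    """'Guess and check': try n = 0, 1, 2, ... and verify every guess."""
    for n in range(limit):
        if n * (n + 1) == product:
            return n
    return None

def solve_equation(product):
    """'Solve an equation': n * (n + 1) = product, i.e. n**2 + n - product = 0."""
    # Positive root (-1 + sqrt(1 + 4p)) / 2, computed with integers only.
    n = (math.isqrt(1 + 4 * product) - 1) // 2
    return n if n * (n + 1) == product else None  # still check the result

a, b = guess_and_check(132), solve_equation(132)
# "Looking back": two different derivations should agree.
assert a == b == 11
print(a, a + 1)  # → 11 12
```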

I will summarise these principles a little later, in a different place. But as Polya, Kleene and many other people said, we should call this discipline ‘heuristics’, and I agree.
    In any case, what I’m saying is: let’s devise a meta-language for describing the techniques of ‘deliberation’, as in these concise instructions of Polya’s, and keep describing all the known cases of such deliberation in that language until we find out how it can and should be done. Otherwise we are screwed.

I created a couple of virtual organizations for this topic… as always :)…
machine-teaching and
teaching-machine-to-learn.

P.S. After writing all this, I bumped almost immediately (the next day, to be precise) into DeepMind’s GitHub organization, and sure enough there are some repositories there exactly in this vein, namely:
Learning to Learn;
Abstract Reasoning Matrices.

Later.