Geoffrey Everest Hinton (born 6 December 1947) is an English-Canadian cognitive psychologist and computer scientist, best known for his work on artificial neural networks. From 2013 to 2023 he divided his time between Google (Google Brain) and the University of Toronto.
- I got Christianity at school and Stalinism at home. I think that was a very good preparation for being a scientist because I got used to the idea that at least half the people are completely wrong.
- On his childhood.
- We got quite a few applications, and one of these applications I couldn't decide if the guy was a total flake or not... He wrote a spiel about the machine code of the brain and how it was stochastic, and so the brain had this stochastic machine code. It looked like rubbish to me, but the guy obviously had some decent publications and was in a serious place, so I didn't know what to make of him... David Marr said, "Oh yes, I've met him." I said, "So what did you think of him?" David Marr said, "Well, he was a bit weird, but he was definitely smart." So I thought, OK, so we'll invite him. That guy was Terry Sejnowski, of course... the book was one of the first books to come out about neural networks for a long time. It was the beginning of the end of the drought... both Dave Rumelhart and Terry said that from their point of view, just getting all these people interested and in the same room was a real legitimizing breakthrough.
- On the 1979 conference, the proceedings of which were published in Hinton, G. & Anderson, J. (eds), Parallel Models of Associative Memory (Hillsdale, N.J.: Erlbaum, 1981).
- Then we got very excited because now there was this very simple local learning rule. On paper it looked just great. I mean, you could take this great big network, and you could train up all the weights to do just the right thing, just with a simple local learning rule. It felt like we'd solved the problem. That must be how the brain works. I guess if it hadn't been for computer simulations, I'd still believe that, but the problem was the noise. It was just a very, very slow learning rule. It got swamped by the noise because in the learning rule you take the difference between two noisy variables: two sampled correlations, both of which have sampling noise. The noise in the difference is terrible. I still think that's the nicest piece of theory I'll ever do. It worked out like a question in an exam where you put it all together and a beautiful answer pops out.
- On the Boltzmann machine training algorithm.
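The learning rule described in this quote can be sketched as follows. This is a minimal illustration, not code by Hinton or his collaborators: it assumes binary (0/1) units and that equilibrium samples from the clamped ("data") phase and the free-running ("model") phase are already available. The helper names, the learning rate and the independent-unit sampler are hypothetical stand-ins. Each weight changes by the difference of two sampled correlations, which is where the noise he mentions comes from.

```python
import random


def sampled_correlation(states, i, j):
    """Estimate <s_i s_j> as the fraction of sampled states in which units i and j are both on."""
    return sum(s[i] * s[j] for s in states) / len(states)


def boltzmann_weight_update(clamped_states, free_states, i, j, lr=0.1):
    """Local rule: lr * (<s_i s_j> with data clamped - <s_i s_j> running freely)."""
    return lr * (sampled_correlation(clamped_states, i, j)
                 - sampled_correlation(free_states, i, j))


def random_states(n, num_units, rng, p=0.5):
    """Hypothetical stand-in for equilibrium samples: independent binary units, each on with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(num_units)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Both phases come from the same distribution, so the true update is exactly 0;
    # any nonzero value below is pure sampling noise in the difference of two estimates.
    updates = [
        boltzmann_weight_update(random_states(100, 2, rng), random_states(100, 2, rng), 0, 1)
        for _ in range(1000)
    ]
    mean = sum(updates) / len(updates)
    std = (sum((u - mean) ** 2 for u in updates) / len(updates)) ** 0.5
    print(f"mean update {mean:+.4f}, std of a single update {std:.4f}")
```

In the demo the individual updates scatter around zero even though nothing should be learned. Averaging over more samples shrinks that scatter, but only by making each learning step more expensive, which is one way to read his remark that the rule was "very, very slow".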
- I first of all explained to him why it wouldn't work, based on an argument in Rosenblatt's book, which showed that essentially it was an algorithm that couldn't break symmetry... The next argument I gave him was that it would get stuck in local minima... We programmed a backpropagation net, and we tried to get this fast relearning. It didn't give fast relearning, so I made one of these crazy inferences that people make, which was that backpropagation is not very interesting... [After a year of trying and failing to scale up Boltzmann machines:] "Well, maybe, why don't I just program up that old idea of Rumelhart's, and see how well that works on some of the problems we've been trying?"... We had all the arguments: "It's assuming that neurons can send real numbers to each other; of course they can only send bits to each other; you have to have stochastic binary neurons; these real-valued neurons are totally unrealistic. It's ridiculous." So they just refused to work on it, not even to write a program, so I had to do it myself.
- The reason hidden units in neural nets are called hidden units is that Peter Brown told me about hidden Markov models. I decided "hidden" was a good name for those extra units, so that's where the name "hidden" comes from.
- On the origin of the term "hidden units".
- I'm much more interested in how the brain does it. I'm only interested in applications just to prove that this is interesting stuff to keep the funding flowing. To do an application really well, you have to put your whole heart into it; you need to spend a year immersing yourself in what the application's all about. I guess I've never really been prepared to do that.
- On practical applications.
- ... as soon as I got backpropagation working, I realized, because of what we'd been doing with Boltzmann machines, that you could use autoencoders to do unsupervised learning. You just get the output layer to reproduce the input layer, and then you don't need a separate teaching signal. Then the hidden units are representing some code for the input.
- On autoencoders.
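The idea in this quote, training with backpropagation while using the input itself as the target, can be sketched in a few lines. This is a hedged, minimal illustration and not Hinton's original code: the 4-2-4 one-hot "encoder" task, sigmoid units, squared-error loss, learning rate and all names (TinyAutoencoder, train_step) are assumptions chosen for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAutoencoder:
    """One hidden layer; the output layer is trained to reproduce the input layer."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        """Hidden activations: the learned code for input x."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(self.w2, self.b2)]

    def loss(self, data):
        """Squared reconstruction error summed over all patterns and output units."""
        return sum(0.5 * (o - xi) ** 2 for x in data for o, xi in zip(self.decode(self.encode(x)), x))

    def train_step(self, x, lr):
        h = self.encode(x)
        y = self.decode(h)
        # Output deltas: the input x is the target, so no separate teaching signal is needed.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(y, x)]
        # Backpropagate to the hidden units, using w2 before it is updated.
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


if __name__ == "__main__":
    # Hypothetical task: four one-hot patterns squeezed through two hidden units.
    patterns = [[1 if i == k else 0 for i in range(4)] for k in range(4)]
    net = TinyAutoencoder(4, 2)
    print("loss before:", round(net.loss(patterns), 3))
    for _ in range(5000):
        for p in patterns:
            net.train_step(p, lr=0.5)
    print("loss after:", round(net.loss(patterns), 3))
    for p in patterns:
        print(p, "->", [round(h, 2) for h in net.encode(p)])
```

The two hidden activations printed for each pattern are the "code for the input" Hinton refers to; with only two hidden units, the network has to find four distinguishable codes to reconstruct the inputs well.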
- In late 1985, I actually had a deal with Dave Rumelhart that I would write a short paper about backpropagation, which was his idea, and he would write a short paper about autoencoders, which was my idea. It was always better to have someone who didn't come up with the idea write the paper because he could say more clearly what was important. So I wrote the short paper about backpropagation, which was the Nature paper that came out in 1986, but Dave still hasn't written the short paper about autoencoders. I'm still waiting.
- On the publication of "Learning representations by back-propagating errors" (Nature, 1986), which popularized backpropagation in neural network research.