I'm Alex, the CEO and founder of Scale. We label data for AI companies. I'm here with Christian Szegedy, currently a research scientist at Google Research. He's worked on a number of influential results in his research career, so I'm really excited for this conversation. A couple of them: he published the first state-of-the-art use of deep neural networks for object detection in images in 2013, and in 2014 he published the first paper on adversarial examples, which is obviously now a hot research topic.
He also designed the Inception architecture, which is one of the most popular architectures for object detection in images, and he invented batch norm, which really introduced the concept of normalization in deep learning, and is now used in most modern neural network architectures. And now he's one of the few deep learning researchers working on formal reasoning. Welcome.
So, I wanted to start actually just by asking, when you were working on perception, I think starting six, seven years ago, why were you working on it? Why did you think it was interesting or important research?
So, when I joined Google in 2010, AI was not really a popular topic, and most people looked at it with skeptical eyes. My purpose in joining Google was to learn machine learning and AI. Actually, I was not so much into perception per se. I was much more excited about learning machine learning in general, because my goal always was to design systems that are artificially intelligent. So reasoning was actually my original motivation to learn machine learning, but at that time, vision was one of the most obvious outlets.
And I had the luck that I managed to get into a group that did research on computer vision.
Why did you believe in machine learning at that point? Because I think the results weren't that compelling yet, not compelling enough that you could really believe machines would be able to do all these things that humans do very well. So what was the core of that belief?
I've believed in AI and machine learning for decades. I always wanted to work in this area; it's just that when I did my studies it was not very popular, and it was hard to get a job in that domain. But I always thought that machines would eventually learn just as well as humans, and we just didn't know the right techniques.
So, I was surprised how few new ideas were necessary. Actually, most of the ideas we use are from the 70s and 80s, so there's hardly anything new.
But you, you sort of ... you just had this, like, personal conviction that, "Hey, machines should be able to learn as well as humans"?
Yes, my guess was that biological systems can do learning, and learning is ubiquitous, so it shouldn't be something that requires a lot of really hard engineering or some really big jumps or ideas. I mean, biology figured it out, so we should be able to somehow get there. I didn't know it was so simple, though. I thought it would be much more complicated than what we ended up with, yeah.
Right. You expected it would be much harder to get good at these problems?
And now, what did you expect was sort of the importance of your work in perception when you were working on it? It sounds like you sort of fell into it, but what did you think was going to be the impact of that work?
Yeah, so I didn't know what the timeline was to get to usable results. The fact that we got usable results in two years, from, let's say, 2012 to 2014, was a surprise to me. I was not really surprised that it happened at all; I would have bet that it would happen within a decade or five years, but I wouldn't have expected two years for vision to improve so much. When I started computer vision I didn't even know how far along it was, so I was surprised how poor it was before I started working on it.
Yeah, so it was sort of ... it went through this massive progress over just a few years.
My other thing was that I had zero background in computer vision, so my only bet to produce good results was to do something that nobody else did. I mean, a few people did, like Alex Krizhevsky's groundbreaking work, which was parallel to when I worked on it. But most computer vision researchers I worked with at Google were absolutely super skeptical about neural networks in general in 2012, or in '11, so there was hardly any traction. So it was nice for me because, with Dumitru Erhan, the two of us could do a lot of things quickly before most people jumped on it.
It's one of these interesting things where most people who had been working on it for a long time were very skeptical and didn't expect the methods to work, and so it was like a beginner's mind in some sense, a beginner's approach.
Yes, yes. I thought that was my only chance, so I just went at it a hundred percent, because if it didn't work out I would have no chance to catch up with all the other people.
Yeah, yeah. Got it. Yeah, so it was almost like, "Okay, deep learning has to work."
No, I mean it doesn't have to. I mean, I could have survived without it, I just said, "Okay, I bet on it, because I have nothing to lose."
The title of your paper on adversarial examples was Intriguing Properties of Neural Networks. It was almost like you had discovered this curiosity, and it wasn't really framed in the context it's in now. Right now safety is the primary context in which people talk about them.
Yeah, actually it's a stupid story, because I had these adversarial examples lying in my drawer for more than a year, almost two years. I discovered them in 2011, but I was too lazy to publish, and then Wojciech came to me and said, "Okay, you have this thing, and we can combine it with other stuff and publish a joint paper with various intriguing properties." And then people started to bail out and didn't put in their own stuff, because it was not interesting enough or whatever, and the paper ended up being mostly about adversarial examples.
But if I had known that beforehand, I would have just written a paper with Wojciech alone, or maybe completely alone, just with the title Adversarial Examples. Actually, we had planned with my manager to write a paper a year earlier, just on that topic, with a title like [inaudible 00:06:41] Blind Spots in Neural Networks, but I was too lazy to do it.
Interesting. I see, so originally there were other intriguing properties that you wanted to talk about but then everybody else dropped out, and it was just about adversarial examples.
Yeah, basically there was another intriguing property in the first section of the paper, and then there were the adversarial examples, but nobody cares about the first intriguing property. It was not so super intriguing, yeah.
Right, and that's interesting: you had discovered these adversarial examples two years before you published the paper, and you didn't think they were important enough to publish.
Yeah, I thought it should be published, but I just procrastinated on doing it. I always wanted to run a few more experiments, and wanted to enlist somebody to do the ImageNet experiments for me. I had experiments with various datasets, but not on ImageNet, and I thought that was important. And then it [inaudible 00:07:43], so Wojciech volunteered to do that for ImageNet, and then somehow the paper grew.
It was not planned this way. So that's how it happened, but the main thing is that [inaudible 00:07:56]. I think if I hadn't published it, somebody else would have come up with the same idea. Several people came up with similar ideas within the same one-year period, from 2014 to '15. So I had it in 2011, but, yeah, unfortunately only like 10 or 15 people were in [inaudible 00:08:17].
Yeah. Well, now it's a pretty hot topic, generally framed, again, in the context of safety: how can we deploy these systems to the real world if adversarial examples exist? Why did you think at the time that adversarial examples were important?
It was obvious to me that if deep learning takes off, you can use adversarial examples for all kinds of attacks. That was one of my motivating points in talks in 2012, or [inaudible 00:08:53], to say, okay, for example, spam filters could be circumvented, and things like that. So I thought a lot about the implications of this phenomenon, and then people got aware. Actually, Geoff Hinton was at one of those early talks on adversarial examples, a year before this was published, and he was very shocked. He said, "If that's true, then we have to do something about it." Yeah, there were obvious practical implications.
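The kind of attack being discussed can be sketched concretely. The toy example below is not from the interview: the original paper found adversarial examples via a box-constrained L-BFGS optimization, while this one-step sign-of-gradient perturbation is a later, simpler variant; the model and numbers are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sign_gradient_perturb(x, w, b, y, eps):
    """One-step perturbation of input x against a logistic model."""
    p = sigmoid(w @ x + b)      # model's predicted probability of class 1
    grad_x = (p - y) * w        # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

# Toy linear model and a confidently, correctly classified input.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([1.0, -1.0, 0.5])  # logit = 3.35, so class 1
y = 1.0

x_adv = sign_gradient_perturb(x, w, b, y, eps=2.0)
print(sigmoid(w @ x + b) > 0.5)      # original input: classified as class 1
print(sigmoid(w @ x_adv + b) > 0.5)  # perturbed copy: prediction flips
```

For a linear model the sign-of-gradient step provably moves the logit against the true class, which is why the flip here is guaranteed once eps is large enough.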
Yeah. Based on the state of research now, I think there's a lot of incremental work to make neural networks robust to these adversarial examples. I'm curious, how has your optimism, or bullishness, on deep learning changed over time? It sounds like initially you were skeptical, then you started working on it, it started working, and that was exciting, and now deep learning is the state of the art for a large number of problems. How has your optimism with respect to deep learning changed over time?
My optimism changed very, very quickly; I mean, I was in very early. Google Brain was, like, a few people then, and I saw immediately that super clever people, like Jeff Dean or Andrew Ng and others, were thinking about this, and I asked what their thinking process was. So I got convinced within a few months that it was a good idea. If people ask me whether it's a [inaudible 00:10:19], I always say that even if all the research stopped right now, and nobody did anything new in deep learning, and we just took all the technology that exists and tried to exploit it to the maximum extent, there would be ten years of really cool advances in technology, just based on the current state of machine learning and AI. But most people don't really get it, I think; they don't see the potential in that. And if you look at it, the research is accelerating, so I'm extremely bullish. But I would say that "deep learning" itself is a poor notion, because it was coined when it referred to networks of just a few layers: not one layer, but two or three.
And it was like, okay, we call it deep because it's more than one layer.
And I think it's developed into something where you have complicated programs that are parametrized by all kinds of matrices or tensors, and you're learning those tensors. So it became a much more generic notion than it used to be.
And I think this tendency will go on. Right now we are not designing the network architectures anymore; we let some machine learning system design them. So I think deep learning in its current form has a very short future. It will be superseded by the synthesis of general programs.
So that's why I think program synthesis is the future, not deep learning.
Deep learning is sort of the dumbest way to do metaprogramming.
Yeah. And sort of metaprogramming and program synthesis is really the future.
Yes, so as we generalize deep learning and we improve program synthesis, they will merge. And basically, machine learning is about creating a program that solves a task automatically.
So currently, if you just use gradient descent to do this, you can solve certain tasks with certain engineering. But if you have more sophisticated software synthesis methods, then you can solve much more general tasks automatically. So basically we are moving toward everything being synthesized by machine, with higher and higher levels of automated feedback loops.
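The view above, a "program" parametrized by tensors whose parameters are found by gradient descent, can be sketched minimally. This example is invented for illustration: the "program" is just a linear map, and plain gradient descent on squared error recovers its parameter tensor from input-output examples.

```python
import numpy as np

# The "program" to recover: y = x @ W_true.T, with W_true unknown to the learner.
rng = np.random.default_rng(0)
W_true = np.array([[2.0, -1.0]])
X = rng.normal(size=(100, 2))       # random example inputs
Y = X @ W_true.T                    # outputs of the target program

# Learn the parameter tensor by plain gradient descent on mean squared error.
W = np.zeros((1, 2))
lr = 0.1
for _ in range(200):
    pred = X @ W.T
    grad = 2 * (pred - Y).T @ X / len(X)   # d(MSE)/dW
    W -= lr * grad

print(np.round(W, 3))   # recovered parameters, approximately [[2, -1]]
```

The same loop, with a far more elaborate parametrized program in place of the linear map, is essentially what "deep learning" does; the point in the interview is that gradient descent is only one, rather blunt, way of searching the space of programs.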
Right. I think the point is that even with what we have today, if we just exploited it and implemented it in a bunch of different areas, that would already be an astronomical amount of impact.
But then research is improving as well, and so you can layer the two curves on top of each other and say, "Hey, who knows where it's gonna go?" Right?
Yes. So I think it's similar to how software ate the world from, let's say, 1990 to 2010. I think AI is eating software now.
From, let's say, 2015 for the next 20 years at least. I don't know what comes after that, but I think right now it's like 1990. So if you had bet on computers and software in 1990, you would have been as right as you would be now betting on AI.
Right. Do you think that state-of-the-art perception systems have gotten to a point where perception is no longer the bottleneck for any robotics problem?
I think it's not 100 percent there but close. So at least there is clearly a light at the end of the tunnel. That was not the case in 2010 or 2012 even.
But now I would say most people think perception might not be perfect, but there is a clear improvement path that we are on and it will continue until we are there.
You used to work on perception, and now you work on formal reasoning, which, as you mentioned before, was actually your primary goal. For example, your most recent paper was about theorem proving: actually proving mathematical theorems using deep learning. Why are you working on formal reasoning?
The way that we interact with computers has been essentially the same since it was figured out in the 1950s. So essentially we are all using glorified Fortran compilers.
Actually, the complexity of programming didn't go down; if anything, it went up. And I think now, with newer techniques and technologies, it slowly starts to become possible that computers adapt to humans, rather than humans having to adapt to computers. The past 30 or 40 years were about people getting used to computers, starting to use them, and training a lot of software engineers.
But I think it can be fundamentally changed by changing the way we interact with computers so that they can understand fuzzy reasoning, they can understand intention, they can understand a lot of things that humans take for granted. Then we will be able to interact with computers much more naturally and then productivity could go up a lot.
And I think in order to start with that, you could start with some practical task that has a [inaudible 00:16:02] root. But I thought, what is the simplest possible setting in which you can make sure that your system understands fuzzy thought processes and can turn them into formalized processes? And I think mathematics is the cleanest example of that, because it doesn't rely on any domain knowledge.
So you don't need any real-world knowledge or anything extra. Everything is there in the axioms, and there are only a few of them.
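As an illustration of "everything is in the axioms" (my example, not from the interview), here is a tiny Lean 4 proof. The checker accepts it using nothing but the definition of natural-number addition and induction; no outside knowledge enters.

```lean
-- 0 + n = n is not definitional in Lean's Nat (addition recurses on the
-- right-hand argument), so it must be proved from first principles.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```

This self-contained character of formal mathematics is exactly what makes it attractive as a first target: the system's claims can be verified mechanically, with no appeal to real-world facts.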
So the question is, can we create a system that allows you to interact about mathematics just the way you interact with a human? One that can interact at the level of a really good human mathematician and also understand you in natural language. That would be a first step toward systems that could revolutionize computer science, or how we interact with computers, because that first step is already like a superhuman mathematician.
Then you can infuse more and more domain knowledge, and then you can do software synthesis with it. That's why I think this is the future: mathematics is just one step for me toward software synthesis.
And software synthesis is the unification of machine learning and programming.
So yeah, therefore I think mathematics is the logical next step. But for some people it's completely outrageous. Most people who work on formal reasoning think it's a ridiculous dream that will never happen.
But I saw that even in 2011, a lot of people in computer vision thought that this deep learning stuff was a ridiculous dream that would never happen.
And it happened in two years. So I think there is a significant chance that with these pattern recognition and perception capabilities, we will be able to get to mathematical capabilities and communication capabilities that are at least useful for humans.
So basically my point is that if, for example, you had software that could read mathematical literature in human form, that would be a very strong indication that it can really read the fuzzy reasoning it is given.
So imagine that you have an employee you want to program something. You give him a task and say, "Do this." You don't have to describe all the steps, because if you had to, you could just write the program yourself.
So similarly with a computer: imagine that you have an artificial intern or artificial employee, and you tell it the same things you would tell your software engineer, and it programs it for you. Then it comes back, and you say, "I wanted something slightly different," and it iterates a bit, but at the end of the day you get something useful. And even if you take the best programmer, there is a potential there that is under-[inaudible 00:19:12]: you could accelerate software engineering if you had that kind of capability.
So that's what I think the possibility is: you could make [inaudible 00:19:21] software without knowing any programming.
Right. This sort of theorem proving is the first step to full program synthesis.
Yeah, basically to understand you without you fully specifying everything.
Right, right. These fuzzy commands that humans give to the world.
Having myself spent a lot of time doing math, I wonder, why do you think it's actually a tractable problem?
Some similar reasoning problems, like computer games, like chess and Go, turn out to have remarkably simple solutions. Most of them boil down to perception, which we understand pretty well, and you can solve them. There are a lot of complicating factors in maths, and one of the most complicating factors is that [inaudible 00:20:06] is not an option. You cannot really be slightly better than your opponent; you either prove something or you don't.
So there is a hard cutoff. In chess you can beat your opponent 10 percent of the time, so you can do [inaudible 00:20:20] and then very slowly pull yourself up.
So that's not an option for maths. And that's why we're working on it; that's why we think it's an exciting problem, because it is different and difficult. On the other hand, perception is a very strong tool that allows us to do reasoning better than anything before. So almost all the tools that exist, I think, could become obsolete, just like most of the computer vision that came before, most of the feature generation before.
At least. So there is great potential, and we have seen no real [inaudible 00:20:59] that were actually intractable for deep learning, given enough data. The tricky question is, how do we generate enough data to have an initial system that can reason at a decent pace, so that it can pull itself up?
In some sense your belief is, we haven't found a problem that's too difficult for deep learning yet; you just need enough data. So how are you going to collect enough data for formal reasoning? What's your strategy?
We rely on certain existing formal [corpuses 00:21:38], but these are relatively small. Typically, they were developed to prove one big mathematical theorem, [inaudible 00:21:46] or finite simple groups, stuff like that. There are three or four of these big corpuses, but they are not big enough to pull yourself up. So the really crazy idea here is to read all of the human mathematics literature and then learn to formalize it. You want to use that initial small set of data as just the spark to initiate the feedback loop in which you learn to read human-language mathematics, turn it into conjectures, then prove those conjectures and work your way up. So that's our strategy, and there is a lot of natural-language mathematics. The good side effect is that at the same time the system will learn to understand natural language. So if we get there, we don't just get a very good mathematical system; we have a really strong system that demonstrates strong natural language understanding, which would in itself be a worthwhile goal. But I think addressing both of them at the same time has a higher chance than addressing them in separation.
That's really interesting, actually. You think that learning formal reasoning from literally reading math textbooks and papers has a higher likelihood of succeeding, even though strong natural language understanding, which we're not quite close to, is a prerequisite?
Okay. So normally there are no alternatives for getting strong reasoning. Either you collect a lot of training data, but even [inaudible 00:23:32] wouldn't be able to collect the data for us, because you don't have access to mathematicians at large scale. So it's very hard to collect training data for formalized mathematics.
So that's not a realistic path. And if you want to create a superhuman mathematician, the only alternative would be to do open-ended exploration of mathematics, figure out what the underlying principles of interesting mathematics are, and learn how to discover that type of mathematics. Even that wouldn't work too well, because even if your system learned to do higher-level mathematics, you would not be able to tell it a concrete problem, because you don't speak the same language as the system. You would have to formalize your statements in the system's basic terminology, which would be self-developed, so you would have to understand it like an alien artifact. It would be very hard to work with such a system. So therefore I think a self-exploring mathematician system might become a very good reasoning engine, but you could never tell whether it is one or not.
So your only chance is to learn to communicate at the same time as you learn to reason. And a lot of people make the mistake, I think, of treating language as an object: they try to learn natural language understanding by manipulating the natural language itself. But natural languages in general are not about the language itself; they are just communication mediums for something else. So I think natural language understanding is not really understanding of the language; it's communicating about your understanding of something. It's understanding of mathematics communicated via natural language, or understanding of the world communicated via natural language. Basically, natural language is a compressed communication channel.
That's really hard to do if you don't have something to communicate about, if you don't have a [inaudible 00:25:39] controlled environment to communicate about. But mathematics is a really [inaudible 00:25:44] complicated environment about which you can communicate. So I think that is the perfect medium through which to prototype natural language understanding.
And once you have mathematics, you can extend that system and feed it all kinds of domain knowledge, because it has all the logic. You can argue about anything, and you have the logical foundations to do that. So I think natural language understanding alone is as hard as doing it together with mathematics, and doing mathematics alone is as hard as doing it with natural language understanding.
Right. With natural language understanding alone, you're in a sort of communicational vacuum.
Nothing to communicate about. And now you have a backbone of formal reasoning, formal statements.
That's basically a sandbox. Mathematics is a sandbox that you can manipulate in memory, like a small world about which you can communicate.
Right. These are extremely ambitious things you want to achieve. Why do you think they're actually possible?
As I mentioned, I think, for example, [inaudible 00:26:55] and [inaudible 00:26:55], all these methods show that reasoning is possible, that strong perception, or at least relatively strong perception, is possible with neural networks.
So therefore I think we can do a lot of the things that humans can do with deep learning, with relatively high certainty.
So that's one of the arguments. But I agree it's still not guaranteed, even with this supporting evidence. Before [inaudible 00:27:28] came out, most people said that Go required a level of human intuition that was not possible with computers, not now, maybe within 10 years or something. And then [inaudible 00:27:41] came up, and it turned out, okay, you just [inaudible 00:27:44], add it to existing [inaudible 00:27:48], and you get pretty good. And then you just go a bit further and you get even superhuman. So that component that previously looked purely human, that fuzzy intuition, fuzzy reasoning, the kind of artistic spirit that looked uniquely human, most people thought was not possible for computers. But we see that deep learning actually solved that intuition part.
So basically we now have an artificial intuition module, and it's called deep learning. We can infuse that into a lot of domains if we want to.
Your most recent paper, the approach, at a high level, uses a [inaudible 00:28:39] with a neural network, a sort of [inaudible 00:28:43] engine, and I think it switches between [inaudible 00:28:45] and sort of classical methods. What do you think is the limitation of that approach?
Yes. That's actually a very limited approach, because you put the network in a cage: it has to sit inside a certain search method, so your deep learning algorithm doesn't really have the freedom to explore the search space as it wants. Our next paper, which is coming out [inaudible 00:29:10] soon, in a few weeks, turns it around. What we say is, we have a search environment in which the network has all kinds of actions to perform. The network becomes the outside, and the search becomes just an environment on which it operates. I think that's a much better approach, but even that probably has limitations. So we are thinking about how to transcend most of the limitations, to give maximum freedom to the network so that it can be maximally free to fill in all the spaces that are possible, basically.
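The "network as agent, search as environment" pattern he describes can be sketched abstractly. Everything below is invented for illustration (the paper he mentions is not described in detail here): a toy "prover" environment exposes actions, and a simple heuristic stands in for the learned policy that scores them.

```python
# Toy stand-in for the agent/environment split: the environment exposes
# legal actions, the policy (normally a neural network) chooses among them.

class ProofEnv:
    """Toy environment: reach the number `goal` from `start` via actions."""
    def __init__(self, start, goal):
        self.state, self.goal = start, goal

    def actions(self):
        # Each action is (name, resulting state); stands in for proof tactics.
        return [("double", self.state * 2), ("inc", self.state + 1)]

    def step(self, action):
        self.state = action[1]
        return self.state == self.goal   # True once the "proof" is closed

def policy_score(action, goal):
    # Stand-in for a learned network: prefer moves that get closer to the goal.
    return -abs(goal - action[1])

def search(env, max_steps=20):
    for _ in range(max_steps):
        best = max(env.actions(), key=lambda a: policy_score(a, env.goal))
        if env.step(best):
            return True
    return False

print(search(ProofEnv(start=1, goal=10)))  # greedy agent reaches 10 via 1,2,4,8,9
```

The point of the inversion described in the interview is visible even in this toy: the search loop makes no decisions of its own, so replacing `policy_score` with a stronger model changes behavior without touching the environment.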
Yeah. So I agree with you, it was an important first check. We wanted to see whether neural networks could add something to existing search, and it was kind of a success, but it didn't reach escape velocity; it couldn't be open-ended. So now we're improving on that.
What would you say are the other exciting or potentially underrated areas of research in AI right now?
I think that goes back to another one of your questions: what should we do about AI being misused?
A lot of people pay lip service and say, "Yeah, we do this and that." But the real question is how you combat certain negative effects of machine learning, and what those negative effects are, because a lot of them are kind of invisible. How do people make decisions about our lives? For example, insurance companies, agencies, and things like that. And this is just a small piece, I don't really know everything, but as AI gets applied more and more, all the biases that go into AI systems will affect everybody more and more. And I think that's something one should research much more and take much more seriously.
And I'm happy that Google is actually taking a lead on that; a lot of people have noticed, and there is a significant push. But I think this kind of research is not really demanded at the level of society, because there is no obvious immediate monetary or positive impact. I think that's an important thing one should research while we are researching AI technologies.
Thanks so much for being here, Christian. To sum it all up, it sounds like you're definitely an AI optimist, and formal reasoning is on the critical path to strong AI.
Thank you Alex.