Jeremy Howard

Deep learning researcher & educator
Alex Wang

Hi, I'm Alex, CEO and founder of Scale. I'm here with Jeremy Howard, who's currently the cofounder of Fast AI, a research institute dedicated to making deep learning more accessible. Previously he was the CEO of Enlitic, the first startup applying deep learning to medicine. And before that he was the president and a top-ranked competitor at Kaggle. Welcome.

Jeremy Howard

Thank you.

Alex Wang

Cool. So I want to actually start just by asking, how did you first start messing around with deep learning?

Jeremy Howard

Ah, well I guess when I started with neural networks, it wasn't deep. It was shallow. So that would have been 25 years ago. I'm trying to think. There were a couple of things that happened roughly simultaneously. One was, there was a kind of computing journal back then called Dr. Dobb's, a magazine that you could buy, and they had an article about these things called neural nets. And so I tried implementing one. And then I was working in management consulting, and we were working for a retail banking client, in marketing, and I was working specifically on trying to improve their targeted marketing results. So I thought, yeah, this'd be an interesting thing to try neural nets with. So I did a little bit with neural nets back then. They worked okay, but decision trees generally were faster and easier.

But then when random forests came along, that blew me away, you know? And I kind of put neural nets aside until, I guess, 2010-ish, 2010, 2011, when we started seeing neural nets starting to appear in Kaggle competitions, and particularly in academic competitions not on Kaggle. And I kind of thought, oh, that's interesting. I always felt like neural nets were the right way to solve the predictive modeling problem. And I definitely felt like we needed to do it on GPUs, but GPUs were so hard to program. Every time I tried to program neural nets on GPUs in the early 2000s, I just felt I wasn't smart enough.

So I guess between CUDA coming along to make GPU programming easier, and then Alex Krizhevsky showing how to do that at scale with the ImageNet 2012 challenge. And before that, Dan Ciresan showing superhuman performance on the German traffic sign recognition competition in 2011. It was just really obvious by then that deeper neural networks were going to take over every place where the kind of feature engineering required was beyond what humans are likely to be able to do themselves.

So that was kind of the point I thought I should dedicate my time to learning about what had happened in neural nets in the previous 20 years, catching up, and seeing where they were going to go next.

Alex Wang

So 25 years ago, when you sort of started working on neural nets, it was because philosophically it seemed like the right approach?

Jeremy Howard

Yeah. Or at least it was an approach to solving a problem that was bothering me, which was feature engineering.

Like with linear models, which is what I was spending most of my time using. Just like, what interactions to include and what nonlinearities to add. And I felt like these were not things that I should have to do manually.

You know, it just didn't seem like a human could possibly get that optimal. So trees and neural nets, back then as now, seemed like the two potential approaches to fixing that problem. So I was very interested in looking at both of those. Although, to be honest, I still spend a lot of my time on more like large-scale automated creation of features, you know, automatic interactions and automatic nonlinearity finding, and then some kind of regularization approach. Which is still a pretty good way to do certain kinds of models, or at least certain kinds of models that have constraints on the functional form or runtime performance or whatever.

Alex Wang

So, so you mentioned that by the time AlexNet came along, it was sort of obvious that neural networks with more and more layers were sort of going to be the way to go. And you actually started Enlitic in 2014, which was just two years after AlexNet.

It was the first startup applying deep learning to medicine. How did you know that deep learning was going to help?

Jeremy Howard

Right. So I spent a year figuring that out. I spent a year just doing research into the question of: okay, I'm totally convinced deep learning is going to be used like everywhere. Where's the best place to apply it right now? And specifically I was looking for places that could have the largest societal impact using the technology as it was available at that time. So I very quickly realized that it really had to be in vision applications, because that was where it had totally proven itself at that time. Because like, for a startup, you don't want to do new research to find out whether something's possible.

So for that I focused on looking at satellite imagery, medicine, and robotics, because those were the three areas with kind of unsolved problems that heavily leveraged vision.

And the thing that convinced me was discovering that in the developing world, which is most of the world's population, there's about one tenth the number of doctors we need. It'll take 300 years to train enough doctors to meet that gap. And so I thought, wow, people are dying unnecessarily because they can't get diagnosis or treatment. And I mean, I didn't know anything about medicine. I still don't. But it was pretty obvious that some large amount of medicine is looking at data to figure out what's gone wrong and what to do about it. And it turns out a lot of that data is in image form. So then I went to MICCAI, which is the biggest medical image computing conference in the world, and was shocked to discover that literally nobody was talking about deep learning. So it had been totally missed by the entire medical imaging community.

Then I went to RSNA, which is, when I say big, I mean big. It's like, I can't remember, like a hundred-thousand-strong conference for radiologists every year, and talked to everybody I could. Nobody in radiology had heard of deep learning. There was no interest in it there. So I thought, wow. You know, as I sat with radiologists, I watched them do their work. They spent all their time looking at hundreds and hundreds of pictures, trying to find things that I was sure computers would find it easier to find.

It's not the only thing radiologists do, but that's a big piece of the job. And I thought, wow, I really should be able to help them. If we could, maybe that could help fill this 10x gap in one particular field at least. And then I thought, if we could prove that was possible, then maybe we could leverage that to show what's possible in other image areas like pathology or dermatology or whatever. And then we could go from there into looking at other areas like ECGs, and build from there to gradually use this across the full spectrum of medical diagnostics and treatment planning. So that was my hope: that basically we could show people what's possible and kind of kick off interest, which is exactly what happened.

Alex Wang

I want to move on to another, I think, very sort of seminal thing in machine learning. So, now most people who work on machine learning know about Kaggle. I'm curious, how did you actually hear about Kaggle back in the day?

Jeremy Howard

Well, it had just started, I guess, and I'd been, you know, running my own companies for 10 years, and before that I was in business strategy. So I very much thought of myself as somebody who didn't know anything about machine learning but wanted to learn. I don't have any technical academic background at all. So I felt very much an outsider in that community, and I felt it was important to be stronger at those skills. So, back when I was in Melbourne, I went to a meetup for the R language, to try to learn more about it, because I'd been using S-PLUS for a few years, which preceded R, at one of my startups, and had dabbled in R. And I thought, oh, it'd be good to learn more about that. So I went to this meetup kind of saying, like, I want to learn more about R. And somebody said, you should try a Kaggle competition. So that's how I found out about Kaggle.

And that was very intimidating, because I looked on Kaggle and there was this competition running that seemed interesting, on time series forecasting. I looked at the people towards the top of the leaderboard, and I'd look them up, and they're like professors and PhD students in econometrics. And it's just like, well, you know, I guess I should try anyway. But it always feels scary putting in your submission when you know all these people are so much more qualified than you. And yeah, I was down towards the bottom of the leaderboard, as you would expect. But I spent, you know, a bit of time on it every day and gradually improved. And as I went along, I started noticing things that seemed weird about how econometricians generally tackled this time series forecasting problem. And so I tried doing things a bit differently here and there, and each time I tried something different, I often found it worked a little better than what people seemed to normally be doing.

So by the end of it, I got to the top of the leaderboard and I won the competition. And that was like.

Alex Wang

That was your first competition?

Jeremy Howard

Yeah. Yeah. And it changed my view of myself, you know, from being like, I'm a business person who occasionally dabbles with algorithm stuff, but I'm an outsider who needs to learn a lot, to being like, oh, I actually have a skill that I have accidentally developed over the past 25 years. Because I realized, okay, I have been solving data analysis problems for 25 years, just through a totally different path. You know, one of me writing my own code and figuring out stuff myself. And when I'd read a book, it would be more like a programming book. I'd almost never read a paper. Not quite true. I certainly loved the random forests paper, but hardly any papers. So yeah, to find like, oh, there's actually something I have world-class skills at, and it wasn't at all something I thought I had world-class skills at. It changed my life.

Because I thought well I want to use that skill to do something useful.

Alex Wang

Yeah. Yeah. I mean, what do you think you understood about these problems that other people weren't seeing?

Jeremy Howard

I think a lot of it was just, I'm a very strong practical coder. And I knew that already, because both of my startups were very code heavy, and I was hiring the best people I could find in the world to spend a lot of time coding with me. And I always found I was quite a bit more productive than them at writing code.

So I kind of knew I was good at coding, and that turned out to be much more important than I realized. I always assumed that the math was what was important, because that's what everybody seemed to talk about, and papers would be full of it. And I am not at all good at math. But I kind of suddenly realized, oh, the math is just one way of writing down what we're putting in code.

And actually, if you can think directly in code, it's better, and you can experiment and try things out. And so it was partly just being significantly better at coding than certainly the vast majority of people that work in that field. And then I think, secondly, it was 25 years of practice of using data to actually solve problems. Where at the end of it there's, like, a customer, you know, who I need to give this thing to, and it needs to work. At my email company, I had to actually stop spam and actually identify denial of service attacks. And for our insurance clients, I had to actually suggest prices that caused them to make more money than they made before.

Alex Wang

So you actually had to be good? Your feet were held to the fire?

Jeremy Howard

Well. Yeah, I had to be practical.

Alex Wang

I want to move on to Fast AI. So, Fast AI has this incredible tagline, "making neural nets uncool again." What was really the vision behind it?

Jeremy Howard

Well, it came out of my frustration with my failures at Enlitic, which was basically: I had this much bigger vision than radiology. It was really to transform how diagnosis and treatment planning were done throughout medicine. And it was clear I wasn't going to be able to achieve that at Enlitic. As a startup, I couldn't get access to the data that we needed. I found places that had the data, and it really just came down to, we're not going to share it with a company because we don't want you profiting from our data.

It was also totally incompatible with the incentives of both the investors and the staff, who saw we had product-market fit. You know, oh, radiology, it's going great. Why do you want to go to something else? Like, focus on the thing that's working great.

Because for the staff, they had their options that they wanted to see go up in value and the investors had put in money that they wanted to see go up in value. So, all the incentives for all the stakeholders was focused on the thing that's already working.

And I'd seen this before. It's a tricky thing to work with. Like at FastMail, which was an email company I ran back before all this, I really wanted to work on contacts, calendars, voice over Internet. Even though it started as an email company, I wanted to make it into a full-spectrum communications company, and I just couldn't get the other stakeholders interested in those things. So I think this is always a challenge. And I'm not saying it's the right way or the wrong way, but for me, I'm always interested in, like, what's next?

So, Fast AI kind of sprung out of that. It was basically like, well, how are we gonna actually help achieve the transformative impact of deep learning? Because it can be used in at least as many different places as the Internet can be used. You know? Even though I knew back in the early days of the Internet that it was very obviously going to be used everywhere, I couldn't enumerate what those places were, or how exactly it would help, or whatever.

So I had the same problem with deep learning. You know, I don't know how every industry works, and what every problem they have is, and what data they have. The people to solve these problems are the domain experts in those fields, including medicine, right? So I thought, all right, well, how do we get it so that doctors in hospitals, you know, the people that have access to the data and know about the problems, can solve them themselves with deep learning?

So this is where this idea of kind of making deep learning more accessible came from. And the idea of kind of making it uncool came from this idea of like, well, let's initially focus on coders, because, at least for now, you still need to be able to code. And most coders are doing very uncool things, like creating line-of-business CRUD apps, you know, in a room full of other CRUD app coders in Bangalore or somewhere, for Andersen Consulting, and they're all writing, you know, authentication, user login forms or whatever. Those are the coders I wanted to reach, people using Visual Basic or PHP or whatever. So I very much had that kind of Indian software development community in my head as we created this.

So we were kind of like, okay, hey, let's target the Windows users. Let's target the C# and Java coders. And also let's target all the domain experts, you know, the doctors and the lawyers and the logistics and operations engineers and whatever. So, yeah, we came up with four paths to doing this. One was education: so like, okay, what works pretty well right now, and how do you use it? One was research, which was the same thing: what does work right now, and how do you use it?

And then the third was software development. So as we started, you know, doing the research and education, we quite often came across places where we thought, well this ought to work well, but no one's really tried it yet. Or somebody has tried it, but it's not packaged up in a way that's easy to use. So let's write software to make things easier.

And then the fourth was, let's build a community where, as people start using our education and our software and our research, they can find other like-minded people and help each other. You know, especially like radiologists who are studying deep learning, or activists who are studying deep learning, or environmental folks trying to do environmental audits who are studying deep learning. How do they find each other and help each other solve their problems?

And also within particular geographic areas, people doing deep learning in Lagos, you know. All of these things can feel very isolating, because none of the other radiologists at your hospital, or none of the other coders at your software development shop in Lagos or whatever, are doing deep learning. So create a community to help these people find each other and help each other out.

Alex Wang

Yeah. Well it's been, it's been incredible to watch. I'm curious, what have been some of the cooler things that you've seen as a result of your work in Fast AI?

Jeremy Howard

Well, oh gosh, there are so many. One thing that I was pleased to find, and slightly surprised to find, was when I went to NeurIPS, which is kind of the premier AI academic event of the year.

Lots and lots and lots of presenters came up to me and said, "Thank you for Fast AI. It's how I got started."

And I was thrilled to find that, you know, in such a short time, our students had already gotten to the point where they were doing work that was accepted at NeurIPS. And in that case it was pretty much always people who, you know, already had a PhD, or were top-level practitioners in, like, astrophysics or some kind of highly numerate, programming-oriented discipline. And Fast AI was the way they kind of realized that those skills they had in, I don't know, statistical renormalization in physics, if you look at them like this, are kind of the same as what we do in deep learning or whatever. So that's been cool. And I do now find at most conferences that a lot of people come up and say, "Yup, I got started through Fast AI."

Alex Wang

That's incredible. I don't think there's a comparable thing for, like, learning how to build on the Internet, for example. I don't think there's a comparable way that all these people who are incredibly gifted learned how to build the killer Internet app.

Jeremy Howard

Yeah. I mean, so that's been super great. But I think, even more so for me, it's the people I come across. Like, every couple of days I get a message from somebody who literally says, you changed my life.

Alex Wang

Yeah, yeah. It's like there's all these people around the world who are having something similar to your moment when you discovered you were good at Kaggle, which was like, oh, I can actually solve these problems.

Jeremy Howard

Yeah, I think that's right. Because I think a lot of people also have that self-doubt that I had, you know? It's like, oh, we're not math PhDs. We don't understand all the Greek letters. But, it turns out, you know, this technique gives me this superpower. So I actually created a topic on our forum called "share your work here." And I just said, hey, if you've done something interesting with the stuff from the course, share it here. And there's, like, over a thousand replies now, and the vast majority of them are, ah, I can't believe this worked so well. I just tried this thing and here it is. So there's lots of examples of, like, literally state-of-the-art results from people who were just trying something out.

Alex Wang

Would you recommend that these software teams search for specialized sort of machine learning talent? Or should they sort of retrain and teach their existing engineers how to do more deep learning?

Jeremy Howard

That's a great question. I definitely lean towards the latter, which is retrain existing internal people. It's not too hard to learn to become an effective deep learning practitioner. We have lots of examples, like lots and lots of examples, of people doing that through our courses in two months. They have to be a pretty strong coder already. They have to be tenacious. It takes work. It's not some magic thing where you don't have to work. But if you've got that, you can be an effective deep learning practitioner in a couple of months. Particularly if it's an area where the kind of domain is somewhat well explored, like audio, images, text, tabular data, collaborative filtering.

Something like genomics is going to be a lot tougher because it's not really sorted yet.

Alex Wang

I'm curious, what do you think are sort of the structural problems with how machine learning research is done today?

Jeremy Howard

Oh, all of them. Yeah, so we've already talked about one, which is this excessive focus on novelty. I think also the deeper issue is that the academic and industry communities in deep learning are too closely tied together, because it's kind of such a recently popular area that really came out of academia.

The vast majority of the people I see in hiring positions are people with academic backgrounds. And that leads to a lot of problems. It means that the way they filter applicants is very heavily focused on, like, the stuff that was used for filtering when they were going through the academic process themselves. So they look for candidates who are, you know, really good with the Greek letters and proofs and stuff. They tend to really underestimate the importance of code, or not know how to actually test the code side of things. They tend not to really understand, kind of, business strategy so well, or know how to test for those things, or for communication skills.

So you kind of often find these pockets of deep learning people in companies where they're so separate from the rest of the company that they have very little strategic impact on the companies they're in. And kind of all they do is publish papers, and the only people that appreciate them are their academic peers. So I think that's a big issue. It's also a real problem for diversity. And I mean, basically, I guess it is a diversity problem. You know, you've got far too many people who come from a small number of particular schools, with a particular way of thinking, and just a lack of understanding of, like, what are the actual problems we have to solve? And what are the actual constraints we have to deal with? And how do you actually create code that implements the things we need?

So yeah, I don't know how to fix that, because I think academia more generally, not just deep learning, is always going to suffer from this kind of ivory tower problem. You know, by definition, people in academia, the vast majority of the time, are people that never quite left university.

So they're kind of the people for whom the status quo is the way they live their life. You know, they went from school to university, and they went from university to a doctorate, and they went from their doctorate to a postdoc. They never thought, like, oh, I wonder what's over there, and went and did something else. So that's always going to be a problem in any academic discipline. Deep learning suffers from that as much as any, but it's a particular problem for deep learning because it's an academic field that has such great practical importance.

And it's also one where empirical results do matter, but you just don't see strong empirical results valued in the academic, kind of, you know, reviewing guidelines or whatever.

Some people think there is, but actually, when you look at it, the interest is in showing training on a thousand GPUs. The deep learning academic community definitely loves that stuff. But where are the benchmarks of what you can do with one GPU in one hour? Or the benchmarks of what you can do with a hundred images? Or the benchmarks on, you know, how effectively we can apply deep learning to understanding media bias? Or to identifying effective treatment plans or whatever? So yeah, there's a long way to go, both for the academic community, but perhaps more importantly for how the kind of world of industry outside of machine learning and deep learning utilizes those techniques.

Alex Wang

Fast forwarding: you said something really great, which was that deep learning can probably be used in at least as many places as the Internet, which I think really speaks to its magnitude. What are, let's say in 20 years or 30 years or whatnot, the sort of optimistic and pessimistic cases for deep learning and AI?

Jeremy Howard

So I tend not to look ahead that far, because I find I can't do that effectively. And I prefer to focus on kind of optimistic and pessimistic uses of AI now. I say this particularly in this respect because a lot of people talk about kind of the dangers of superhuman intelligence and AGI and stuff, when there seems to be a real lack of understanding in the community, at the moment, of how machine learning is being misused right now. So we include ethics components in all of our courses to try to help combat this.

So, you know, right now, machine learning based algorithms are being used for things like setting bail, or, in theory, for recommendations for setting bail and recommendations for sentencing. But can you imagine what happens when a judge, with no background in statistics or anything, has a computer that says: probability of re-offending, high. Recommendation, you know, don't give them bail. They accept it, right? Because they don't have the background to understand all of what's gone into that, and it doesn't actually work. That particular algorithm, called COMPAS, it turns out, is basically no better than random. But it's highly biased, because it was created using highly biased data. Because, for example, in the US, I think, blacks are given time for marijuana like six times more often than whites, and whites use marijuana just as much as blacks. Right?

So you have this highly biased data, and you and I know, if you take that highly biased data and you put it into an algorithm, you get back highly biased sentencing recommendations. And then judges start using that algorithm to set bail and to set sentences. And that same data is then also used for predictive policing, right? Predictive policing is also being used quite widely in the US, and so police are told, you're likely to find criminals in these locations tonight. So that's kind of the dystopian view: a massive feedback loop causing increasing inequality, increasing bias. And you know, that kind of thing happens in history from time to time, and it tends to end up with huge violence.

You know, the utopian version is that more and more people learn how to use these techniques in their jobs, as, you know, environmental auditors or activists or journalists. I've seen recently there are programs to teach judges about the basics of artificial intelligence, for example. And we see more and more people understand what these tools are good at, but also what the dangers are, and know the questions to ask. And it could be used to dramatically improve productivity, and it could be used to dramatically increase accessibility.

So, like, there could be community health workers in remote parts of China who have kind of all the expertise of the world's best radiologists there at the press of a button that they could draw on, and with one month of training rather than 10 years of training, they can kind of become effective diagnosticians.

So yeah, there are exciting opportunities and really concerning threats, and they're both in a much shorter time frame than 20 or 30 years.

Alex Wang

This was an incredible conversation. Thanks so much for joining us.

Jeremy Howard

Thank you.