Interested in the field of graph neural networks and representation learning? Or just curious about DeepMind and what it's like to work as a research scientist there? We recently had an insightful discussion with Petar Veličković, a research scientist at DeepMind and an affiliate lecturer at the University of Cambridge.
Coming from a purely mathematical background, Petar's journey began with a passion for competitive programming and algorithmic problem-solving. Yet, his academic adventure led him down a different path, steering him towards a career in machine learning.
Petar's shift from programming to AI was unplanned, driven by curiosity. Programming was exciting, but research offered a way to tackle unsolved problems, which he found exhilarating.
Petar's PhD journey involved self-management, networking, and problem-solving. He stressed that a PhD shouldn't confine you to a certain field, but prepare you to overcome new challenges.
Petar finds his PhD journey rewarding and beneficial for personal growth and would recommend doing one. However, he underlines that it's not a necessity for AI research—ambition, curiosity, and willpower matter too.
Petar is also one of the main people behind Google Maps' travel time prediction algorithm, which we talk about in the episode. One of the few research projects that are actually useful in the real world!
We also discuss his work at DeepMind, research in academia vs. industry, and his experience as a teacher. It is an amazing interview for anyone in the field with an interest in research. Tune in to the episode on Spotify, Apple Podcasts, or YouTube:
Or read the full podcast transcript below...
[00:00:00] This is an interview with Petar Veličković, research scientist at DeepMind and affiliate lecturer at the University of Cambridge, where he also did his PhD. Petar is an expert in graph neural networks as well as representation learning. In this interview, we dive into what he does at DeepMind, but also how he applied his research.
For example, in Google Maps. There are also amazing tips about the PhD and how to prepare yourself for a research scientist role. I hope you enjoy this interview.
I usually ask who you are, but I will maybe make this a bit more precise by asking first: what is your academic background before getting into DeepMind?
Thanks for that question. I feel like the academic background story is a bit of an interesting one depending on where exactly you begin it. For me, I guess it makes a lot of sense [00:01:00] to start even before university, depending on, you know, where our story takes us.
I was born and raised in Belgrade, in Serbia, and I went to one of the mathematics-oriented high schools that existed in the country. So I basically took a curriculum that was very focused on mathematics, computer science, physics, and those kinds of topics. And from there I ended up moving to Cambridge in the UK, where I finished both my undergraduate degree in computer science and my PhD in machine learning. Shortly after submitting my PhD in 2019, in the area of graph representation learning, I started working full-time at DeepMind as a research scientist.
And from what I've seen, you didn't want to get into machine learning right away. Before your PhD, your interest was mostly in programming, and I've seen that you've done a lot of [00:02:00] competitive programming. So can you talk a bit more about your experience with programming, and why the switch to machine learning or AI?
Yeah, this is exactly the reason why I felt like I should start before university, because in reality I started to dabble with programming at a reasonably early age. And when I started to study computer science oriented topics in high school, I realized what I personally found most attractive, which was also the main way in which students in my region were first introduced to these concepts: the lens of classical algorithms, data structures, and competitive programming, which was one of the easiest ways to grow your skills in these areas. So you learn to solve algorithmic problems as quickly as possible, such that your programs terminate in a certain amount of time.
They need to use no more than a certain amount of memory, and so on. And this is the kind of problem solving that led me to computer science as a degree of study. As a [00:03:00] result, when I was doing my undergraduate degree, I wasn't really exposed to artificial intelligence in any shape or form.
And I just really liked solving competitive programming problems and writing classical algorithms and data structures and optimizing them. So I really believed that, you know, I was in for a career as a software engineer. This is, at least, what I believed I would be working on at the time. However, I then did a software engineering internship, and I realized that while it was very interesting, it was not fundamentally what aligned with my core interests.
And after doing a few research-oriented internships after that, I realized I'm actually quite interested in research, because it was not only a way to think about how to recompose existing solutions to solve existing problems, but also a way to try to attack problems for which no solution really existed at the time.
My initial contact with research was actually in a completely different area. [00:04:00] I worked on hardware design related stuff, you know, hardware specification languages and so on. However, while I realized that research is very interesting, I also realized that I'm not that talented in hardware.
So I had to find a different area. And luckily enough, for my final year project at university, for my bachelor's, I first wanted to do a project in bioinformatics, because I heard that bioinformatics is full of classical algorithms. So this was a great way, or so I thought, to combine these old inspirations that had brought me to this field.
And then also to contribute something with real-world usage, like biology. However, very shortly after starting conversations around this project, my advisor, who ended up becoming my PhD advisor as well, Professor Pietro Liò at Cambridge University, basically told me that I shouldn't bother with doing classical algorithms in bioinformatics. Nowadays everybody in bioinformatics is learning machine learning, so I should probably do the same.
I listened to [00:05:00] this advice, and I ended up doing a full year-long project in machine learning, which eventually turned into a PhD in machine learning. So that is roughly how I went from competitive programming, classical algorithms, and software engineering all the way to research in artificial intelligence.
And was the PhD what you expected, in the sense that you mentioned that you liked the research and developing new ideas or solutions versus implementing existing ones? Were you satisfied with the PhD? Do you think it was worthwhile, or the direction you wanted to take?
Yeah, for me personally, I think all things considered the PhD was a fantastic time in my development.
And it's one where, you know, if I had to make the same decision again, I would make exactly the same decision. So basically, yes, it exposed me a lot to research and just, you know, focused me on trying to execute on building solutions to [00:06:00] previously unsolved problems over a very long period of three years.
But beyond that, right, it also put me in a position where I suddenly had to become completely in charge of how I organized my own research and my own life, to navigate the research landscape and make contributions as best as I can. And also, this was something I wish I had realized a bit earlier.
Basically, it went beyond just trying to push out a certain number of research papers to show that, you know, I was capable of doing something like this full-time. It is also a one-of-a-kind opportunity to just expand your network and meet a lot of really interesting people, and make collaborations left, right, and center whenever it's possible to do so.
You know, a PhD, at least for me personally, proved to be an amazing time to build some of these connections. And you know, I'm quite sure some of them will last a lifetime and they were very valuable in shaping me, not just as a machine learning [00:07:00] researcher, but I like to think also as a person.
Yeah, definitely agree. And do you think this can also be an issue, where you are specializing in a very specific area with the PhD? For example, just to give my personal experience, I'm doing a PhD in medical AI, basically. Do you think that can close doors for me in the future if I want to transition into NLP and work on the next GPT, for example?
Like, will I be stuck in healthcare-related segmentation tasks in AI? So that is a very important question to address. And I mean, the answer is very clearly no. But that's because this is not what the PhD is, right? The PhD is not something that prepares you to contribute to one body of work which you will then be riveted to for the rest of your life.
No, not really. A PhD is no more or less than an [00:08:00] entry ticket: a confirmation that you are capable of persisting and creating work towards a particular research objective over a sustained period of time. It's really a proof that you are capable of doing this. Hmm. The specific area is less relevant; it is the very nature of research, especially so in an area like computer science and machine learning, that, you know, the trends of tomorrow, probably nobody is thinking of them today.
Right, exactly. So there was a time before scaled-up large models, and, you know, what did all of the researchers who are now doing great research in large models, what were they doing beforehand? Probably not large models. They were working on something different, because you didn't necessarily have the compute or the scale for these kinds of models previously.
But a lot of those people do have PhDs, and they are generally capable of executing on these directions, which prepares them. Right? So I would argue a good PhD, you know, a successful [00:09:00] PhD, is not just something that makes you a very deep expert in a particular area; it also makes you a master adapter, right?
Like someone who is capable of coping with whatever the environment throws at you, adapting properly, restructuring your message properly, and still pushing out some very interesting research work in that environment. Right? So I would say, you know, this is all the more true of someone who's doing a PhD in machine learning nowadays, because the trends change even more quickly than the typical timeline of a single PhD, which is, you know, three to five years.
Yeah. And that's why also, it's not scary, but you definitely have some hesitation when starting a PhD now, just because it goes so fast and you are basically a beginner. So it's hard to compete with this fast progress, especially when it comes from Google and Facebook and all the big companies that may have much more potential [00:10:00] and just money and computing power than your university.
So that's definitely something that can be scary when starting, but you still learn a lot. And for instance, on my end, I can apply very new technologies and iterate very quickly. For example, with the Segment Anything Model that came out recently, which is quite powerful for medical segmentation: that's something that came out, I don't know, two months ago, and we could already fine-tune it and everything.
It's really impressive. Even the stuff big companies do, we can apply it very quickly, and that makes our PhD much more, not general, but diverse. We can try many different networks and architectures, whereas maybe in the past a PhD could focus on just one architecture or one technique. In not even a year, I have already tried many different networks and [00:11:00] many different approaches, so I believe that PhDs are much different than they were a few years ago.
Also, I wouldn't think of it, you mentioned the word compete, I really wouldn't think of a PhD as entering into a competition with big companies. Right? I personally see it as: the industrial realm obviously should be here, because, you know, it has the capabilities and the incentives and the teams to make these large-scale developments happen.
Like, I would argue that the industrialization of research has, to some extent, really helped us progress more quickly as a field over the past decade. However, at the same time, I think academia is more important now than it ever was, because while industrial research will probably do a really good job at pushing down a particular research line, there are lots of very exciting side avenues that all of this research unlocks [00:12:00] that, fundamentally, you won't expect a lot of the industrial groups to focus on as much, because they will have their own objectives and their own incentives.
And the deeper you go down the scalability pathway, the more of these interesting side paths open up. And sometimes to explore these side paths, you often don't have anything other than curiosity guiding you. And this is the kind of research that really thrives in an academic environment.
So that's one thing, right? There are lots of ways as an academic to be very fruitfully a part of the ecosystem and not even enter direct competition, but just, you know, help us understand more about what it is that we're building, right? Yeah. And I would also say that even in today's day and age, you can make very impactful, top-tier published research that doesn't even require GPUs, to be honest.
Like, some of my relatively recent papers, published at, say, ICML last year, can be [00:13:00] completely trained in a couple of hours on a single CPU. Like, if you want to make a result that will make people look and get people interested, that will help us understand the capabilities of these models a bit better,
You don't necessarily need access to huge amounts of resources if you can pick the right problems. And actually, in part, a PhD also teaches you to pick your battles, right? Like if someone walks into a PhD at a university with the expectation that they will compete with a company, that is probably a misaligned set of expectations, right?
But someone who walks into a PhD and walks out of it with an understanding of how to choose the right problems to make the most impact with the resources available to them, you know, that is a very positive sign. That is someone who has really taken full advantage of this opportunity and grown a lot during it.
Yeah, it's definitely a lot of valuable experience for becoming a research scientist in the industry afterward as well, just because you develop so many [00:14:00] great skill sets for this role. But would you say it's interesting to do a PhD even if you want to become an MLOps engineer or developer, building these tools instead of creating them?
Yeah, so this is also a very important thing to address. I would say that a PhD is not for everyone. And, you know, it used to be the case, and I've seen people do this before, that people felt that if they wanted to work with top-tier machine learning systems or to do machine learning research, they absolutely had to do a PhD beforehand.
I would say even back when I was starting, this was not really the case, and nowadays it's definitely not the case. Like, some of the most impactful ML products and developments we see out there have been propelled by people who don't even necessarily have a master's degree, right? So basically, the barrier of entry is low enough, and the knowledge you need as a prerequisite is specialized enough, that if you just focus, you know, on the [00:15:00] few relevant areas of math and kind of extrapolate from there to learn a few relevant programming frameworks, you can basically do bleeding-edge research without any formal machine learning or even computer science education. This is, to me, a really magical thing about AI research compared to other research areas. And it's also quite important, because with the rise of the influence of AI and the number of groups that are impacted by AI in one way or another, we really need as many different stakeholders to be directly part of the conversation.
So we really want people with all kinds of backgrounds, with all sorts of, you know, capabilities and trainings and different levels, to contribute to this conversation. Right? So the fact that you don't need a PhD to do these kinds of things is, in my opinion, a very good thing, right? Yeah. So, like, one should only consider doing a PhD in any area, not just machine learning, if they are genuinely [00:16:00] excited about the concept of doing research and they want to do it as a career, as a way to basically academically lead a unit of people on executing a research agenda, right? Because a PhD really prepares you for that three-to-five-year period during which you're solely responsible for formulating your own program, and that freedom is both amazing and scary, right?
So if you're capable of surviving that, you get a PhD as a rubber-stamp approval that you're capable of surviving something like this. This gives you a kind of capability that someone with a master's degree might not necessarily have. But that capability is not required for you to be able to do a research project.
Right? So that's maybe the key distinction. And I will just say that we have so many awesome success stories of people without PhDs who have gone on to do pretty great stuff in the world of AI research. I personally always tend to highlight Aleksa Gordić, a compatriot of mine who successfully got a research [00:17:00] engineer position at DeepMind and worked directly on research-oriented problems and applied problems without ever having any formal machine learning education.
He was completely self-taught. And from my PhD I can also share an anecdote. I started my own PhD with an understanding of machine learning that wasn't really at the level of comprehending how people do deep learning. In 2016, I started out thinking that everybody wrote neural networks in C++ and wrote backpropagation by themselves, right?
This was the kind of picture I got from the AI courses I took at the time. And then I started doing the PhD and basically struggled quite a bit, and a master's student actually suggested to me that there exists, you know, something called Coursera and something called Udacity and something called TensorFlow.
And I could use a Udacity course from Google to teach myself TensorFlow, to actually, you know, do machine learning the way it's supposed to be done modernly, with deep neural [00:18:00] networks. So I basically learned all about deep learning in a, you know, two-week Udacity sprint, and I taught myself TensorFlow like this. So you can pick up these skills without completing a PhD, for sure. Yeah.
Definitely. I did the same thing during my master's, just because my university, like a lot of universities, was quite behind with AI. So I took online courses as well and just taught myself to use PyTorch on my end. And what do you think was more useful for you to find work at DeepMind, for instance, in AI research: your PhD, or all the very cool projects and competitions you did related to competitive programming? Because right now, I believe your work is a lot related to efficiency and what you did with competitive programming, so it must have helped a lot.
Right. Maybe I'll just quickly remark on what you said about PyTorch. PyTorch did not exist at the [00:19:00] time when I was doing this, so I didn't really have a choice. And, well, nowadays at Google DeepMind we use JAX, obviously. So I am also a big proponent of using JAX.
PyTorch is obviously also a really nice framework, for academic prototyping especially. So, in terms of the skill sets, it is true that today I work quite a bit at the intersection of classical algorithms, competitive programming, algorithmic reasoning, and deep neural networks.
So for what I'm doing right now, that background was quite useful. I would argue it was perhaps more useful than my deep learning experience. But for getting into DeepMind, which is what I think your question was about, it was, I would say, an 80/20 split. Right? For the most part, the skills you get through the PhD: the ability to recognize and categorize different approaches to deep learning and reinforcement learning, talking about the [00:20:00] pros and cons of various methods, being able to quickly react on the spot when someone writes a new problem on a whiteboard.
Those kinds of skills you pick up during a PhD, like just by doing lots of whiteboarding sessions, by doing lots of projects together with different people, by building an understanding of your research, presenting it at conferences, and you know, being able to deal with snap questions that someone will ask you in front of your poster or something like that.
Yeah. So the PhD skills were really beneficial for that part, and that constituted maybe 80% of what my interviews were like. The interviews are nowadays a bit different, but obviously, yeah, I can speak from my personal experience. That being said, they are also, very rightfully so, careful that, you know, whoever comes into a company like DeepMind as a research scientist has a good grasp of all the foundational aspects of mathematics, computer science, and statistics.
So clearly that foundational knowledge is very useful at certain stages of the interview. And it also [00:21:00] helps to show that you are a very competent coder. So there were also several parts of the interview dedicated to, you know, the kinds of algorithmic problem solving in a coding session that you might see when you're applying for a typical software engineering job.
So I would say every single part of my upbringing played a good part in this interview. It all kind of came together in that one place, but the PhD skills were the most important ones for a research scientist position. Obviously, if you're applying for a research engineering position, the balance will be very different. But hopefully that makes sense.
Yeah, definitely. And if you are not coming from a PhD background but you want to do research, how would you suggest someone practice the same skills or build a relevant portfolio, to apply to a role similar to the one you are holding right now?
So, first of all, this varies between different companies, but research scientist positions typically require a PhD. But, you know, [00:22:00] nowadays, when you look at the real-world problems that we have to solve on a regular basis, the line between what a research scientist does and what a research engineer does gets more and more blurry. So even though you might end up applying for a research engineer position, which typically requires a bachelor's, you might still see yourself working on very fundamental research problems and publishing papers.
And, you know, it really depends on what you like and how you like to approach it, right? So the most appropriate position in this situation would be a research engineer position. And you can indeed work your way, even completely by yourself, to that position. I generally do have a few pieces of advice I like to give to those people, because, you know, they do reach out to me from time to time and we have these discussions.
I think the best place to start, for someone who's generally good at coding but wants to get better at the research side of it, is, you know, just to take advantage of the very rich open-source community. Many of the influential papers [00:23:00] that are published on a regular basis have their code open-sourced, which is a great way to learn: just by inspecting the code, you will get a feel for how research code is built, for how stable experimentation is created, and, you know, generally the kinds of tricks that are not directly written inside the papers but are very important for things like numerical stability.
So that's the first step. Like, look at how other things are implemented. Look at how they're doing it. Maybe read the companion paper and try to map the parts of the implementation to what's written in the paper. Right? At this point, you're still not doing any research. You're just doing the usual: mapping a human-written description of what's happening to the actual implementation in Python.
Then, once you read up on a few more of these things, maybe you start to re-implement some of them yourself, just to get a feel that you understand how these things work, rather than just being able to inspect them. And you might also see, for [00:24:00] example, that there are ways in which some of these papers could be subtly improved, right?
And very often this is not so subtle, because when you send a paper to a conference and it gets accepted, usually a reviewer will only let you publish your paper if you're very honest and upfront about the limitations of your model. So nowadays authors will, in most cases, literally write in their paper what things are missing from their model.
Yeah. Right? And what would be great to have in the future. So you can literally take this as inspiration, right? You might take this idea, change the equation that the model is using in a subtle way, like changing one or two lines of code. And you might see that by changing these few lines of code, you know, you get much better performance than what the authors originally reported, or something like this.
And these kinds of incremental improvements are a great way to start in this area. Because you do this, you obtain some improvements, you now understand a bit better where those improvements are coming from, and you can then [00:25:00] try to write up some small four-page version of what you've just done.
Try to make it sound like one of those papers you've been reading, and, you know, send it to a relevant workshop. Workshops are, in my opinion, kind of a treasure: the sooner you discover them as a researcher, the more benefits you'll have from them. I spent two and a half years of my PhD not realizing the opportunity that comes from workshops.
And once I did, it was a game changer for me personally. It's almost a bit ironic because, okay, top-tier conferences in machine learning are really notoriously hard to get into, and you need to get super lucky. But in comparison, a workshop is usually much easier to get into. Like, the acceptance rates of workshops tend to be about 75% on average.
So you have a pretty good chance of getting a paper accepted as long as it's doing something correct and teaches us something interesting. So you write up that four-page paper, you send it to a relevant workshop at one of these top conferences, you get into the [00:26:00] workshop, and, what's usually the biggest irony, the interactions you get during the workshop are often better than the interactions you get during the main conference. Because it's attached to a top-tier conference, all of the top people who attend the conference will also be at the workshop.
But the difference is that the workshop will focus on some specific area of machine learning that's within the context of this paper. So you will be surrounded by top people who are specifically interested in the thing you've submitted. Yeah. Which means you're gonna have very targeted conversations.
They come to your poster, they will ask you questions, and those people might then either be your reviewers in the future, when you decide to send a fuller paper to a conference, or collaborators in the future: someone you can talk to more to develop ideas. So the benefits are much, much higher and the barrier of entry is much, much lower.
So that's a great way both to get a feeling for what working in research looks like and to build a credential. You know, now you have a [00:27:00] publication to your name. Now there is tangible written evidence that you are capable of conducting research. Therefore, it's a great ticket to talk about a research engineering position.
Right? Yeah. And I will just say that this is only one path. This is the path that I typically suggest to people, and it usually works quite well, even for people who are more engineering-minded, since they usually like these small-scale excursions into the world of research. It works for most people I talk to, but that's not the only path into this realm. And we already mentioned Aleksa previously. He has this amazing video on his YouTube channel where he talks about how he went from, you know, zero knowledge in machine learning to being able to apply for a DeepMind research engineer position. And there were no research papers there.
He just studied papers, re-implemented papers, tried to understand their internals, and made lots of open-source contributions and blog posts that got him noticed in the community. So that's also another path, but the fundamental, kind of core part of both of those approaches is the same: you're studying the literature, you're trying to understand the literature, you're trying to map what you understand to an implementation, and then you're trying to either re-implement or extend it in some way. So that, I think, tends to be the most robust path forward for, you know, generally skilled coders who are interested in participating in the world of AI research in this way.
And as you mentioned, I'd add that along with doing all that, maybe the most important part is visibility: sharing what you did, either at workshops or on your blog, just like you said, or in YouTube videos or whatever. Just sharing out there might be pretty much the most important thing if you then want to apply for a job, or just to have opportunities where people reach out to you directly because you are visible online.
This is exactly true, right? Like, you can do the most amazing things, but if nobody can see them, it [00:29:00] doesn't really matter that much. So visibility is very important, and probably more important than most people think.
Yeah, completely agree. A lot of people are aspiring to get into DeepMind and do research at DeepMind; that's a dream for a lot of people.
What does a research scientist at DeepMind actually do? Do you write papers, code, manage people? And what's the ratio of all your different sub-tasks? Right. So, I mean, different people will have completely different stories when you ask this question. As I said, it's a very fluid world out there.
Yeah. And I can say, just talking about the engineering versus science divide, I have worked on projects that were two-and-a-half-year-long engineering projects where I basically just did lots of engineering, and it did not really feel like research. It culminated in papers, because what we built were, you know, useful benchmarks for the broader AI community.
But the work itself did not have any real uncertainties associated [00:30:00] with it that would typically arise in research work. So it really depends. I would describe my general day-to-day as this sort of virtuous cycle of learning about new stuff. I constantly try to learn about new stuff. And I think as a scientist, you have to always stay on top of things, try to learn new things, and gain new perspectives to attack problems.
And then, based on those learnings and understandings, you see a problem that you believe can be improved through some novel research. You propose a way to attack it. You team up with some people that you collaborate with, and you work together. Then, if the hypothesis ends up being true, you usually end up writing some kind of paper together that you can then publish at a relevant workshop or conference, present it at the venue, and talk to the people there. And then you realize, talking to the people, that there are many new avenues of research and understanding that you can pursue, which sets the cycle in motion one more time.
Right. So this is the general kind of way it works. I [00:31:00] would argue the way I currently split my time is it's a careful mix of managing a couple of teams where, you know, I am trying to do as much interesting and in-depth reasoning research as possible. Reasoning is currently one of my main areas of interest.
While at the same time making sure I have enough time to fundamentally contribute in a coding and engineering kind of way, to keep my skills sharp. And I also try to set aside enough time to learn about new concepts. The current area that I'm focusing on quite a bit is category theory.
We published a few papers on this topic recently. I'm trying to improve and hone my skills in it as well. I feel like it's gonna be a great theoretical tool to just make sense of deep learning as a whole and maybe even help us build new architectures in the future. Just a great way of approaching problem solving in my opinion.
And lastly, since about two years ago, I also have a dual position at my former university. So I'm currently an [00:32:00] affiliate lecturer at the University of Cambridge, where I also teach a master's course in graph representation learning. So every year I am responsible for teaching this course to a group of about 50 master's students, very, very high-quality students.
And I always try to, you know, dedicate as much time as I can to delivering the best coursework possible to these students, and also to discuss their various perspectives with them and to support them if they want to try to publish some papers themselves, and so on. So I feel like having this academic balance, and being able to share my vision about the whole field with a group of students like that, is priceless. So it's something that I also try to spend some portion of my time on every year.
And regarding your current work at DeepMind, I've heard somewhere that you mentioned that what you are doing with reasoning was a promising avenue for [00:33:00] AGI. If I'm not mistaken, I'd love it if you could, well, not quickly, just introduce and say a little more about what you are doing, mm-hmm, with reasoning.
Yeah. I did give this talk where I mentioned that what we worked on recently with multitask reasoners could be a promising avenue towards building AGI components.
And this was me partly paraphrasing what some people on Reddit were saying about our research, but obviously, you know, if research that we build ends up becoming meaningful components of AGI, I will obviously be very happy about that because I feel like reasoning is one of the components where our existing top tier systems tend to struggle quite a bit.
And there are lots of approaches that try to address the problem. But we don't really have any conclusive, you know, findings about what's the best way to solve it. And until you can solve this problem more robustly, you don't have anything close to AGI. It's as simple as that, right?
Mm-hmm. So, I personally [00:34:00] am trying to find ways in which we can bake in the knowledge that say a human software engineer has to develop as part of building their skills, say in a theoretical computer science curriculum, and seeing how we can reconcile those kinds of skills with what a neural network is capable of learning. And if you think of what a typical theoretical computer scientist student must learn to do, you will see lots of hallmarks of very useful behaviors, like being able to break down a problem into simpler problems, being able to reason about complexity. So which problems are even solvable or tractable in a given amount of time.
And when it might be better to just say, "I'm sorry, I'm not able to solve this problem with the amount of resources I have", rather than trying to confidently answer that problem with hallucinated answers, which is what tends to happen with the top-tier systems today. Right? So, you know, it really entails a world of [00:35:00] understanding that I think will be very useful for our models to have.
And our models don't have that kind of understanding currently. So we are basically trying our best to make it happen. And, you know, if you make this happen, I would be surprised that such a technology doesn't make its way to an AGI system of the future.
And could you say a bit more on how you are approaching this right now, or, like, in the most recent publication or thing that you can talk about?
Sure, sure. I mean, the area that we work on is in short known as NAR or Neural Algorithmic Reasoning. And one of the ways in which we try to capture this knowledge is by asking a neural network to imitate individual steps of what an algorithm might be doing. And to be honest, this research started a bit as a toy experiment for us because, you know, this was just something that I had to do a lot when I was in my competitive programming days.
And I was just interested to what extent can the top neural networks of today replace me in [00:36:00] this endeavor? And I was actually quite surprised that even for the simplest algorithms, it was not as trivial as it appeared. And specifically while with some tricks, you were able to get neural networks to fit your training data and your validation data reasonably well, like near 100%.
You would still struggle a lot when you have test data that was say, much bigger. And this is a big problem when it comes to reasoning systems because if you have a self-respecting reasoning system, it should work no matter how big of an input you throw at it, right? Mm-hmm. It shouldn't magically collapse after your array becomes bigger than 15 elements or something like that, right?
And we see these collapses quite brutally, even when you move slightly away from the training distribution. So we spent a lot of time over the past years trying to think of what needs to be done in architectures of neural networks to get them more robust in this particular way. And this all piggybacks on the concept of algorithmic alignment, which was introduced by MIT researchers in 2020.[00:37:00]
And effectively what it says, without diving into more details (it's a mathematically very well-defined concept), is that you should try to design your architecture so that it closely lines up with the kind of computation you want to execute. Simple as that, right? So this means that, say, transformers, which are currently a very strong state-of-the-art architecture, are great when you want to do the kind of computation that's supported by transformers, which is taking lots of potentially meaningless data points and figuring out the best way to combine them to get meaning. And you know, in text or images, individual components are fairly meaningless, right? A single word doesn't have meaning until you combine it with the rest of the sentence.
A single pixel doesn't have a lot of meaning until you combine it with the rest of an image. So a transformer is a great inductive bias for that kind of stuff. But if you have entities that already have some meaning, like some variables, and you want to do some non-trivial computation on top of those variables, [00:38:00] suddenly a transformer might fit that well in distribution. But then you ask it to do it on a four times larger input, and it hasn't really learned the procedure you wanted it to learn; it learned how to hack the distribution that you gave it, right? So that is what our research is about, in a nutshell. We try to find architectural ways to overcome these limitations of neural networks under certain task distributions, data distributions, and so on.
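To make the idea concrete, the step-wise supervision used in neural algorithmic reasoning can be sketched as follows. Rather than only the algorithm's final output, the network is trained to imitate every intermediate state. This sketch only generates those per-step targets for BFS reachability; the function name and the tiny graph are illustrative, not from any published codebase.

```python
def bfs_trajectory(adj, source):
    """Return the sequence of reachability masks BFS produces.

    adj: adjacency list {node: [neighbors]}
    source: starting node
    Each element of the returned list is the set of nodes reached
    after one more round of parallel expansion, which is exactly the
    per-step target a NAR model would be asked to predict.
    """
    reached = {source}
    trajectory = [frozenset(reached)]
    frontier = {source}
    while frontier:
        # one "algorithm step": expand all frontier nodes in parallel
        nxt = {v for u in frontier for v in adj.get(u, []) if v not in reached}
        if not nxt:
            break
        reached |= nxt
        frontier = nxt
        trajectory.append(frozenset(reached))
    return trajectory

# Tiny example graph: a 0-1-2 path plus an isolated node 3.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
steps = bfs_trajectory(adj, 0)
# steps == [{0}, {0, 1}, {0, 1, 2}]
```

A model supervised on every element of `steps`, rather than only the last one, is pushed towards executing the procedure itself instead of pattern-matching input-output pairs, which is the behavior that is meant to survive on much larger test graphs.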
That's super interesting. Could you give some, not practical, but applied examples of the types of data this can work with? Like, you gave the example for transformers, but where does this different architecture come into play? So, I can give you a few concrete examples that we published beforehand that show that this architectural idea can actually give benefits.
Actually, I'll give you the most recent example, which we just published at ICLR this year as a spotlight paper. It's called Dual Algorithmic Reasoning, and it deals [00:39:00] with a very interesting neuroscience-related problem where we look at imaging data of the mouse brain, of the blood vessels in the mouse brain to be precise.
And we aim to classify what type of blood vessels they are. So are they veins, arteries, capillaries, and so on. I am not an expert in mouse anatomy, right? So I don't necessarily know what is the best way to do this problem. However, I do sort of understand that the main purpose of blood vessels is to conduct blood flow.
So if I had some way to analyze the flow properties of this network, I might be in a better position to answer the question of what type of blood vessel this is. Hmm. So what we did in our paper is we trained the model using the principle of algorithmic alignment to simulate the individual steps of a max flow algorithm, which as the name implies, literally computes the flow [00:40:00] properties of a particular network.
And then we took this pre-trained model and we deployed it inside a state-of-the-art graph machine learning system that predicts these blood vessel properties. And just by doing that, we improved the state of the art at the time by about 5 to 6%. What's really interesting though, is that we trained this algorithmic module on very small graphs, like 16 nodes or something like that that we synthetically generated for the purposes of executing the flow algorithms.
The mouse brain data is 10 million nodes big. So we trained the reasoner on 16-node graphs and we unleashed it in a domain where it's 180,000 times bigger, and it still gave a representational advantage, right? So this was no hyperparameter tuning, no attempts to, like, bring the two distributions closer together.
Like, you can see the principles at play here: just understanding what the computation should look like, and training on the right kind of problem before deploying it, was already enough to give a [00:41:00] representational advantage. So, you know, I imagine with a lot more effort, you could probably push this boundary even further. But this is just to give you one example where we took a procedure that we assumed would be relevant, and it indeed proved to be relevant in this particular space.
It's really cool. Yeah. I'm changing topics a bit, but you also mentioned earlier to me personally, that you've worked on an algorithm or some part of the algorithm to improve travel time predictions that was actually implemented into Google Maps. So that's pretty cool and something that is completely applied in the real world. So could you tell a bit more on the whole process of making research and then how this was actually implemented into a real app like Google Maps?
Yeah, this is a great question and I feel like, to me, this was one of the transformative moments when I worked on this because it allowed me to jump [00:42:00] out of the realm of, you know, pure research where we were just concerned with putting out the solution that gets the best numbers possible on a particular benchmark.
And, you know, playing with our different Lego blocks in our little sandboxes, trying to figure out what's the best way to stitch these components together, without really having any regard for how this thing will be used in the real world, right? With Google Maps, you didn't really have a choice but to consider what the real-world data looks like.
So you might imagine, you know, you take out your phone, you ask Google Maps point A, point B, what is the best way to travel from A to B? And Google Maps will then give you a few suggested routes and it'll give you estimated travel times. So how many minutes will it take you to travel from point A to point B?
And this number is what our system was trained to predict. So whenever you pull out Google Maps or whenever you use an enterprise app that relies on [00:43:00] Google Maps to tell you how fast your food is coming or you know how fast your car is coming to pick you up you are interacting with a graph neural network that we have designed at Google DeepMind for this purpose.
And you then realize, you know, thinking about this problem, you would expect it's a simple pathfinding problem, right? Like you have the graph structure of the road network, you know, roughly how fast you can traverse each part of the road network and that's great. You then just run a shortest path algorithm and it tells you the travel time, right?
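The idealized version of the routing problem described here, a static road network with known traversal times, really is just plain shortest-path search, and can be sketched with Dijkstra's algorithm. The road network and travel times below are made up for illustration only; this is of course not the production Google Maps system.

```python
import heapq

def travel_time(graph, src, dst):
    """Dijkstra over a road network.

    graph: {node: [(neighbor, minutes), ...]} with non-negative times.
    Returns the minimum travel time in minutes from src to dst,
    or infinity if dst is unreachable.
    """
    dist = {src: 0}
    pq = [(0, src)]  # priority queue of (time-so-far, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was found already
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

# Hypothetical road segments with static travel times in minutes.
roads = {
    "A": [("B", 4), ("C", 2)],
    "C": [("B", 1)],
    "B": [("D", 5)],
}
# travel_time(roads, "A", "D") == 8  (A -> C -> B -> D)
```

The point of the passage that follows is precisely that this clean formulation breaks down once edge weights become noisy, dynamic, and adversarially manipulable, which is what motivates a learned predictor instead.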
But no, the real world is much more complicated than that. Not only is it not static, so you have lots of dynamically changing conditions on the road network that can drastically affect the predictions you get, things like changing weather conditions, roadblocks, traffic lights, all sorts of things that dynamically change the way the traffic flows in the network, but also the data itself is not as clean as it sounds.
Specifically, the way Google Maps collects these signals is [00:44:00] by using phones, right? Your phones are in your car, and based on the positions of the phones through time, the system infers roughly how fast the traffic is moving. Now, this is not only noisy, but also prone to adversarial behavior.
So you might have heard there was this infamous adversarial attack where a person put a hundred phones in a sack and walked down the street. And in the process of doing so, convinced Google Maps to route all traffic away from that street, right? Because it felt like there were a lot of slow cars in there, right?
So in such a setting, you know, it's not easy to make these nice, concrete assumptions about your data. The data really is not that clean. And on top of this, also keep in mind that this system has to serve billions of queries on a regular basis and touches a lot of users. It's a very important product.
And the thing that you must also realize in this setting is that the research considerations you might have, which is, you know, "give me the [00:45:00] model that achieves the best performance on this particular static training and testing dataset split", which is the typical setting in most AI research papers today, are very different from what the real requirements were here. The real requirement is: find me a model that, first of all, gets good performance on this dataset, but then will retain that good performance in the face of, you know, dynamically changing circumstances, different times of day, potentially adversarial behaviors, and things like that.
And then, at the same time, it should also be a model that is very fast to serve, right? So I can very quickly retrieve the results from this model and build a nice caching strategy around it and so on, so that I can actually effectively serve billions of people's queries without, you know, causing significant slowdowns for the Google Maps application.
Because ultimately you will care more about receiving your answers quickly than about whether they are one or [00:46:00] two minutes off, plus or minus, right? Yeah. So this is a key consideration, right? And that's exactly the way in which the Google Maps collaboration changed my view of research and how to make it useful to people.
So a lot of the things that we in AI research might find very important to please the reviewers amount to basically nothing when it comes to plugging them into a real-world system. And, you know, this doesn't necessarily mean that we have to change the way we do research. Research is structured the way it is for a good reason.
Because it's exactly about trying to get to the bottom of how things work, without being constrained by whether it's pluggable or scalable into, you know, a big downstream system. But I also feel like we would all benefit as a whole if we understood how our research can impact lots of people, and what the right ways are to frame that research so that we can have the most positive impact.
So that's perhaps the [00:47:00] main takeaway in how the Google Maps collaboration has shaped my own personal view on AI research as a whole.
Would you say that it was also more fun for you, or that applying your research posed different challenges? Like, would you want to do that more often?
Yeah, I mean, it was my first real applied project experience and since then, like it has changed the way I approach these things.
I try at all times to have at least a few applied projects that I'm investigating on the side, alongside pure scientific projects. So we also had our work on using AI for mathematics, which was developed shortly after the big results from Google Maps were first released. So yeah, to this day I always like to pursue a certain amount of applied work, not just because I find it exciting, but also because it keeps me grounded.
Like, it allows me to always keep in mind that there exists a real [00:48:00] world out there, and there is occasionally a divide between the results I need to show to the research community and the results I would need to show to actually usefully deploy this somewhere. So yeah, it's something that I actively do to this day.
And I assume it also shapes the way you do research and your experiments, just because you may have in mind that it needs to be scalable, and you already have in mind the different challenges that will come up when it's time to productize your current research.
Yeah, I mean it helps even sometimes when you write just basic research papers because you realize that there are now many other pathways to show that research is important.
Yeah, like previously it was the case that I believed if I'm not setting state of the art on a relevant benchmark, then this is not publishable. But now you have multiple avenues, right? You can say your research is competitive with the state-of-the-art while at the [00:49:00] same time being two times faster or using half the memory or something like that, right?
It suddenly opens your eyes to many other ways in which a model can be useful. Which, you know, if you read most of the literature as an up-and-coming PhD student, you might be tricked into thinking that if you don't have all the bold entries in a table, as the, you know, number one for your method, then this is not publishable.
But in reality, that's not the case. Like, many of the recent papers I've published were not SOTA-chasing, so to speak; they were just really trying to get to the bottom of a certain approach for tackling a problem. And they showed benefits in areas that were not necessarily tied to benchmark performance.
Hmm. What would you say are the biggest challenges in, well, your research, but in AI research versus AI in the industry? Like, are they the same challenges or are they very different? What is the most difficult part to tackle in both [00:50:00] sides of research and industry?
Yeah, so this is another one of those questions where like, I can answer it from my own perspective, but if you ask a hundred different people, you'll probably get a hundred different answers.
Yeah. So that's maybe an important disclaimer. But from what I observe currently, I think a lot of industrial AI research is really bottlenecked by engineering. So like, you might have the best idea possible, but if you don't have the technical know-how and the support to make it scale properly, it might not reap the gains and the potential it has.
Right. And you can see this at various scales through the development of AI over the decades, right? Because many of the ideas we use in machine learning today are not fundamentally new ideas. Like, the idea of a deep neural network trained by backpropagation is something that was well familiar to researchers even in the eighties or the nineties.
Right? So what was the difference? Well, the difference is that today we have [00:51:00] access to big data and big compute, and we actually have the technical know-how to scale these kinds of ideas forward, right? So it really is sometimes a case of having the right engineering at the right time and on the right platform.
And I think this is becoming more and more of an issue as these systems get bigger industrially. So that would be my answer for the industrial side. For the research side, I would say it has always been the case that understanding is much harder than learning to chase numbers on a benchmark, because there are many micro-optimizations you can do to make a model perform better at the benchmark.
But to truly understand why it's doing what it's doing, and consequently how to make it better, those tend to be, you know, the really important questions, and arguably the hardest ones. And as you said yourself, you're working on potential applications in medicine. I would argue that in the domain of medicine such answers are more important than anywhere else, because a doctor is [00:52:00] really going to care why a system is making the particular decision it's making, and, like, making the wrong decision can be quite costly in these particular circumstances.
Do you believe graph neural networks are the future, or do you already see some limitations?
So I think it's a bit interesting, right? Because the way we set up the graph neural network paradigm, it is phrased in a way that basically encompasses all of discrete deep learning.
So if you want to, you can just say that whatever model is on top right now, it is a graph neural network. So basically we're setting ourselves up in a way that we cannot really fail if we always want to say that we're on top. I don't think that's necessarily the best way of looking at it, but maybe it's a good preface to this answer.
What I would say is that there is a case for certain classes of graph neural networks that perform really well at certain kinds of problems when they're well aligned to them. So I would say, if anything, the [00:53:00] theory of algorithmic alignment is one that drives my thinking a lot these days, and it explains this, right?
So transformers, for example, are attentional, fully-connected graph neural networks, which are really good both from the perspective of alignment to this idea of "I'm just going to find the optimal combination of meaningless data points", and also from the perspective of really good alignment to today's hardware.
Right? Today's hardware is well optimized for doing lots of effective dense computations, which is exactly what a transformer does, being a fully-connected graph net. So in many ways, the transformer is the current sweet spot for graph machine learning. And as a result, the concept of a graph neural network as a separate concept from a transformer isn't seeing as much attention, mainly because the kinds of problems where you really need the power of sparse, full-on message passing are not [00:54:00] currently concerned with data that's particularly easy to work with anyway, or data that is most directly impactful to the problems you want to solve as quickly or as scalably as you can, because we don't have hardware that is well optimized for lots of sparse message communications.
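The correspondence Petar draws here, self-attention as message passing on a fully connected graph, can be sketched in a few lines. This pure-Python toy omits all learned parameters (queries, keys, values, and multiple heads) and uses raw dot-product scores only, so it is illustrative of the structure, not an implementation of a real transformer layer.

```python
import math

def attention_step(features):
    """One round of dense "message passing" over a fully connected graph.

    features: list of equal-length vectors, one per node (token).
    Each node aggregates every node's vector, weighted by a softmax
    over dot-product similarity, i.e. every node attends to all others.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    out = []
    for q in features:
        # edge weights from this node to every node in the graph
        scores = [dot(q, k) for k in features]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]  # softmax over the full graph
        # weighted sum of all incoming "messages"
        agg = [sum(w * v[i] for w, v in zip(weights, features))
               for i in range(len(q))]
        out.append(agg)
    return out

# Two one-hot "tokens": each mostly keeps its own feature but mixes
# in the other, exactly like dense attention over a 2-node graph.
feats = [[1.0, 0.0], [0.0, 1.0]]
updated = attention_step(feats)
```

A sparse message-passing GNN would differ only in restricting the inner loop to each node's neighbors, which is precisely the irregular computation pattern that current dense-matrix hardware handles poorly.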
But that being said, these message passing architectures are making very important strides, especially when coupled with geometric deep learning. Basically, a lot of the really foundational impacts of AI in biology and chemistry are coming from these kinds of systems. And very recently, you might have heard the story from the team of Jonathan Stokes and lots of collaborators from MIT, where they, for a second time, successfully used a graph neural network to discover a novel antibiotic treatment for a very hard-to-attack strain of bacteria.
So these systems are definitely capable of doing these kinds of things. It's just that the day-to-day kinds of things that people mostly use AI for,[00:55:00] in terms of, you know, image processing or natural language processing, tend to not require those kinds of architectures as much. Although my bet is, as I said, that in the future, as we start to realize more and more what the limitations of existing systems are, we are likely to start to reintegrate lots of ideas from more unrestricted graph machine learning methods.
And yeah, I do believe that it will be a method of the future, but we need to first understand how to align it better and find out where it's actually necessary, as opposed to the great big foundation models we already have. And lastly, it might be interesting to also think about: is this discrete deep learning all we need?
Are there any benefits to be had if we think of things continuously, where graph neural networks become a bit less relevant? But yeah, those are all interesting questions to think about in the future.
Well, I'm again going to CVPR this year, and at CVPR there's a graph neural network workshop, like a half-day or full-day one.
And I'm [00:56:00] always attending it, even though I'm not working with graphs, because I really like the math behind it. And I really like this subfield, which I'm not even using in my day-to-day PhD, but still, I think it's a very interesting avenue and I'd love to work with it and implement it. But right now U-Nets are still the best models for my personal application, so I'm sticking with that for now.
But yeah, it's super cool. And before concluding the podcast, I just wanted to ask you if you have any upcoming projects or public appearances or anything you'd like to share with the audience.
So, yeah, thanks for asking. One thing that I would like to plug at this particular moment: some of you might be aware that, together with three other co-authors, Michael Bronstein, Joan Bruna, and Taco Cohen, I've been working on this book, which categorizes the different approaches to geometric deep learning, and which is already available on arXiv as a proto-book.
And [00:57:00] if you go to geometricdeeplearning.com, you can see lots of other useful resources, like talks, blogs, full lecture courses that we've taught, and so on. And hopefully this can be a useful resource on its own. But also, as I said, we have been working quite hard to try to make a full book version out of it.
And we are expecting that, with MIT Press, we will be able to publish a book on this over the coming months. So yeah, keep an eye out for that. And I will also say, in terms of appearances, for those of you who are interested in learning more about graph neural networks and would like to attend some kind of tutorial that will introduce you to the concepts behind them:
If for any reason you are either based in Montreal or interested in visiting, there is the CIFAR Deep Learning and Reinforcement Learning Summer School that will take place at Mila in Montreal, in July I believe. And I was just recently invited as one of the tutorial speakers there, for a [00:58:00] graph neural network talk.
So I will be there, and if you're interested to know more, or you'd just like to come say hi, that can be one good opportunity for that. Yeah, that's awesome. I also think I will be there. I need to confirm if I can manage, but yeah, I will be at Mila during that time as well. So I was thinking of attending the CIFAR Summer School; it's pretty cool that it's in Montreal this time. And then I will see you there as well. Thank you for your time and for introducing us to the world of scientific research at DeepMind, which is something that is not obscure, but is definitely not, like, well known and spread around; a lot of people know about DeepMind but don't really know what it is really like to be at DeepMind.
So that's super cool information and great insights that you shared. So thank you very much for your time.
Thank you for having me. I really enjoyed the conversation.[00:59:00]