Good morning, fellow AI enthusiasts! Today, I'm excited to share insights from a recent conversation with Jerome Pasquero, a seasoned Machine Learning Director with a rich background in AI and, above all, data. Jerome's expertise in data and its intersection with AI is incredibly relevant in today's tech landscape.
Here are some key insights from the discussion:
- Data Quality Over Quantity: Jerome emphasized that in AI, the quality of data trumps quantity. It's not about collecting massive amounts of data but about gathering data that's relevant and ethically sourced. This approach ensures AI models are trained on accurate, less biased data, leading to more effective and ethical AI solutions.
- Behind the Scenes with Data Annotation: One of the lesser-known yet critical processes in AI development is data annotation. Jerome gave us a peek into this process, explaining how meticulously labeled data is crucial for accurately training AI models. This step, often done by humans, is essential in making AI models understand and interpret data correctly.
- Balancing AI and Ethics: Our conversation took an interesting turn as we discussed the ethical dilemmas in AI and data. How do we balance innovation with ethical responsibility? Jerome's insights here were particularly enlightening, highlighting the need for ongoing dialogue and regulation in the field.
- AI in Daily Life: It's fascinating how AI has seamlessly integrated into our daily lives, often without us realizing it. Jerome illustrated this with examples from everyday apps to potential future scenarios, like autonomous vehicles, showcasing the pervasive and transformative nature of AI.
This episode offers a treasure trove of knowledge for anyone involved or interested in AI and data science. It's perfect both for tech enthusiasts and for those new to the field, since we keep the discussion accessible. Learn more about how AI is impacting our world by leveraging data in this week's episode of the What's AI podcast, available on YouTube, Spotify, or Apple Podcasts:
Podcast Transcript - Jerome Pasquero
Jerome Pasquero: [00:00:00] We've probably invested more than a hundred billion dollars collectively, yet how many autonomous vehicles do you see driving around? Especially since you and I live in Montreal. We got about 12 inches of snow yesterday, and I really don't see any of these fancy California-style cars that drive around San Francisco being able to drive here today. So we have a long, long way to go.
You know, a kid doesn't need that much data to learn the difference between a cat and a dog, right? How come a machine learning algorithm still needs, you know, hundreds of thousands of images to learn? So I really think that we're going to see more and more different initiatives to try to reduce our needs on the sheer amount of data, on the computation that is needed, and even, obviously, change the architecture a little bit.
It's not going to be transformers forever.
Louis-Francois Bouchard: This is an interview with Jerome Pasquero. [00:01:00] Jerome is currently the director of machine learning at Sama, a company involved in the data annotation process. So in this interview, we dive into data. We cover what it's used for, how we build those big datasets, the difference between using AI or humans, how many humans you need, and much more.
It's a very insightful and accessible discussion about building current artificial intelligence systems. I'm sure you'll love it, and if you do, please don't forget to leave a like and a five-star review, depending on where you are listening. It helps the channel a lot. Let's dive right into it.
Jerome Pasquero: So, hi, I'm Jerome Pasquero.
I lead the machine learning team at Sama here in Montreal. Sama is a company that does data labeling for all of the biggest players in AI in the world. But we're also a company with a social mission. We're an impact company: our objective is to pull [00:02:00] people out of poverty in East Africa, to give them access to jobs that they wouldn't otherwise have access to. We aspire to be a bridge employer when that's possible, because jobs are just very hard to get in East Africa. I'm talking here specifically about the two countries that we're in: Kenya, in Nairobi, and Uganda. And we believe that data enrichment workers, what I've been calling annotators, have a role to play. They probably have a really important role to play in this AI revolution, and we don't want to leave them behind. Today, like I said, we can have them do some of the more tedious and smaller tasks while the models are still learning how to perform them, but I'd love to see them grow with the models as well and eventually become what [00:03:00] we were talking about: model supervisors, model teachers even. There's no guarantee that will happen, but I think we have a unique opportunity to create a new breed of jobs that starts in parts of the world that are not used to seeing these emerging new jobs, right? For once, maybe this can happen in those parts of the world, where a certain part of the expertise in AI comes from East Africa, for instance. So again, I have a lot to say about this, but I just wanted to mention it to your listeners. If they're interested, we have a website, sama.com, where they can read about how we try to implement change in what is kind of a weird industry: the huge, largely unknown industry of data labeling that actually [00:04:00] fuels modern AI. And I'm really happy to be on this podcast with you.
Louis-Francois Bouchard: I'm happy to have you. And how did you get into the field? How did you start with data?
Jerome Pasquero: Oof, I think it goes way back. I've always been interested in AI, from my early years in electrical engineering. But back then, there was no such thing as a deep neural network. Of course, there were neural networks, but they weren't really deep. Still, the importance of data for training models was already pretty clear back then. And my career path took me to really different places.
You know, I played a small role in mobile phone development in the two thousands. But when AI started exploding around the mid-2010s, [00:05:00] and it started to really explode even more in Montreal, I wanted in. There was a company starting back then called Element AI, so I did everything I could to find a role for me over there. And that's where I was really introduced to AI in a commercial setting.
Louis-Francois Bouchard: And you mentioned you first learned about AI in school and you were interested, but what was it like in those days, and when was this exactly?
Jerome Pasquero: Oh, it would have been around 2000, probably when I was doing my master's. Yeah, there were a few advanced courses in AI for master's and PhD students that used some of the books that are still used today. They've probably been updated. One I can remember is Artificial Intelligence: A Modern Approach by Russell and Norvig. Those [00:06:00] books would introduce you to reinforcement learning, to neural networks, and to a number of other what we would call classic machine learning algorithms and techniques.
Louis-Francois Bouchard: Did you think it was promising? Like, how did you know it would lead to something, when it couldn't really be used or trained in an efficient way?
Jerome Pasquero: I don't know that I saw it as being really promising. I wasn't trying to predict the future, and I don't think I would have been able to in any case.
But I did find it super exciting, for sure. We would use some of these techniques to do competitions. There was a competition back then called, I think, the RoboCup competition, which was done either in software or with real little robots. We played with the software version of it and tried to implement a neural network or reinforcement learning.
It didn't work very well. It didn't work at all, but it was super fun [00:07:00] just to get to implement some of these algorithms. So no, back then it wasn't clear at all that this would eventually become the future of modern AI, but it was clearly something that was different and exciting.
Louis-Francois Bouchard: What was most exciting to you about the field? For example, in my case, when I first discovered artificial intelligence towards the end of my engineering degree, it was basically that I finally found that mathematics could be applied to something useful. And I loved anything logical and math-related.
So I just found it was the perfect blend between the applicable and science, and that was super exciting to me. But what was the exciting part of artificial intelligence for you?
Jerome Pasquero: Yeah, I wouldn't say that it was similar for me, because I've always thought that math could be applied to stuff that was really helpful.
You [00:08:00] know, I used to do computer vision at a company that still exists today, called Matrox. They used to do graphics cards, before NVIDIA took over, in the nineties. And math and computer vision were super helpful back then. What I thought was super interesting about AI is just that we were trying to mimic what humans can do.
Right. That's what I find extremely exciting, from a sensory point of view: the input to the human, then the processing within the human brain, and the output, what we do as humans, as actions. Trying to reproduce who we are is what I find fascinating.
Louis-Francois Bouchard: And do you think we are still going that route? For example, with current systems, I guess the whole goal is to scale up the transformers, the size of the architectures, the size of the datasets. Do you think we are still trying to replicate how humans understand, or are we diverging away and just [00:09:00] focusing on instant reward? What's your opinion on that?
Jerome Pasquero: That's a good question. I'd say it's a little bit of both, in the sense that yes, I think that any serious researcher in the field will tell you that they're trying to replicate what humans can do, and they'll say that the current architectures, the current trends, are probably not exactly matched to what humans do, but that's what's working right now.
So why not push it until it stops working? Because we are getting successful at replicating some human capabilities without necessarily replicating the real mechanisms, the underlying mechanisms of the brain, right? You'll hear these prominent researchers say that we're throwing so much data, so much computation at models today, and so much repetition; in supervised learning, it's crazy the number of times you need to repeat the same thing, or something very similar, for the model to learn.
Whereas a kid [00:10:00] can learn really complicated concepts with just a few examples. So clearly there's something we're not doing right. But then, with the recent successes we've seen over the past 10 years, it's still super fun to see how the output, at least, is starting to match human capabilities.
Louis-Francois Bouchard: I agree. And I think we've seen the most progress, of course, with LLMs, with ChatGPT and everything. Most people who are not in the field, when they hear about computer vision or image stuff, think about DALL-E or Midjourney, about generative AI, where you give it text and it creates images.
We've seen the progress of these models recently, but since you've been in the computer vision industry for maybe 20 years, what's changed in terms of other image tasks, other than the new generative ones?
Jerome Pasquero: Oh, I think a lot. We used to do computer vision [00:11:00] by creating what we would call kernels, or filters, that would do really, really simple tasks at first.
Detecting edges in images, for instance. And once you can detect an edge, then you can detect two edges that are next to each other at a certain angle, right? And once you can detect two edges at a certain angle, maybe you can detect a closed shape. Each time, it's a different kernel or filter that you're applying to the image, stacked on top of the others.
But you have to do all of this manually, and it's usually by trial and error, informed by math, obviously. At some point, you might get to something that can identify a face, but then you need to identify whose face it is. So you would keep building these levels of abstraction manually, on top of each other, each time building on something that was a little bit shaky, because none of these kernels were perfect.
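The hand-designed kernel approach Jerome describes can be sketched in a few lines. This is a minimal illustration, not any production pipeline: a naive 2D convolution applying a Sobel-style kernel, the kind of manually crafted edge detector that classic computer vision stacked into ever-higher-level features.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and sum element-wise
    products at each position (valid mode, no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-designed Sobel kernel that responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Tiny synthetic image: dark on the left, bright on the right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

edges = convolve2d(image, sobel_x)
# The response is strong only where the dark-to-bright transition sits.
```

Detecting corners or closed shapes meant designing yet another filter over this output, by hand, which is exactly the shaky stacking described above.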
That's how we used to do things. Today it's completely different. Today, at least in [00:12:00] supervised learning, we find a bunch of examples of faces with labels: the answer we would want the model to give us, whether this is a face, whose face it is, or whether this face is average or different from average.
We take all this labeled data, these examples, and we feed the models over and over and over again until the model kind of learns on its own how to process these images and give you the desired output. It's a completely different approach and, as it turns out, a more successful one than our initial attempts at computer vision. But it does come with its own set of issues and challenges, because in engineering, nothing's perfect, right?
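The "feed labeled examples over and over" loop can be sketched with the simplest possible supervised learner. This is a toy, assuming nothing about any specific system: logistic regression trained by gradient descent on a synthetic labeled dataset, repeating the same examples each epoch until the input-to-label mapping is learned.

```python
import numpy as np

# Toy labeled dataset: 2-D points, label 1 if x0 + x1 > 1, else 0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
b = 0.0
lr = 0.5

# Feed the same labeled examples over and over (epochs) until the
# model learns the mapping from inputs to labels on its own.
for epoch in range(500):
    p = sigmoid(X @ w + b)            # current predictions
    grad_w = X.T @ (p - y) / len(y)   # cross-entropy gradient
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
```

The repetition Jerome points out is visible here: hundreds of passes over the same 200 examples, where a child would need a handful.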
Louis-Francois Bouchard: And now, where do you see it going? Is the issue more about the quality of the data, or the intelligence of the architecture, like how well it can [00:13:00] extract information, or do we just need even more compute? What's the next step, or where does the computer vision field seem to be going?
Jerome Pasquero: Yeah, I think there are going to be multiple different paths running in parallel. I don't see the one path we're on today, adding more computational resources, more data, and tweaking architectures, changing that much, because it's still successful. It's still working.
We're still improving on benchmarks, right? But at the same time, people are starting to seriously ask whether this is sustainable. The problem is that it uses so much energy, so much compute. We're also now at a level where there's more compute than what you would have in a human brain.
So again, clearly the architectures we have today are not optimal, right? There's still room for different architectures. So this would be another branch, another path that I see [00:14:00] running in parallel. Each one is going to feed the other, because there are components you can take from the brute-force approach and use in the other one, which is a little bit smarter, a smarter path.
What would that be? Well, can we do this with less data? Like I mentioned earlier, a kid doesn't need that much data to learn the difference between a cat and a dog, right? You show them three or four cats and dogs and they get it right away. How come a machine learning algorithm still needs thousands, tens of thousands, hundreds of thousands of images to learn?
So I really think that we're going to see more and more different initiatives to try to reduce our needs, on the sheer amount of data and on the computation that is needed, and even, obviously, change the architecture a little bit. It's not going to be transformers forever, I believe.
Louis-Francois Bouchard: I agree. And for the data, we also need something that is of [00:15:00] super high quality, versus just scaling and having way more data. So what is the importance of leveraging humans to annotate data, versus something automated that uses another machine learning model?
Where does the human come into play and why do we need them?
Jerome Pasquero: Okay, that's an excellent question. It's one that I get asked quite often, and the way I try to frame it is quite simple. Machine learning, or AI, is really about figuring out a way to download human knowledge and human skills into data, which is then used for training a model, so the model can reproduce those human skills or generate that human knowledge.
Right? So it's kind of an output from humans, from the human brain, into data that is then [00:16:00] fed to the models, and that's how models learn. A model can't learn on its own without that data generated by humans; it's not going to learn anything new. You could reshuffle or repackage the data itself, or what a model has learned, export it and have it output in a different format.
But there's not going to be any net new information if there are no humans injecting information into the systems. And the jury's still out as to how long it's going to take for models to reach general intelligence, AGI. But one thing's for sure: it's not happening tomorrow, which means humans still have a big role to play.
Now, of course, the counter-argument is that we're moving away from supervised learning, where the data is actually labeled by humans, into things like semi-supervised or unsupervised learning. Large language models are often unsupervised: they use data that already exists and doesn't need to [00:17:00] be labeled.
So maybe we can go at an accelerated rate in transferring that information from human knowledge to models. But I think there's a fallacy in that thinking: you still need a human somewhere in that loop, to say whether what's produced by the model is correct, because you can't have a model assess itself.
It would be doing it in a void, without any grounding in reality. And we, as humans, are that grounding in reality. We are the ones who can supervise, tell the models whether they're doing the right thing or not, and make the appropriate corrections so that in the next iterations they get better and better.
Louis-Francois Bouchard: And so what are your thoughts on constitutional AI, where you give guidelines to a language model and it gets corrected by GPT-4 or whatever? It's kind of like reinforcement learning [00:18:00] with feedback, but AI-generated feedback versus human feedback. Do you think this is promising, or does it go against downloading human skills?
Jerome Pasquero: I think it's a tool that has a role to play. Using ML to do ML is definitely a way to accelerate certain things. But ultimately, you still need humans to be the ultimate judge. You still need humans to inject more knowledge into it, to supervise the whole thing, to set the objectives and the goals, and to extract value that ultimately only makes sense in a human world, right? So again, it's just one out of many tools, ML for ML, basically, that we have at our disposal to help that transfer of information from our brains to models.
Louis-Francois Bouchard: Could you give [00:19:00] some specific examples of how you personally use ML to build other models or to build datasets?
Jerome Pasquero: Yeah, for sure. I'll stick to computer vision. One of the big problems is that there's a lot of data out there, and when our clients come to us, they have a tendency to dump all their data on us and say, okay, can you annotate this?
And then they have very specific instructions. So far so good, right? The problem is that they've never really looked at the data themselves. There's too much of it. And sometimes they're also looking for specific things. For example, they might be looking at annotating, among many things, cell phones in images.
They're looking for cell phones and they want to annotate them, but they have no idea how many cell phones there are in all of the images they've given us, and they want to prioritize this because their model is [00:20:00] not doing very well on cell phones. So we can use ML to guide the search for cell phones across all the images they've given us. It's not a perfect science, but we can use ML to find the images that are most likely to contain a cell phone, so that we can then label the cell phone in whichever way our clients want, and increase the speed at which we get to the right data for training their model.
If we weren't using ML, we would just use brute force: look at every image, try to find a cell phone in every image. It would be a huge waste of time. We might even be annotating similar images that aren't needed. Imagine you have the same image twice; we would annotate it twice.
Whereas here, [00:21:00] using ML, we can make sure that even if it's an image with a cell phone, it's an image that has a lot of value compared to all the other images we have so far, and that we're not constantly relabeling similar things that add very little value when you're actually training the model.
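The workflow just described, prioritize likely hits, skip near-duplicates, can be sketched as a small triage function. Everything here is hypothetical for illustration (the function name, thresholds, and toy data are not Sama's actual tooling): a detector's confidence scores rank the images, and embedding cosine similarity drops near-duplicates so annotators never label the same content twice.

```python
import numpy as np

def prioritize_for_annotation(scores, embeddings, score_threshold=0.5,
                              sim_threshold=0.95):
    """Pick images worth sending to human annotators.

    scores:     model's estimated probability that the target object
                (e.g. a cell phone) appears in each image.
    embeddings: one feature vector per image, from any pretrained model.
    """
    # Normalize embeddings so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = []
    order = np.argsort(-scores)  # most likely images first
    for idx in order:
        if scores[idx] < score_threshold:
            break  # remaining images are unlikely to contain the object
        # Keep only if not too similar to anything already selected.
        if all(emb[idx] @ emb[j] < sim_threshold for j in selected):
            selected.append(idx)
    return selected

# Toy example: 4 images; images 0 and 1 are near-duplicates.
scores = np.array([0.9, 0.88, 0.7, 0.2])
embeddings = np.array([[1.0, 0.0],
                       [0.999, 0.04],
                       [0.0, 1.0],
                       [0.5, 0.5]])
keep = prioritize_for_annotation(scores, embeddings)
```

The brute-force alternative, looking at every image and labeling duplicates twice, is exactly the waste this kind of pre-filtering avoids.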
Louis-Francois Bouchard: And are you using any language models to do anything in your case, the new ones or the better ones?
Jerome Pasquero: That's a very good question. You know, we talk about language models, but very soon we'll be talking about multimodal models. We're actually already talking about multimodal models. For us, I'll give you an example.
Instructions, for instance, can be very, very long in the labeling process. There's no standard for how our clients provide us with the instructions for labeling, and that's a problem, right? Because every set of instructions is different, there's a lot of information, and some of these documents have hundreds of pages. That's where we could use a large language model to standardize the format of these instructions. Or, if we need clarification during training phases, we could also use large language models to provide examples [00:22:00] of, you know: if I see a bicycle and there's a pedestrian, a cyclist, on it, should I annotate the cyclist and the bicycle together as the same object?
Or should I do them as two different objects? That's where a language model might be able to help. So yes, of course, we're very aware of large language models. We constantly experiment with them to see how they help with our ultimate goal, which is to provide the best, most uncluttered data possible, at the cheapest cost, to our clients.
Louis-Francois Bouchard: And you are mainly using it as a tool, again, to download human knowledge and make the process more efficient, instead of basically automating something that humans could do. I really like how you're framing it: it's not an intelligent thing that will replace us, it's a tool that we [00:23:00] can leverage.
Jerome Pasquero: Yeah, that's the concept of augmentation, human augmentation. And it's something that's really debated these days, whether we can use AI as a means for human augmentation. Some people are actually pushing back. I fundamentally believe that that's the way to do it.
I actually think that, since the dawn of time, whenever we introduce technology, it's in a human reality, a human context, which means we use it so that we can augment our capabilities. Of course, part of this means automation in certain areas, complete automation, right? Some jobs disappear that way. But ultimately, it's all about making ourselves better,
so we can move on to the more challenging problems that can't be solved today. We use the exact same philosophy. We know very well at Sama that if all we're doing is throwing bodies at labeling things that [00:24:00] models can already do, it's a losing proposition for us, right? We'll rapidly be replaced by those models.
So we always have to be one or two steps ahead of what the models can do, so that we continuously provide value to our clients. And that means that in order to stay ahead in that race, we use the existing tools, the existing models, to actually achieve our goals.
Louis-Francois Bouchard: Hey, this is a quick interruption to remind you to share the podcast with a friend. If you think the discussion was interesting or insightful, sharing it will make your friends think more of you. It's a two-in-one: share knowledge and also be seen as more interesting. So please don't forget to share this episode with anyone you think will take value out of it.
Thank you for listening to the episode, and I will let you enjoy the rest. You mentioned that AI will have an impact on jobs and, basically, just like all technology changes, [00:25:00] it has impacts on the job market. I wonder what you think will happen, or is happening, to current jobs, or just to our society, because of language models, for instance, but AI in general. Is it mostly taking jobs from people?
Is it creating new ones, or just transforming them? What do you think the impact will be?
Jerome Pasquero: I don't have a crystal ball, but I would say that the past is probably something we should look at if we're trying to predict the future. Any transformative technology, and I think AI is just one of them, has come with a huge replacement of some jobs and an addition of new jobs. Unfortunately, it's not necessarily the same people who are doing the new jobs and the old jobs, but ultimately, most of the time, it's a positive outcome: more jobs are created than are destroyed or become [00:26:00] obsolete.
I'm hoping it'll be the same thing with AI. I think humans are very creative in finding ways of tackling new challenges that the current technology can't tackle on its own. And I see no reason why this would stop just because we now have large language models. So it's going to be disruptive, super disruptive in certain fields, and super creative in others, right?
Louis-Francois Bouchard: And so you don't see much difference between previous revolutions and now?
Jerome Pasquero: One characteristic of a technological revolution is that it's going to be more disruptive than the previous one. It's going to have a larger scale. That's how it is, right? So we shouldn't be surprised when a new technological revolution seems to be an order of magnitude bigger than the previous one.
But every time we're in one of those revolutions, we always think, oh, this is crazy, this is it, right? But there's no evidence that we're at the end of [00:27:00] that line. When computers came out, people said, we don't need computers, back when "computers" were actually people doing all the computation. When mobile phones came out, people said, oh, well, now I can talk to anyone, anytime.
So this is going to continue. Of course, every time we're in one of these revolutions, we should be careful, we should be mindful that it's going to create losers and it's going to create winners, and we have to be very careful about making sure that wealth and knowledge are distributed equally. Every time, that becomes an extra challenge.
But every time, we're also equipped with more technology, more means, more historical data to make sure that doesn't happen. It's human nature to think that we're really special, that the time we're living in is really special, but if you take a very objective approach, there's no reason to [00:28:00] think that we're different from 50 years ago, a hundred years ago, or 500 years ago.
This being said, I'm super excited about LLMs. It's one of the coolest things I've tried over the last few years, right? And I think it's great to be embracing AI, to get excited, and to push the envelope. And I think it's just as important to be mindful of its dangers and pitfalls, and of the fact that, like any technology, it's a double-edged sword; we have to be careful about how we use it, implement it, and distribute it.
Louis-Francois Bouchard: But if the ultimate goal, just like for OpenAI or other companies, is to create AGI, what future technology could top that? What could be even more impressive than creating something that is intelligent enough to do everything on its own?
Jerome Pasquero: Yeah, again, that's a good question.
It's hard to know what you don't know, right? Or, say, [00:29:00] hard to speculate about what you don't know. My main point is that I think it's pretty obvious that we're trending towards AGI, or even more, superintelligence. I think everyone agrees with that, given enough time and resources.
And if we don't destroy ourselves, we'll get there, right? The real question is, how long is it going to take? Some people think we could achieve this in 20 years; others think it's going to take 200 years. I'm just saying I don't know at this point, right? And I don't think that anyone who claims to know has any more information than we do.
So I wouldn't panic about this. And I think once we get there, the human reality will be slightly different, in the sense that maybe we'll be an interplanetary species, and then we'll have another set of challenges. So if we're going further into the future, my main point here is that [00:30:00] there's no reason to be in panic mode right now.
We just have to be careful, like we've always had to be careful in the past. And maybe we do have to be a little bit more careful. But again, we have experience that we didn't have before. We have tools that we didn't have before. So let's leverage those.
Louis-Francois Bouchard: Yeah, I assume it's just one step further towards AGI, as humans have been trying to do forever: automating things, even prior to electricity and the internet. We're just always trying to automate tasks that we don't want to do.
Jerome Pasquero: And I mean, when the first computer beat the chess masters, people were like, this is it, we've solved AI, right? And then they realized, well, actually, there's more to it.
Every time we solve a problem that we think is only in the realm of humans, there's always another challenge that arises [00:31:00] that people try to tackle. So I don't see an end to that, until we really solve, you know, superintelligence at a very big scale, which I think will happen at some point. When, I don't know.
Louis-Francois Bouchard: And I want to go back to how you build those models and how you deal with qualitative tasks. For example, it's easy to spot a cat in a picture. You just have a bounding box, and it's clear and simple. But do you deal with any qualitative tasks where it's much more complex, and where it also depends on who's annotating?
Jerome Pasquero: Yeah. So I would say we deal exclusively in computer vision, where there's a little less ambiguity in the tasks that we are asked to perform in labeling.
I'll give you an example in NLP, natural language processing, for instance: the same sentence can have different interpretations depending on, you know, [00:32:00] cultural biases, or even where you are in the world, right? In computer vision, a cat's a cat. This being said, because we are now going up the ladder of cognitive tasks that we're asked to tackle, there are sometimes ambiguities that arise, for instance, asking whether in an image there's going to be an accident.
You have an image of cars, and you have to come up with an interpretation of what's going to happen next, right? Whether it's going to be an accident or a near accident, or things like that, that is a little bit more open to interpretation. But that's okay, right? I think we've been operating under the impression that there is one model that's going to align with everyone, right?
The concept of alignment. But that's not true. Why should there be one model for all, when we're all different as individuals? The reality is [00:33:00] that alignment is really at the individual level, right? And in order to do that, we first need to capture all the nuances of how humans can interpret reality. So yeah, we do see some examples where we will ask multiple annotators to come up with an answer, knowing very well there's no real hard ground truth, and use that distribution to get a model that's a little bit more nuanced.
Louis-Francois Bouchard: Are you dealing with biases the same way? For example, there's definitely a difference between annotators, even on a somewhat simple task: one might always draw a slightly bigger box, while others might be tighter. Are you dealing with that by taking multiple annotations and averaging them, or by always using the same person, or by using as many people as possible to [00:34:00] represent the world as accurately as possible? How do you deal with that?
Jerome Pasquero: It really depends on the use case. In some cases we'll use one annotator per task, and then, you know, the averaging process that you describe will happen naturally. Say we're both working on the same project: you see an image and you have to annotate a face. You might be very aggressive in how you annotate it, with a very tight pixel tolerance, and I might be a little bit loose because I include the hair coming out of the top of the head. But if we do this with hundreds or thousands of annotators, the signal is going to average itself out.
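As a rough illustration of the averaging effect Jerome describes, here is a minimal Python sketch; the (x1, y1, x2, y2) box format and the numbers are hypothetical, not Sama's actual tooling:

```python
# One annotator draws tight boxes, another draws loose ones; across many
# annotators the per-person bias averages out of the consensus box.

def average_box(boxes):
    """Average each corner coordinate across annotators' (x1, y1, x2, y2) boxes."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))

# One "tight" and one "loose" annotation of the same face:
tight = (12, 10, 50, 48)
loose = (10, 8, 52, 50)   # includes the hair coming out of the top of the head

consensus = average_box([tight, loose])
```

With many annotators instead of two, the same averaging washes out individual looseness or tightness.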
Right? So we don't really need to deal with it. That's one way of doing it. In some other use cases, or workflows, sorry, we will have multiple annotators [00:35:00] doing the exact same task, because there is an inherent ambiguity that we really want to capture and model, or that our clients want to model and inject into the models. And then there's anything in between, right? There are cases where we'll use both techniques at the same time. Another important thing here is the fact that the annotation process is a sequence of steps. So here we're just describing a single step.
And in some cases, you can have multiple steps that follow each other. And one of the final steps for us is usually quality assurance, where our most experienced annotators look at the output of the steps before and decide whether it's ready for delivery or whether it needs to be corrected, in which case they might make the correction themselves.
So they might readjust for bias at that point, or they might reinject it into the whole workflow so that it gets [00:36:00] reworked with additional instructions and guidance. Knowing which technique to apply to which use case is our expertise. That's what we're good at, and that's what we advise our clients on, because we've been doing this forever.
Louis-Francois Bouchard: Would you have a specific example of when you use only one person, or very few, and don't average?
Jerome Pasquero: Yeah, I'd say for the vast majority of cases we use one person and then we use QA, right? But one use case where we might use multiple people is when trying to assess whether data that has been synthetically generated by a client is realistic or not. It might be a company that generates synthetic people in certain poses, right? So we're asked to look at all the data they've generated and say whether [00:37:00] an emotion or a pose is realistic, whether the face has any flaws or not. That's open to interpretation, right? Because it can sometimes be difficult to say: oh, this is a little off, but I'm not sure why. Or: this is actually realistic; if I saw this image, I would think it's possible.
And that's where we would probably have multiple annotators look at the same data and label it. What's important at the end is not necessarily to take the average of all these answers, but to really capture the distribution, right? Because that distribution is an important piece of information that any model would benefit from having afterwards.
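Capturing the full distribution of annotator answers, rather than collapsing them to one hard label, can be sketched roughly like this; the task and the answer names are made up for illustration:

```python
from collections import Counter

def label_distribution(labels):
    """Turn raw annotator answers into a normalized distribution (a soft label)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Five annotators judging the ambiguous question "will this be an accident?"
answers = ["accident", "near-accident", "accident", "no-accident", "accident"]
soft = label_distribution(answers)
```

The resulting distribution can then serve as a training target, so the model learns the disagreement itself instead of a majority vote.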
Louis-Francois Bouchard: You mentioned synthetic data, and I believe you are not often using or creating it. Correct me if I'm wrong, but I think I've heard [00:38:00] you say that you are not currently using much synthetic data. So I wonder, what's your current opinion on the state of automatically generated data, whether to grow the dataset or to generate, for instance, more examples of rare use cases?
Jerome Pasquero: Yeah. So we do get to annotate synthetic data. Some of our clients have synthetic data that they want us to validate; basically, that's a validation workflow, right? My view on this is pretty classic. I think synthetic data has a role to play in the training of models. It's one tool out of many.
It's a useful tool, especially to kickstart a model when you have no data. But I still think there's no replacement for real, hard data, right? And it comes down to the same thing we were talking about earlier. If you've figured out how to [00:39:00] generate realistic data, and by realistic I mean not just realistic to a human, but mimicking real-life data very accurately; if you've figured out how to do this as a synthetic data generation company, you've figured out something about the world that the model is trying to figure out itself, but in the opposite direction, reverse engineering it from the data. It also means that you don't necessarily need annotated data for this, because you've understood something about the physical world that you would otherwise need a model to reverse engineer. So it comes down, again, to how much new information you are actually creating in the data you generate. And most of the time it's very little. Again, it has a role to play.
You [00:40:00] mentioned the edge cases. Sure, okay. But in order to know that these edge cases exist, it's still good to start from real edge-case data and maybe augment those real edge cases. You can try to be creative and think up edge cases yourself, but that's going to be difficult, because the whole challenge of edge cases is that most of the time we don't know they exist until we see them.
And something you could also use synthetic data for, as mentioned before, is to kickstart the training of a model. If you have absolutely no data, it's a good idea to just get a really basic benchmark, start from there, and then improve on it with a combination of real and synthetic data.
Louis-Francois Bouchard: You validate a lot of synthetic data, but you also do quality assurance on the annotations you have. I don't know if you use the same process for both, but how can you scale up this quality assurance [00:41:00] and this validation across many, many images?
Jerome Pasquero: So it's really a three-step process. The first one is about selecting the data that is going to be seen, validated, and analyzed by a pair of eyes, by our workforce, right? That's an automated system, basically data curation. The tail end of that data curation does involve a human, just for the last steps, because again, that's where a human is the best judge of what makes sense to annotate or not. But that's the first phase: once you've identified the data that is worth analyzing, worth annotating, or worth validating,
then you go to the pure annotation, validation, and correction phase, right? That's where we have the most people. That's where the mass of the workforce is looking at the data, trying to go through as much as possible of that really filtered-down data that you had as an input to the whole [00:42:00] system, right? And that's where you're injecting a lot of information from humans. And then the third part is really the quality assurance. Now, you don't need a one-to-one ratio between the people doing quality assurance and the ones doing the annotation, because quality assurance can go a lot faster.
As I said, it's usually the most experienced annotators, or data enrichment workers, who are doing that on our end. That's where they readjust a little bit, or resend for annotation or rework some of the things that are not right, or correct them themselves. So these are the three phases, and in some way each is always informing what comes downstream from it.
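The three phases (curation, annotation, QA with rework) could be wired together along these lines. This is only a sketch: the `curate`, `annotate`, and `qa_check` callables are hypothetical stand-ins for what are, in reality, mixed human-and-model workflows:

```python
def run_pipeline(raw_items, curate, annotate, qa_check, max_rework=2):
    """Sketch of curation -> annotation -> QA, where QA can send an item
    back for rework a limited number of times before delivery."""
    delivered = []
    for item in filter(curate, raw_items):   # phase 1: keep only worthwhile data
        label = annotate(item)               # phase 2: bulk annotation workforce
        attempts = 0
        while not qa_check(item, label) and attempts < max_rework:
            label = annotate(item)           # phase 3: QA rejects -> rework
            attempts += 1
        delivered.append((item, label))
    return delivered

# Deterministic stand-ins for demonstration:
keep_even = lambda x: x % 2 == 0
label_x10 = lambda x: x * 10
always_pass = lambda item, label: True

out = run_pipeline(range(5), keep_even, label_x10, always_pass)
```

The key structural point from the conversation is that each phase feeds the next, and QA can push work back upstream rather than only accepting or rejecting.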
Louis-Francois Bouchard: Other than the example you just mentioned, when is AI, or any machine learning tool, better than humans? Is it only for scaling things up, or do certain machine learning algorithms [00:43:00] have better capabilities than humans, beyond doing many more iterations, faster?
Jerome Pasquero: Yeah, so two answers to that. Machine learning algorithms are better than humans once they've gotten to that level of being better than humans. Today, for instance, if a client comes to us and says, I would like you to annotate dogs and cats, I would say that's nuts. That's a problem that's solved already.
Why are you wasting your time and money on this, right? So, classification problems. We know machine learning algorithms are really good at recognizing faces too, for instance. So there's a long list, though not an infinite or complete list, of problems that today we could consider solved by computer vision, mostly thanks to deep neural networks, right?
Yeah. So I would say that's where they're better. The second thing, [00:44:00] where I wouldn't say they're better, but where they allow you to scale, is when you have enormous amounts of data and you can't look at all of it. It would be better to have a huge, infinite workforce to look at that data, but that's not practical, right? It would be super expensive and super slow, though the output would be better. But since you can't, you might as well use machine learning to at least do a first pass at looking at the data: identifying patterns, classifying or ranking it in a certain way, so that it can then be looked at by humans using this first informative metadata computed by machine learning.
But again, I would rather have a huge workforce if I could, and if it were practical, because there's a lot more knowledge in that than in the machine learning algorithms we use to go through the data. So what we found works at Sama is a combination of [00:45:00] the two. If you have a lot of data, start with models that allow you to go through it, maybe sort it or rank it in a certain order, and then have humans look at part of what that first-phase analysis gave you, to realign it and to make the ultimate decisions about what should be annotated, what shouldn't, what should be looked at again, and what requires clarification, before it actually goes to the next stage, which is the pure annotation or validation phase.
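The first-pass idea of letting a model rank the data so humans only look at the most informative slice can be sketched as follows; the image names and confidence scores are invented for illustration:

```python
def select_for_human_review(items, model_confidence, budget):
    """Rank items by a model's confidence and return the `budget`
    least-confident ones, i.e. those most worth a human's time."""
    ranked = sorted(items, key=model_confidence)  # lowest confidence first
    return ranked[:budget]

# Hypothetical confidence scores from a first-pass model:
scores = {"img_a": 0.98, "img_b": 0.40, "img_c": 0.75, "img_d": 0.55}
to_review = select_for_human_review(list(scores), scores.get, budget=2)
```

This is the classic uncertainty-sampling pattern from active learning: the model triages, and humans make the ultimate annotation decisions.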
Louis-Francois Bouchard: And do you see a reduction in the need for humans in the loop as AI keeps getting better and better?
Jerome Pasquero: So I believe we're going to need fewer humans per loop, okay? Because there's a bunch of automation, and a bunch of other tools, including ML tools, that are going to make this work faster and more scalable.
But I also believe there are going to be a lot more loops, [00:46:00] right? So the ratio definitely changes, but the sheer number of workflows or use cases is going to explode. You might now need one human in a loop supervising, or even teaching, 10 models at a time, even 100 models at a time, but then you're going to have hundreds of thousands, or millions, or tens of millions of models to supervise. So ultimately, the sheer number of people you need probably grows, but they can do a lot more than they're capable of doing today.
Louis-Francois Bouchard: And you mentioned use cases, and I'd love to talk to you about a specific use case that you mentioned to me earlier, that you've been involved with, which is the training of autonomous vehicles, or building the datasets for it.
And for lots of people, autonomous vehicles are [00:47:00] something that is becoming more and more real and impressive. It's kind of the Terminator of AI; it's what we see as a clear, real use case of artificial intelligence. And I wonder if you could share a bit more about how such a system is trained. What is the data required, the kinds of data, and are there multiple systems or multiple types of input that need to be trained? So basically, what data does an autonomous vehicle need to be autonomous?
Jerome Pasquero: Sure, we can talk about this a little bit, but this is a topic that could span multiple podcast hours.
Yeah. In essence, there are really two parts to an autonomous vehicle working. One is the perception part, which is collecting and analyzing all the data that comes through the typical sensors you find in a car. So what [00:48:00] are those typical sensors? You have cameras, right? You know, Tesla only uses cameras: a set of cameras to look at the world around you. But there's also something called LiDAR, which is another type of sensor, based on light, which, instead of creating a 2D image of pixels, actually generates a 3D world of what we call points. They're typically called point clouds. You can also use radar, and other technologies too, but let's stick for now with 2D cameras and LiDAR. So that is the information and sensory system that is used and then processed for making decisions, or at least for understanding the world around you.
Are there cars around? Are there obstacles? Is there an imminent danger? How close are the things around the ego vehicle, etc.? And then the second [00:49:00] part of the system is the decision-making system, basically the planner, right? Now that I know all these things, how do I get where I need to go?
Immediately, and in the longer term: I might want to get over there, which is pretty far away. What are the different steps I need to take, knowing what I know at this particular moment, and knowing also that the world around me will change as I get closer and closer to that longer-term objective of reaching my destination, right?
So those two things are usually separate when you're tackling autonomous vehicles. I'm not a pure expert in that domain, but I do know that in these two phases there are also a bunch of sub-modules that are used, right? So think of the lowest level: there's just having a system that's able to identify pedestrians, because pedestrians are very important, right?
You need to identify that. And [00:50:00] then there are other levels of abstraction around this. Maybe you have something that does the prediction of where the pedestrian is going, just to make sure that the pedestrian is not walking in front of your car, or that the chances it will walk in front of your car anytime soon are very, very low. Those are the things that are built on top of each other, and that's how the industry usually works. Now, what we're seeing in AV, autonomous vehicles, that we're not yet seeing in other domains and fields, is that use of 3D point clouds from those LiDARs. The reason being that LiDARs are still very, very expensive.
And what is fascinating for us is that annotating LiDAR data is actually a very difficult task. Imagine: our annotators now need to understand a 3D world. It's no longer looking at an image; it's looking at a 3D world, which is very sparse. You [00:51:00] have points in that 3D world, and I don't know if we could put up an image at some point just to show what it looks like, but those points are not necessarily easy to recognize as a shape, as a car. And yet our annotators are asked to put cuboids, three-dimensional shapes, around the different vehicles and the different pedestrians, with very little information that's clear to a human. So experience matters a lot here, training matters a lot, and they will use the 2D images from the camera sensors to guide their annotation process a little bit.
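One simple way to see how a 2D camera box can guide 3D cuboid annotation is to keep only the LiDAR points whose projection lands inside the 2D box, then bound those points. This toy sketch uses a made-up orthographic projection and is far simpler than real annotation tooling:

```python
def fit_cuboid(points, box_2d, project):
    """Keep 3D points whose camera projection falls inside the 2D box,
    then fit an axis-aligned cuboid (per-axis min/max) around them."""
    x1, y1, x2, y2 = box_2d
    inside = [p for p in points
              if x1 <= project(p)[0] <= x2 and y1 <= project(p)[1] <= y2]
    if not inside:
        return None
    mins = tuple(min(p[i] for p in inside) for i in range(3))
    maxs = tuple(max(p[i] for p in inside) for i in range(3))
    return mins, maxs

# Toy orthographic "camera" that simply drops the depth axis:
project = lambda p: (p[0], p[1])
cloud = [(1, 1, 5), (2, 2, 6), (9, 9, 7)]      # last point lies outside the box
cuboid = fit_cuboid(cloud, (0, 0, 3, 3), project)
```

Real pipelines use calibrated camera matrices and oriented (rotated) cuboids, but the guiding idea, letting the 2D view disambiguate the sparse 3D points, is the same.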
Louis-Francois Bouchard: Do you think using only cameras and LiDAR is enough to match the skills of human drivers? Or is it promising, and do you see it going on the roads?
Jerome Pasquero: I think at some point we'll get there. The exact combination of sensors it will take, I don't think we've figured out yet. We've [00:52:00] probably invested, collectively, more than a hundred billion dollars in this, and yet how many autonomous vehicles do you see driving around? Especially here: you and I live in Montreal, and we got about 12 inches of snow yesterday. I really don't see any of these fancy California-style cars that drive around San Francisco being able to drive here today, so we have a long, long way to go. Still, as for which sensors we will be using, I believe it will be a combination of things. If you listen to Elon Musk, he'll tell you that two cameras are enough, one to represent each eye of a face.
Right. I don't think that's necessarily going to be the case for the foreseeable future, in just the same way that planes don't fly at all like birds do, and yet they work. So I think LiDAR has a role to play, because it's an interesting technology. It's still too expensive, unfortunately, but as [00:53:00] the price goes down, we might see vehicles with not one or two or three LiDARs, but half a dozen or a dozen, and that might help with redundancy. And then, yeah, other technologies, such as radar, can help as well, and there are probably others I'm not mentioning right now.
Louis-Francois Bouchard: And this may be a bit far-fetched and off topic, but I also wonder why we are trying to create autonomous vehicles when instead we could create autonomous subways or the like, which are much simpler and could be much more efficient than each of us having our own car that tries to replicate human driving skills.
Jerome Pasquero: I have a very personal opinion on this, so please take it with a grain of salt. I believe in trying to come up with a revolution in mobility, which I think is a noble cause. Where I'm not sure I agree is [00:54:00] with people saying that we're trying to get autonomous vehicles to save lives, because between you and me, if we had invested 100 billion in trying to save lives on the roads, we would have come up with much more successful means of doing so, right? Limiting speed, or stopping people from driving drunk, with very, very simple technology. So I don't think that's a really good motivation, or at least not one that I buy. I think we're doing it because it's a really interesting technical challenge.
As JFK would say, we're doing it not because it's easy, but because it's hard. And I believe there's a lot of value in this. Eventually, I think it will come with a set of advantages that completely change how we see and use mobility. But that's also going to take a while [00:55:00] before we get there. So that's my opinion on this. From a pure technical point of view, it's really, really cool. Should we not do it? No, we should absolutely do it, just like we should try to go to the moon and beyond. But are we really tackling the problems that a lot of us say we're trying to tackle?
That's where I call a little bit of BS, at least on some of the people I've heard talk about this.
Louis-Francois Bouchard: I have a final question for you, to go back to more applied and realistic use cases of AI, just to close the loop. Since you are the vision expert, mostly dealing with what our eyes see, could you first give some real-world examples of where AI is currently being used in the vision industry? And the second part of the question is: where do you see it coming into our lives where it isn't yet helping us?
Jerome Pasquero: Yeah, for sure. Just as a funny anecdote here: you describe me as a computer vision [00:56:00] expert, but my background is actually in haptics, which is the sense of touch with computers, a completely different sense, which I'm happy to talk about at some point if you want. In terms of computer vision, what are the current applications? It's pretty much everywhere, right? In security, for instance, for tracking people from one frame to the next, or from one room to another. Computer vision is also used in stores where there's no cashier, the self-checkout stores, for instance.
In manufacturing, on assembly lines, to detect flaws in whatever is going down the assembly line, right? So there are really applications in every single domain, in every single vertical. Where are we going to see something net new in the future?
I mean, I also think everywhere. [00:57:00] Anywhere that having a good automated vision system can benefit the business, we'll start seeing these systems appear. And I know this is kind of a vague answer, but the space is just so broad that I leave it to the imagination of your listeners to come up with real applications. I'm sure a lot of them are already working on some of these.
Louis-Francois Bouchard: Yeah. It's just that, if we take my mother, for instance, she doesn't know a lot about artificial intelligence, and to her it doesn't really exist yet, or at least she's not using it. Whereas every time she picks up her iPhone, lots of things are using artificial intelligence.
And so I'd love to have more concrete examples of what's currently being used. Security was a good example. But do you think you can quickly come up with one that is not [00:58:00] currently possible, but that you think will become possible soon?
Jerome Pasquero: In terms of some other examples, let's start with that, right? And again, it really depends on the definition of artificial intelligence. What we call AI today was not called AI, you know, 10 years ago, or sometimes even a month ago. But your iPhone right now unlocks itself by recognizing your face. Some people would call this artificial intelligence.
There are a lot of tricks involved in this, not pure intelligence, right? But just the fact that it recognizes you and you don't need to type your password, that's definitely one example. And if your mom watches Netflix, the recommendation system on Netflix is also something that at some point was called artificial intelligence. It's nothing modern; it's something that has existed for a long time. But this idea of being able, based on your past viewing and your past scoring of movies, and on other people's scoring as well, to recommend what you should be [00:59:00] watching next on your streaming service, for instance: those are all examples.
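The kind of recommendation Jerome alludes to, borrowing scores from similar users, can be sketched as a tiny nearest-neighbor recommender; the user names, titles, and ratings are all invented for illustration:

```python
def recommend(ratings, user, k=1):
    """Recommend up to k unseen items for `user` by borrowing scores from
    the most similar other user (nearest neighbor on shared ratings)."""
    def similarity(other):
        shared = set(ratings[user]) & set(ratings[other])
        if not shared:
            return float("-inf")
        # Negative mean absolute rating difference: higher means more similar.
        diffs = sum(abs(ratings[user][m] - ratings[other][m]) for m in shared)
        return -diffs / len(shared)

    neighbor = max((u for u in ratings if u != user), key=similarity)
    unseen = {m: s for m, s in ratings[neighbor].items() if m not in ratings[user]}
    return sorted(unseen, key=unseen.get, reverse=True)[:k]

ratings = {
    "mom":   {"drama_a": 5, "comedy_b": 2},
    "alice": {"drama_a": 5, "comedy_b": 1, "drama_c": 4},
    "bob":   {"drama_a": 1, "comedy_b": 5, "action_d": 5},
}
picks = recommend(ratings, "mom")
```

Production systems use far richer collaborative filtering, but the core idea of matching you to people with similar taste is the same.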
So it's pretty much everywhere. As for what's coming, that might be trickier, but you can think of waking up in the morning in your house and everything is adjusted to your liking. Obviously it would require a change in the hardware of your house, but let's pretend that's happening, right? You're greeted in the morning, you go downstairs, your coffee has been made, and it's made differently on Saturdays than on Mondays because you have more time to drink it. And you haven't had to configure any of that; it's just learned from your preferences. It's making suggestions from what type of person you are and which category you fall into, and it's using data from other people, in an anonymous way, to try to find [01:00:00] what would suit you best. These systems might or might not include vision. In the case I just described, you probably don't need vision for this.
You know, the same concept applies to vision systems just as well.
Louis-Francois Bouchard: I think it might also have haptics. I've actually listened to a podcast you've been on where you talked a lot about haptics, so I didn't want to repeat that and talk about it again, but it's definitely something super interesting.
Another sensory input that I believe we don't really use much these days. So yeah, I would love to quickly touch on this topic. Could you give a bit more information on how haptics are used, what's challenging about this sensory input, and its potential link with artificial intelligence?
Jerome Pasquero: Yeah. So first, I appreciate the pun of touching that topic; there are so many puns we can do with haptics, right? So, you know, I was [01:01:00] just listening to Fei-Fei Li, who is a leader in computer vision and artificial intelligence, a professor at Stanford. And in one of the podcasts I listened to, she was talking about how vision is probably one of the reasons
why there was such a huge jump in evolution after animals evolved vision, the sense of sight. And I kind of disagree. I agree that vision is a super important part of our lives and a cornerstone of what we do, but I think that's just because, as a sensor, it affords a lot of information to come in, right? All these pixels being processed. But I actually think that haptics is just as important, if not more important, because before you need to see, you probably need to feel, to know what's around you. And that's what the sense of touch is, right?
It's being able not only to understand the immediate environment around you, but also [01:02:00] to understand yourself. Without haptics, you wouldn't be able to stand right now. Your whole proprioceptive system, which is what allows you to feel your own muscles, your own body: if you didn't have that, which is part of the sense of touch, we would all be on the floor, not moving at all.
Right? So, the big difference between the sense of sight and the sense of touch is twofold. One is that the sense of sight allows for remote processing; you're seeing things that are very far from you. Whereas the sense of touch is more about your immediate surroundings.
Because if you can't reach it, you can't feel it, right? And the second thing is the amount of information processed. In terms of the pure number of bits that come in, the sense of sight is two orders of magnitude bigger, if not more, than the sense of touch. Obviously you don't get a lot of information coming through your [01:03:00] skin, but that information is just as important; as we said, you wouldn't be able to stand without it.
And it's also much more loaded, I think, with emotional content. We touch babies; we touch our loved ones. So those are the two differences. But maybe we keep this topic for another time, because as you can tell, I rapidly get excited about it.
Louis-Francois Bouchard: And the same goes for autonomous driving. For example, take yesterday, when there was black ice everywhere that we cannot see. But we can feel it: we both hear and feel the gas pedal going farther, we hear the wheels spinning, and we start feeling the car going sideways. That's something that cannot be done with only two cameras.
Jerome Pasquero: No, I think another great example of this is in surgical procedures, [01:04:00] especially remote surgical procedures, right? We now have machines that allow you to do surgery from a distance, remotely. Someone is in a hospital, under the machine, and as the surgeon, you're in a completely different city, in another hospital. It doesn't actually need to be a hospital; say a command center, where you're doing the surgery remotely. Vision alone won't be enough. You can see, but the surgeons are going to say: yeah, but I want to feel what I'm cutting, because otherwise I can't navigate. I don't know what I'm doing.
I don't know what I'm doing. So, this idea of using multiple senses, is obviously not new. That's how we operate as humans. And that's also why I think we're going to see. In some sense, more and more of these multimodal systems that kind of combine multiple modalities because there's a redundancy of signals in there that makes things more reliable and more efficient and of better quality to make natural progression in the next [01:05:00] phases of AI.
Louis-Francois Bouchard: And I assume that the more diverse the sensors, and the more sensors we have, the better. For instance, maybe a car could, in the future, end up better than us just because it has LiDAR and other types of sensors that we don't have. It's really exciting; I hope to see these things coming more and more into our lives. I guess we do have some applications that use it, with our phones and other things. I also worked at CAE Healthcare during my internships, testing and doing QA for some surgical applications where you had to cut tissues, and you were literally feeling like you were cutting through meat. It was quite realistic, and it's [01:06:00] a very cool field, trying to replicate tissues or just textures. It's something incredibly cool.
Jerome Pasquero: Yeah, exactly. And in VR, we're going to see this more and more. Today you can feel what a gun feels like when you're shooting in a VR game, for instance, but that's really just scratching the surface of what we can do by trying to reproduce the sense of touch in virtual reality.
Louis-Francois Bouchard: Amazing. Thanks a lot for your time and for all the great insights you shared. I would be super excited to talk more about haptics sometime soon, if you are too. So yeah, thanks a lot for joining me in this discussion and sharing all that you have on your mind.
Jerome Pasquero: It's been a real pleasure. It's been a lot of fun. Yeah. I'm always ready to talk about haptics whenever you're ready. Awesome.[01:07:00]