Building LLM Apps and the Challenges that come with it: An Interview with Jay Alammar

The What's AI podcast episode 16 with Jay Alammar.

Full interview transcript below!

A new interview fresh out of the oven with Jay Alammar, a renowned technical blog writer and AI educator, author of the famous Illustrated Transformer blog post. Jay is probably the best-known tech blogger in the NLP space. In the interview, we dive into his passion for machine learning and NLP and the importance of sharing knowledge to help others in their learning journey.

Jay's blog and his work with Udacity on Nanodegrees have helped educate countless people in machine learning, deep learning, and NLP. He shares advice on writing while learning and seeking feedback, which can help open doors and boost careers.

Jay is currently working as Director & Engineering Fellow at Cohere, focusing on the educational platform: LLM University, which we talk about in the episode. Jay and his team behind LLM University strive to make learning accessible to all. Their work on large language models democratizes knowledge and provides learning opportunities for people from all around the world.

In the interview, I asked Jay to explain the whole training process as well as the architecture behind the GPT models. Tl;dr: they answer questions by generating text one word at a time, constantly updating inputs to generate outputs, and breaking down complex concepts into simpler components. Learn more in the episode!

If you're interested in diving into Large Language Models and building AI-powered apps, don't miss this enlightening interview with Jay Alammar.

Listen on Spotify, Apple Podcasts, or YouTube.

Full interview transcript:

[00:00:00] This is an interview with Jay Alammar. Jay has a technical blog as well as a YouTube channel where he explains a lot of technical AI-related stuff, like his attention and Transformer blog posts that you have definitely heard of before. And he has amazing learning resources. Now, he is focused on building LLM University with Cohere, and we'll talk about that in this episode. But not only that: we'll also dive into the different challenges in building Large Language Model based apps and how to build them. I think this episode is perfect for anyone interested in Large Language Models, but mainly for people that want to create a cool app using them. I hope you enjoy this episode.

So I'm Jay. I worked as a software engineer. I've been fascinated with Machine Learning for so long, but I really started working in it just about eight years ago, when I started a blog to document how I'm learning about Machine Learning. It [00:01:00] seemed to be this power that gives software capabilities that are quite mind-blowing if you know what the limitations of software are. And you see these Machine Learning demos coming and doing things that really stretch the imagination in terms of what is possible to do with software. So that's when I started my blog, which a bunch of people have found useful, because I started with introductions to Machine Learning and, in general, how to think about backpropagation and neural networks. And then I moved into more advanced topics covering attention and how language processing is done and language generation. And then it really exploded when I started talking about Transformer models and BERT models and GPT models. From the blog, I got to work with Udacity, creating some of their Nanodegrees for educating people on how to use these language models and how to train [00:02:00] them.

These were two things that really launched a little bit of how I work with Machine Learning and AI. The blog has had upward of 6 million page views so far, a lot of it around Transformers and how they work. And yeah, I think maybe tens of thousands of people have gone through the various Udacity programs on Machine Learning, Deep Learning, NLP, and Computer Vision.

And most recently I get to work very closely with these models, with Cohere, as Director and Engineering Fellow, where I continue to learn about the capabilities of these models and explain them to people in terms of how to deploy them in real-world applications. So how can you build something with them that solves a real problem that you have right now?

And that includes education and includes talks. That includes creating schematics, but also crystallizing a lot of the lessons learned in the industry of how to deploy [00:03:00] these models and how to build things around them. So, Cohere: a lot of people have heard about Transformers, and Cohere was built by three co-founders, some of whom were co-authors of the original Transformer paper. And they've been building these, you know, Transformers in the cloud, these hosted, managed Large Language Models, for about two or maybe two and a half years now. I've been with the company for two years and I've seen this deployment and rollout.

And yeah, since then the company has trained and deployed two families of massive language models that do text generation, but also text understanding, and we can go deeper into these two capabilities of Large Language Models.

Absolutely. And you mentioned that you started eight years ago, your blog, and of course, as with most people, I discovered you through just the amazing blog that you wrote on Transformer and attention.

It really helped me and lots of people. But [00:04:00] since you started so long ago, what made you start and made you get into the field? Because eight years ago, AI hype wasn't anywhere near what it is right now. So how did you discover it, and what made you create this blog about it?

Amazing. Yes. So, having worked as a software engineer for a long time, sometimes I would come across some demos that, to me, felt magical.

And a lot of them come from Machine Learning. It's on YouTube: if you Google Word Lens, this was a demo that came out in 2010 of an iPhone 4 that you can point at, let's say, a sentence written in Spanish, and it'll transform that into, let's say, the English translation. It will superimpose it.

And, you know, it's 13 years later now, but that still feels like magic, especially if you know about software and how complex dealing with language and images is. To be able to do that without a server, on software that's [00:05:00] running on the machine itself, that felt like an alien artifact to me.

And so, seeing something like that, I've always had this to-do list in my head: get into Machine Learning, find the first opportunity to get into Machine Learning and understand it, because it's clearly gonna be transformative to software. The moment that really gave me the jump into it was around 2015, when TensorFlow was open sourced. It felt like, okay, this is the time. Now there is open source code.

Because, you know, for a lot of these things you have to be very close to a research lab, or you work deep inside a company. And at the time, I was an outsider. I wasn't in Silicon Valley, wasn't in, you know, a big tech hub. I was just a software person with a laptop and access to the internet, just trying to learn on my own without a group or a company or a research group around me.

I wasn't, let's say, an academic. And so a lot of it was self-learning. So when [00:06:00] TensorFlow came out, I was like, okay. I read a paper in 2004 called MapReduce, and that launched the big data industry. So everything around big data is a massive industry. And it felt like, okay, the launch of TensorFlow, and everything that started happening in Deep Learning, is the beginning of this new wave of Deep Learning.

And so yeah, I started to take some tutorials, but then how do you feel satisfied if you spend three, four months learning about something? Yes, you have more information, you've developed a little bit of a skill, but I always need an artifact that sort of solidifies that: from month one to month three, I have this thing. And that's what the blog was for me.

It's like, okay, I struggle very hard to understand a concept. Once I understand it, I'm like, okay, there's a better, maybe an easier way for me to have understood it, if it was explained this way. And that's how I try to, let me chew on this, and once I understand it, [00:07:00] try to hide the complexity, the things that made me intimidated when I would learn something and then I'm faced with a wall of formulas, for example.

Yeah. Or like a lot of code before getting the intuition. That made me feel intimidated, and those feelings are sort of what guides me to have these gentle approaches to these topics for the readers, where we hide the complexity. Let's get to the intuition, let's get a visual sense of the intuition. But it happens over a lot of iteration. And I'm happy to get into how the writing and visualization process develops over time with me.

Yeah, I'd love to because mainly I do the same thing with YouTube where I actually started when learning artificial intelligence just to force myself to, to study more and, and learn more.

But also, just like you, I wanted some kind of result or output from what I was learning, and just to confirm that I actually correctly learned [00:08:00] things. Because if you can explain it, it should mean that you understand it. And so I perfectly understand you and I'm on the same page.

But another way is also to create something or code something or, or develop an application or, or whatever. So what made you go the path of trying to, to teach what you were learning instead of using or learning something to create something else?

So, because as you develop a skill, you're not always building that skill to build one product that you have.

You're learning and you're observing and you're seeing what is popular in the market. And then, as you develop your skills, maybe in month six you're still not able to, you know, train a massive model and deploy it to solve a problem. And so I decouple launching a product from the learning and acquiring the skill.

And that's why [00:09:00] writing is a really great middle ground or artifact of learning, because it's also a gift to people. And it also comes from gratefulness, like feeling grateful to the great explainers that explained things to me in the past. So when I would, you know, really struggle in the beginning to understand what neural networks are, and I'd come across a blog post by Andrej Karpathy or by Andrew Trask or Chris Olah that explains something visually in a beautiful way, or in like 11 lines of code, where you can feel that you have it.

Like, I feel a lot of gratefulness that this made me closer to a goal that I have. Yeah, by simplifying. It's what we're trying to do with what we just released, LLM University at Cohere, and a lot of my learning journey is sort of echoed by it. I'm happy to see that work of education, learning, growing as a community, you know, [00:10:00] teaching each other, but also collaborating on how we learn.

And it is something that I think benefits everybody. And I advise everybody to write what they learn. A lot of people are stopped by, no, I'm just a newbie, I'm just learning this. But a lot of the time, just you listing the resources that you found useful is valuable on its own, let alone how much it'll brand you that you wrote something.

And the writing process, to me, helps me learn so much deeper. Let's say the Illustrated Transformer blog post. So there's what, maybe 20 visuals on there. Each of the visuals you see on the blog post is, let's say, version number six or seven. I iterate over it so much more and learn in the process.

So I would read the paper, I would say, okay, do I understand it correctly? This is my understanding of it. And then I would read another paragraph and say, no, the way I said it sort of contradicts this. Let me draw it again with this new understanding. And [00:11:00] then once I know that I'm gonna publish it, a part of my brain says, wait, other people are gonna read this.

How sure am I that this is how it actually is? Maybe I should actually go and read the code to verify my understanding here, like this depth of investigation. I would not have done it if I just read the paper and said, yeah, okay, I understand it. And then there's another thing of like, and it's a good life hack that I sort of, hand off to people.

So if you're explaining somebody's paper, you know the paper has their emails, and you are helping them spread their ideas. So once you work on it and have it in good shape, write something, send it to them, get some feedback from them. That is also another really great source of, you know, feedback and connections that I've had through the blog.

And it really helps remove some of the blind spots that you cannot see. This is especially important and valuable for people who are, again, not in Silicon Valley. The majority of people are, you know, somewhere in the world without access to, [00:12:00] you know, the people who are working much closer with this technology.

So a lot of us, all we have is the internet. So how can we learn with online communities? That's another thing that LLMU helps with, to, you know, democratize that knowledge. But also, Cohere has a community-driven research arm called Cohere For AI that aims to broaden Machine Learning knowledge and research, accepting people from, I think, over a hundred countries right now.

So to me that, that is a little bit on it because coming into that I was like, you know, I'm, I'm a professional working in a specific profession. I'm really excited about this. I wanna learn this, but I'm, you know, I'm, I'm, I'm nowhere near any of these you know, very big companies that do it. How do I do it?

And so that's what I hope for, you know, the opportunities that people get from creating things, sharing what they learn, and learning together as a community. And it's what we try to do also in, let's say, the Cohere [00:13:00] Discord. We're like, you know, write, let us know. Let's learn together as a community.

Yeah. It's definitely super valuable to share anything you want to share. And if you are wrong, well, in the worst case, I believe nobody will see it, just because it's not high quality. But if people see it, you will end up being corrected and you will just learn even more as well. As long as you are not intentionally spreading misinformation. It's possible that you are not completely sure that you fully understand what you are trying to explain, but it can still be just doubts in your head, and you are still right, and you can share.

It's like a fear you have to get over at some point.

A hundred percent. That stops a lot of people. A lot of people are like, you know, I'm not the world's best expert on this. And you don't have to be, and you can write that right in. And to me, that gave me a lot of license and a lot of [00:14:00] comfort in writing the things where I say, I'm learning this, let's learn it together. These are my notes, this is how I understand it. And once I do that, I learn deeper and, you know, when people correct it, I just update it. I just put an update or, you know, change the visual. And that's a great way of learning together.

So you're doing your audience sort of a favor by learning together. And it helps you career-wise. So I advise a lot of people by saying, okay, this will help open doors for your career and, you know, for possible jobs that you can have in the future, by showing that you're passionate about this one topic or, later on, if you do it long enough, that you're an expert in it.

Absolutely. Yeah. Visibility is super important, even when you are learning. By being on YouTube for three years, I've seen a lot of people asking me what to learn, where to start, et cetera. And especially when learning online, a lot of people get stuck doing one course and another and another, and they just keep on trying to [00:15:00] learn because, just like myself and most PhD students, we almost always have impostor syndrome, and we think we are not the expert and people should not trust us or believe in us. But we need to get over that and just try, at least.

Yeah. Yeah. No, a hundred percent. It's transparent. Yeah. But even if you go to the world's, you know, best experts on anything, the experts usually are the experts on one very narrow thing, and then they're just learning everything else. So these are just our limitations as humans.

And right now, with all the experience that you have teaching it, like with your amazing blog, as I said, on Transformers, attention, and everything related to large language models, and now with the LLM University where, as we spoke about with Luis in the previous episode, you also do your best to [00:16:00] explain how they work and what you can do with them.

And I know that this usually requires lots of visuals. Visuals are very helpful to try to teach how complicated things work. But I wonder if, after all this time working with this and trying to explain it, you could find a way to explain Transformers and attention relatively clearly now with just the audio format. Would you be able to explain how it works to, for example, someone just getting into the field?

Okay. Yeah. We, we have really good content on, on LLMU for that. And one thing that makes LLMU special is that I'm collaborating on it with incredible people. So Luis is, one of the best ML explainers and educators in the world.

And right now, if somebody wants to learn Transformers, I really don't refer them to the Illustrated Transformer right away. I refer them to Luis's article on Transformers, because in the Illustrated Transformer there was a [00:17:00] context where I was expecting people to have read the previous article about attention and RNNs.

And if you're coming in right now, maybe you should skip learning about RNNs and LSTMs. You can just come right into neural networks and then Transformers and attention. And so, part of what makes LLMU special for me is collaborating with Luis, but also with Meor Amer, who's one of the best visual explainers of things.

Meor has a book called Visual Intro to Deep Learning that has visual explanations of a lot of the concepts in Deep Learning. He's, you know, one of the best people who can really take a concept and put a visual picture on it. So that collaboration has been a dream come true for me on LLMU.

So, the question is on how I've been explaining Transformers to very different audiences over the last five years, and there are different ideas, yes, depending on who the audience is. One way is [00:18:00] to say, right now a lot of people are used to generative models, to generative Transformers.

And so that's a good way to see, okay, how does a text generation model, one of these GPT models, answer a question if you ask it? And the way it does it is by generating one word at a time. That's how it runs at inference. How does it generate one word at a time?

We give it the input. Let's say we say, you know, what date is it today? It breaks that down into, I'll say words. The real word for it is tokens. But let's say it breaks it down into words, and it feeds them into the model. And on the other side of the model comes the next word that the model expects.

And then that is added back to the input. And then the model generates the next word, and then the next, and then the next. This is how these text generation [00:19:00] models work. Now, if you give them input that doesn't answer everything, how are they able to do this? What happens under the hood that makes them do that?
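The loop Jay describes here (predict one word, append it to the input, repeat) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `NEXT_WORD` is a hypothetical lookup table standing in for a trained model, and it only looks at the last word, whereas a real language model conditions on the whole input and works on tokens rather than whole words.

```python
# Hypothetical stand-in for a trained model: maps the last word to a next-word guess.
# A real language model looks at the entire input, not just the last word.
NEXT_WORD = {
    "what": "date",
    "date": "is",
    "is": "it",
    "it": "today",
    "today": "?",
}


def predict_next(tokens):
    """Return the toy model's guess for the next word, or None if it has none."""
    return NEXT_WORD.get(tokens[-1])


def generate(prompt, max_new_tokens=10):
    """Generate one word at a time, feeding each output back in as input."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token is None:
            break
        tokens.append(next_token)  # the new word becomes part of the next input
    return tokens


print(generate("What date"))  # ['what', 'date', 'is', 'it', 'today', '?']
```

The only part that carries over to real systems is the shape of the loop: the model is called once per generated word, and its own output is appended to what it sees next.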

That's another thing. But in the beginning, I like to give people a sense of, okay, when you're dealing with it at inference time, this is what it's doing. You can then go into the actual components. So how does it do that? Well, the input words are translated into numeric representations. Computers are computers, they compute. And, I heard this from somebody called Sam Joseph, Transformer language models are language calculators. So everything has to become numbers. And then those numbers, through calculations and multiplications, become other language. And that's what happens inside this box, which is the model, which was trained.
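To make "everything has to become numbers" concrete, here is a minimal sketch of turning words into integer ids and then into vectors. The helper names (`build_vocab`, `encode`, `make_embeddings`) are made up for this illustration, the vectors are random rather than learned, and real models split text into subword tokens instead of whole words.

```python
import random


def build_vocab(text):
    """Assign each distinct word an integer id, in order of first appearance."""
    vocab = {}
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a sentence into a list of integer ids."""
    return [vocab[word] for word in text.lower().split()]


def make_embeddings(vocab_size, dim, seed=0):
    """One vector per id. Random here; in a trained model these are learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


vocab = build_vocab("what date is it today")
token_ids = encode("what date is it today", vocab)
embeddings = make_embeddings(len(vocab), dim=4)
vectors = [embeddings[i] for i in token_ids]  # what the model actually computes on

print(token_ids)  # [0, 1, 2, 3, 4]
print(len(vectors), "vectors of size", len(vectors[0]))
```

From this point on, the model never sees text again until the very end, when the numbers it produces are mapped back to a word.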

We'll get to how training happens at the end, but for now just assume we have this magically trained model. You give it words, it predicts the next word, and it gives you something coherent [00:20:00] based on the statistics of the text it was trained on. Mechanically, how it works is that the input text goes through the various layers of the model, and the model has, let's say, components or blocks.

This can be six layers, as in the original Transformer, but, you know, some of the large models now have 90 or a hundred layers. And each layer processes the text a little bit and outputs numerical representations that are a little bit more processed. And that goes to the next layer, and to the next layer.

And then by the end you get enough processing that the model is confident that, okay, the next word is this one. So this is another layer of, let's say, breaking it down. And from here, yeah, we can take it in two different ways. We can say how it was trained. And then we can also break down these blocks and these layers and talk about their various components. So I will have you choose your destiny and steer me: which [00:21:00] way would you like us to go now?

I think I, I'd rather go for the how the blocks are, are made, what the blocks are made of and how it works.

Amazing. So I give an example of there are two major capabilities that are, that correspond to the two major components of what's called a Transformer block.

Have you seen the film The Shawshank Redemption? I haven't. It's a very popular film, but it's just these two words that are commonly used together, Shawshank and Redemption. So if you tell a model Shawshank, it will go just based on a lot of the data that it was trained on. There aren't a lot of words that appear in the training dataset that usually come after Shawshank.

So the highest probability word would be redemption, just based on what the model has seen in the past. Yeah. And so that is the job of one of the two components. That's what's called a feed-forward neural network. That's one [00:22:00] of the two major components of this Transformer block, the one that just works on these, let's say, statistics. So if you only have that component of the Transformer block, then the model can still make this kind of prediction from the input you give it.
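The kind of statistic Jay is pointing at can be illustrated by simply counting which word follows which in a small body of text. To be clear about the assumption: a feed-forward network does not store a count table like this. The sketch, with its made-up four-sentence corpus, only shows why "redemption" ends up as the highest-probability word after "shawshank".

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, standing in for the model's training data.
corpus = [
    "the shawshank redemption is a popular film",
    "i watched the shawshank redemption last night",
    "shawshank redemption reviews",
    "the film is popular",
]


def next_word_probabilities(sentences):
    """Estimate P(next word | current word) from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


probs = next_word_probabilities(corpus)
print(probs["shawshank"])  # {'redemption': 1.0}
print(probs["the"])        # "shawshank" is more likely than "film" here
```

Statistics like these work well for fixed phrases, but they have no way to decide what a word such as "it" refers to, which is the gap the second component fills.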

Give it input text saying Shawshank, and it'll output redemption. That kind of works. But then language is a little bit complex, and that is not the only mechanism that software needs to generate it. You need another mechanism, which is called attention. And attention, we can think about it as saying, okay, what if we tell the model this sentence and try to have it complete it?

The chicken did not cross the road because it... Now, does "it" refer to the road or to the chicken? It's very difficult to rely on the words that usually, statistically, appear after the word "it", because that will be a meaningless sentence in a lot of cases. You need to build the understanding of: are we talking about the street [00:23:00] or are we talking about the chicken? And that's the goal or the purpose of the second layer, the attention mechanism. How does it do that? It's built in a specific way so that it learns this from a lot of the data that it's trained on, which we can go into.

But these are the two major components. A Transformer model, including a GPT model (the T in GPT is Transformer), is multiple Transformer blocks. Each Transformer block is self-attention, the attention layer, and then a feed-forward neural network. Each one of them has this goal.

And then once you stack them into a model that is large enough and train it on a large enough dataset, you can start to have these models that generate code, that can summarize, that can do copywriting, and, you know, you can build these new industries of AI writing assistants on top of them.
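For readers who want to see the two components side by side, below is a rough, pure-Python sketch of one Transformer block (single-head causal self-attention followed by a feed-forward network) stacked three times. Everything in it is an assumption made for illustration: the weights are random instead of trained, the sizes are tiny, and it leaves out pieces real models have, such as multiple attention heads, layer normalization, and positional information.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head causal self-attention: each position mixes in earlier positions."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    dim = len(q[0])
    output, all_weights = [], []
    for i, query in enumerate(q):
        # Position i may only look at positions 0..i, never at future words.
        scores = [
            sum(a * b for a, b in zip(query, k[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        mixed = [sum(w * v[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        output.append(mixed)
        all_weights.append(weights)
    return output, all_weights


def feed_forward(x, w1, w2):
    """Applied to each position independently: expand, ReLU, project back."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def add(a, b):
    return [[p + r for p, r in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transformer_block(x, params):
    attended, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    x = add(x, attended)  # residual connection around attention
    x = add(x, feed_forward(x, params["w1"], params["w2"]))  # and around the FFN
    return x, weights


def make_params(dim, hidden, rng):
    return {
        "wq": random_matrix(dim, dim, rng),
        "wk": random_matrix(dim, dim, rng),
        "wv": random_matrix(dim, dim, rng),
        "w1": random_matrix(dim, hidden, rng),
        "w2": random_matrix(hidden, dim, rng),
    }


rng = random.Random(0)
dim, hidden, num_layers, seq_len = 4, 8, 3, 5
x = random_matrix(seq_len, dim, rng)  # stand-in for five word vectors
layers = [make_params(dim, hidden, rng) for _ in range(num_layers)]
for params in layers:  # the same kind of block, stacked
    x, weights = transformer_block(x, params)

print(len(x), "positions, each a vector of size", len(x[0]))
```

Each row of `weights` says how much one position draws on each earlier position. In a trained model, this is where the word "it" could place most of its weight on "chicken" rather than "road".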

Yeah, that perfectly makes sense. It's a really good explanation, and I've struggled for a while. [00:24:00] I think it's two years ago, no, three years ago that GPT-3 came out, and, I don't know why, I think it's always the case for new technologies, but it was really hard to understand well enough to explain it properly. And now you've definitely mastered it, and I love how you separated the different topics and did not dive into the details too much.

I often get stuck in the details because I like them, like how attention calculates, well, the attention for each word, et cetera. I really like the details that you didn't even mention, and I think it's relatively important not to mention them, as you've done.

And yeah, I, I, I still need to, to learn how to best explain things, but it's, yeah, it's, it is really nice to, to see you explain something that I even know now, but it's still like, [00:25:00] teaches me new stuff. It, it's really cool.

It helps to do a lot of iteration, and just do it over and over again and explain it to people, and then notice that I said this and then their eyes sort of, yeah, start to defocus a little bit, and go back to say, okay, maybe this was a little bit too much detail. Let me delay it. You can still mention the details, but I love to layer it. Like, say, you get one part of the concept and then you go a little bit deeper into another part, and then, yeah.

But you get the full concept first at the high level, and then a little bit of a more resolution on another, another part that's a little bit of a philosophy that I've, I've, I've seen work over the years.

And I think just for a, a regular presentation, it's also a good, a good format to follow that just even to mention it, that like, that's the broad overview.

I will dive into the details later, but for now, just focus on that. Like, I think just mentioning this makes it more interesting. Like, you are a [00:26:00] bit lost, but you know that it's gonna come, and so you don't feel lost. Yeah, I think it's a better way of explaining, for sure.

Even if, for example, anyone listening is not a teacher or does not have a blog, but they're working and they need to present something, any kind of presentation or just sharing knowledge, it's really relevant to learn or improve how you share it. Now everyone talks about ChatGPT, and so I would love it if you could go over the different steps, from the self-supervised part to the fine-tuning to the reinforcement learning with human feedback. How would you explain all those quite complicated steps in simple words?

Yeah. So I do intend to at some point write something about human preference training, either with reinforcement learning or without it. There are different sorts of methods. So, [00:27:00] how it works today. One of the things that makes these models work now is that we can have a lot of data that is unlabeled and train the model on it. So we can just get text, get free text from the internet, from Wikipedia, for example, or books, or from any dataset. And we can use that to create training examples in this unsupervised, which is now called self-supervised, way of saying, okay, let's take one page from Wikipedia, maybe the page about the film The Matrix, for example.

Or Shawshank, just any article, and say, okay, that page has 10,000 words. Let's create a few training examples. Let's take the first three words and present them to the model and have the model try to predict the fourth word. That's a training example. And then we can have another example where we give it the first four words and have it try to predict the fifth word.

And so you can see that we can just slide this window and create millions [00:28:00] or billions of training examples. And that's what happens in the beginning. This is why they're called language models. This is a task in NLP called language modeling. Now that turned out to be one of the most magical things, one of the biggest returns on investment that maybe the technology ecosystem, or like human technology, has ever given us back.
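The sliding window Jay describes takes very little code. As a minimal sketch (the function name `make_training_examples` and the one-sentence "page" are made up for illustration, and real pipelines work on tokens rather than whitespace-separated words):

```python
def make_training_examples(text, min_context=3):
    """Turn raw text into (context words, next word) pairs by sliding a window."""
    words = text.split()
    examples = []
    for i in range(min_context, len(words)):
        examples.append((words[:i], words[i]))
    return examples


page = "The Matrix is a 1999 science fiction film"
for context, target in make_training_examples(page):
    print(context, "->", target)
```

Nobody had to label anything: the text supplies both the input and the correct answer. An eight-word sentence already yields five examples, which is why a page of 10,000 words, multiplied across a large collection of text, reaches the millions or billions mentioned above.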

With this you can go so far, and in ways that are really surprising to the people who are working closely with this technology. If you do this with large enough models on large enough data, the model will then be able to retain information, world information. So you can ask it about people and it will tell you, you know, who acted in The Matrix and what date and what time.

And that information starts being there. It starts to generate very coherent text that sounds correct and is grammatically correct. And how does it do that without us writing all the [00:29:00] grammar rules into it? If you train a large enough model on a multilingual dataset, it starts being able to do that in multiple languages. So language modeling is one of the magical things that are really bringing this massive explosion in capability of software and AI. It's the source where all of this starts, and it's the first step in training these Large Language Models.

And it's the one that takes the most compute and the most data. So this can take, you know, months and months to train. In Machine Learning, you take a model, and you can start with a model with, I don't know, any number of parameters, but they're random in the beginning, and the predictions that the model makes are junk because they're random. But it learns from each training step.

When we give it the first three words and have it predict the fourth word, its prediction is gonna be wrong. We'll say, no, you said this, this is the correct answer. Let's update you. So the next time you [00:30:00] see this, you have a little bit of a better chance of getting it right. Again, this step is what happens, you know, millions or billions of times.
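The predict, compare, update cycle can be shown end to end on a toy scale. This sketch assumes a drastically simplified "model": one row of scores per word, predicting the next word from the current word only, trained with plain gradient descent on a made-up sentence. Real language models have billions of parameters and look at the whole context, but the shape of the loop is the same.

```python
import math
import random

text = "the chicken crossed the road because the chicken was hungry"
words = text.split()
vocab = sorted(set(words))
index = {w: i for i, w in enumerate(vocab)}
# Training examples: (current word id, next word id).
pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]

rng = random.Random(0)
# The "model": one row of scores per input word, random at the start.
logits = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in vocab]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_loss():
    """Cross-entropy: how surprised the model is by the correct next words."""
    return sum(-math.log(softmax(logits[x])[y]) for x, y in pairs) / len(pairs)


def most_likely_next(word):
    probs = softmax(logits[index[word]])
    return vocab[probs.index(max(probs))]


loss_before = average_loss()
learning_rate = 0.1
for epoch in range(300):
    for x, y in pairs:
        probs = softmax(logits[x])  # 1. make a prediction
        for j in range(len(vocab)):
            # 2. compare with the correct answer: gradient is p_j - 1[j == y]
            grad = probs[j] - (1.0 if j == y else 0.0)
            # 3. nudge the model so the right word is a bit more likely next time
            logits[x][j] -= learning_rate * grad
loss_after = average_loss()

print("loss before:", round(loss_before, 3), "after:", round(loss_after, 3))
print("after 'the' the model now expects:", most_likely_next("the"))
```

The loss drops because every update moves the scores a little toward the word that actually came next, which is the "let's update you" step repeated many times.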

This is the training, this is the learning in Machine Learning: making a prediction, updating the model based on how wrong that prediction was, and doing it over and over again. So that is the first and most expensive step in creating a baseline model. Once that came out and people started using it, you could make it do useful things, but you had to do a lot of prompt engineering, because you can ask the model a question and say, how do apples taste? And the model, based on just what it's seen in the dataset, can ask another question and say, how do oranges taste? And how do strawberries taste? These are all reasonable continuations: you gave it a question, it gave you more questions, maybe changing the fruit type. But [00:31:00] what people actually wanted from their interactions was, if I ask you a question, give me an answer.

If I give you a command and tell you to write an article about apples, I want you to write an article, not to give me more commands about this. And so this is what's called preference training. And to do that, you get these training examples of a question and its answer, or a command, where you say, okay, write me an article about X, and then you have the article about X, and then you train the model on this dataset.

And that's how you get that behavior of the model that follows what people started expecting from the model: you follow my commands. And so that's what Cohere's Command model is tuned to do. And that's what InstructGPT sort of started doing and how it improved on GPT-3 in the past.

So that's sort of the next step. And then you can align the model [00:32:00] better to those behaviors by having another sort of training step, which sometimes can include reinforcement learning: not just doing language modeling on this new dataset that you've created and provided, but also giving it good examples and bad examples and saying, okay, make it closer to good examples and farther from bad examples, as rated by another, let's say, reward model.

But that complexity, I think a lot of people don't need to get into it. Like, as long as you understand the language modeling objective, and then this idea of preference training, that gets you most of the understanding that you need. Then you just focus on how it can be relevant and useful for your own product that you're trying to build.

What kinds of prompts, what kinds of pipelines or chains are useful for that. And that's, you know, for the vast majority of people, much better than understanding the Bellman equations or, you know, the detailed reinforcement learning steps.

Regarding the different products that you build with [00:33:00] those models, I know that you talked a lot about that in the LLMU, and one thing that is, I believe, super important and promising, other than, for example, fine tuning and the command models, is to use embeddings and build applications on them, like retrieval related applications or any other kind of semantic search, classification, et cetera.

And I wonder, well, I have two questions with that. The first one is, what are embeddings and what can you do with them? But also, what to you is more promising between trying to make the perfect ChatGPT model, with lots of fine tuning and the best commands possible and human feedback and everything to make it perfect, or using a model just for embeddings and then working with very specific applications like those? Or they are [00:34:00] just very different use cases and both are relevant?

Yeah, so there are gonna be people who use both. There are gonna be people who are just gonna be, you know, using different prompts and sending them to a large model and getting the results back.

And you see a lot of these on LinkedIn. You know, these are the top 10 prompts to use. There's a class of people that will find that useful. But there's another class, which I sort of advocate for, which is to think of these tools as components that you can build into more and more advanced, let's say, systems, where you're not just consuming this one service or one model, but you're actually building with them as a builder yourself.

Yeah. So I advocate for that, and for you to do that, the idea of embeddings is one of the most powerful ones and one of the most central ideas. Just like how API is not only a technical term now, it is a business term. CEOs have had to know what an API is, you know, for the last 10, 15 years.

Embeddings, I believe, is [00:35:00] gonna be one of those things because it's one of these central components of how you can deal with Large Language Models and build more and more systems. Embeddings, in short, are these numerical representations of text. They can be of words. So things like word2vec were methods to give each word a series of numbers that represent it and capture its meaning.

But then beyond words, we can also go into text embeddings, which is where you have a list of numbers that represent an entire text or sentence or email or book, so to speak. And so that concept is very important if you elect to be a builder with LLMs and you start to get a sense of what embeddings are.

One of the best things that I advise people to build is something involving semantic search, where you get a dataset. So maybe, let's say, The Matrix film Wikipedia page. Break it down into sentences, embed each sentence, and then you can [00:36:00] create, let's say, a simple search engine on this dataset.

And the search engine works like this. You give it a query. Let's say, you know, when was The Matrix filmed, for example, or when was it released? That text is also embedded. So you send that to an embedding model, kind of like Cohere's Embed endpoint. You get the numbers back, and then you can do a simple nearest neighbor search.

That's also very simple, like two lines of code. You can get the nearest neighbors, and that will give you the top three or top five sentences that are closest to that query. The beautiful thing here is that, regardless of the words that you use, the LLMs capture the meaning.

So even if you don't use the same words, the, the model captures the intent. That's why when these models were rolled out, like especially the BERT model in 2019, like six months later, Google rolled it out into Google search and called it one of the biggest leaps forward in the history of search.

Just that addition of that [00:37:00] one model, most likely. So semantic search has these two capabilities that you can build. What we just described is called dense retrieval, which is: you embed your archive, you embed your query, and you get the nearest neighbors. So that's one major concept that I advise people to build with.
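The dense retrieval steps described above (embed the archive, embed the query, take the nearest neighbors) can be sketched roughly as follows. This is an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in that only counts words, so unlike a real embedding model it does not capture meaning, and the three sentences are illustrative data rather than the actual Wikipedia page.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def dense_retrieve(query, sentences, top_k=3):
    """Embed the archive and the query, then return the top_k nearest sentences."""
    archive = [(s, toy_embed(s)) for s in sentences]
    q = toy_embed(query)
    ranked = sorted(archive, key=lambda item: cosine(q, item[1]), reverse=True)
    return [s for s, _ in ranked[:top_k]]

sentences = [
    "The Matrix was released in 1999.",
    "The film stars Keanu Reeves as Neo.",
    "It was directed by the Wachowskis.",
]
print(dense_retrieve("When was The Matrix released?", sentences, top_k=1))
```

In a real system the call to `toy_embed` would be replaced by a request to an embedding model, and the archive embeddings would be computed once and stored rather than recomputed per query.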

And the other one is called re-rank. And reranking is just using an LLM to change the order of search results that came from a step before. So you throw your search at your existing search engine, you get the top 10 results, and you throw those at the reranker. That sort of changes the order, and that dramatically improves, like if you have an existing search system, this dramatically improves the quality of those search results.

And so these two components, each of which has, let's say, its own endpoint and super high quality models on the Cohere side, are maybe the two best ways to start dealing with Large Language Models, because that is the future of generation as well. Because [00:38:00] retrieval augmented generation is absolutely one of the most exciting areas.

And you know, one of the areas that can help you rely on information that you can serve to the model when you need it, you can update that data whenever you need it. You can give different users access to different data sets. You're not reliant on data stored in the model. You want to update them, okay, let's train the model for another nine months.

And then, you know, the model can sort of also, that increases the model's hallucination. So yeah, there's a lot of excitement in this area that brings together semantic search and generation. And we think it's highly warranted to pay attention to it. Yeah.

Retrieval is definitely, as you mentioned, a great way to, not avoid, but limit the hallucination problem, because you can almost, it doesn't work all the time, but you can try to force it to only answer with, like, the response and give a reference to [00:39:00] what it responds. So when it searches in its memory and just finds the nearest neighbors, you can ask it to only answer with what it finds and also give the source of what it found.

So that's really powerful compared to ChatGPT, which will just give you text and hopefully it's true, and you don't even know where it comes from. So it's definitely safer. And also, as you said, easier to build. You aren't required to retrain the whole model and everything, and you can build multiple applications super easily. But I'm not that familiar with the re-rank system. Could you give a bit more detail on how it works and how it actually reorders the answers and improves the results?

Sure. Yeah. So re-rankers are these models that, let's say you are Google and you're rolling out re-rankers in Google search: you have your existing system from before Transformers.

You give it a query, it gives you a hundred results. Yeah. The easiest way to deploy, to power your search with LLMs, with Large Language Models, [00:40:00] is to say, okay, these hundred results, let me take the query and take each one of these results and present them to the model and have the model evaluate how relevant this result is to this query.

Okay. So the reranker is basically a classifier that takes two pieces of text. It's what's called a cross-encoder. So you give it examples of a query and its answer, and it should give the result of one, let's say, because that's a true match. But then for a query and a document that is not relevant to it, the training label there is zero.

So this is not relevant to this. So that's how you train it. And once you train it, you just plug it into an existing search system. The previous step can use embeddings or not. That's fine. But then it gives a relevance score for each of the hundred results. And then you just sort by that relevance. That becomes just one signal for your search that you can either [00:41:00] just use and sort by the most relevant, or then you can use other signals.

If you're rolling out actual search systems, you want other signals. You want, let's say I want the more recent ones. So assign a signal for recent documents, or if you're building search for Google Maps, you are like, okay, give me things that are closer to this, to this one point. So that's another sort of search signal.

Or you can inject, say, preferences or things. So that's how re-rankers work. And, you know, you can sort by relevance directly, or you can just use that as one additional signal in a more complex ranker.
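As a rough illustration of the reranking flow described above (score each query and result pair, then sort, optionally mixing in another signal such as recency), here is a minimal sketch. It makes several assumptions that are not from the interview: `toy_relevance` is a hypothetical word-overlap stand-in for a trained cross-encoder, and the `recency` field and the weights are invented for the example.

```python
def toy_relevance(query, document):
    """Hypothetical stand-in for a cross-encoder: share of query words found in the document."""
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & d_words) / len(q_words)

def rerank(query, results, relevance_weight=1.0, recency_weight=0.0):
    """Reorder first-stage results by relevance, optionally blended with recency.

    results: list of dicts with 'text' and 'recency' (0 to 1, higher is newer).
    """
    def score(result):
        return (relevance_weight * toy_relevance(query, result["text"])
                + recency_weight * result["recency"])
    return sorted(results, key=score, reverse=True)

# Output of an existing (first-stage) search system, in its original order.
first_stage = [
    {"text": "opening hours of the city library", "recency": 0.9},
    {"text": "how to renew a library card online", "recency": 0.2},
    {"text": "library card renewal fees explained", "recency": 0.5},
]
reranked = rerank("renew library card", first_stage)
print([r["text"] for r in reranked])
```

With the default weights the list is sorted by relevance alone; raising `recency_weight` shows how relevance becomes just one signal among several.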

Okay. Much more clear now. The easiest way, basically, to use Large Language Models when you already have a search system or when you have a dataset. Mostly, for example, for anyone in a company, a manager, or someone that has an issue or a problem, or just in their regular work, how do they know that their problem can be helped with LLMs?

Are there [00:42:00] any tricks or tips to know that, like, oh, now I should use an LLM or something with embeddings or, like, a product of Cohere? How can you know that this problem will be helped through an LLM?

Yeah. Yeah. That's a great point. And the common wisdom is that, you know, use the best tool for the job. LLMs are great for some use cases. They're not good for everything. There are a lot of use cases where somebody wants to use an LLM and I would advise them, no, you should use a regular expression for this, or you should use spaCy for this use case, or you should just use Python string and text matching.
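To make the "use a regular expression for this" advice concrete, here is a small sketch of a task where a pattern is enough and no LLM is needed. The task itself (pulling ISO-style dates out of text) and the helper name `extract_iso_dates` are assumptions chosen for illustration.

```python
import re

# Matches dates written as YYYY-MM-DD; it checks the shape only, not calendar validity.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_iso_dates(text):
    """Return every YYYY-MM-DD style date found in the text, in order."""
    return ISO_DATE.findall(text)

print(extract_iso_dates("Invoice sent 2023-05-14, due 2023-06-13."))
# prints ['2023-05-14', '2023-06-13']
```

A pattern like this is deterministic, cheap to run, and easy to test, which is a strong baseline to compare against before reaching for a model.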

The LLMs are just one additional tool that adds a level of capability to your system and that should augment existing things that sort of work with them. So that understanding is a little bit important. Some people will be driven by the hype, you know, and would want to inject AI [00:43:00] somehow.

They told their investors, we will roll out AI in our next product release. How to do that? Let's find any way to do it. Now, it really should come from, let's say, user pain and what problem you're trying to solve. And you can classify two major parts there. So one is, maybe you are improving a specific text processing problem right now, and you can get better results if you try an LLM.

And so you have to choose what metric you have that will improve your product or solve your pain, and then compare LLMs with existing strong baselines. You know, there are a lot of things that can be done with things that are not LLMs. But then once you see that the LLM is providing that value for you, that's when you sort of progress forward.

LLM providers like Cohere make it easy, in that you don't need to worry about deploying a model, or models going out of memory because this model needs to fit on tens of GPUs or something. Just worry about, okay, you want to do re-rank. Okay, send me the [00:44:00] query, send me the 10 texts. I will send you back the ordered list, and I will make that better the next time you send me a request, because I'm updating the model or training it on new and better data every month or with every version.

And so, this is a new type of, let's say, provider of this technology. But yeah, definitely focus on the problem that models can solve. So improving existing text processing is one, that's search systems, classification. But there's this new capability of text generation. So AI writing systems were not possible three years ago.

So these are these new categories. So you might be wanting to innovate: I will create, I don't know, the next AI Dungeon, interactive AI games, or the next media format. Or I want to create a world like GTA with all of its radio stations, but all of it will be generated by the computers. And I'm gonna be creating something new.

And that's the second category of things [00:45:00] to experiment with. It allows for a lot of new applications to be born, just because before, well, not a long time ago, but when AI first started, you needed to train your own model and, as you mentioned, host it on the cloud or anywhere, and there's a lot to manage.

But now, thanks to OpenAI, Cohere, and other companies, you can basically have someone else do that for you. But there are still some challenges in building those Large Language Model based apps. For example, if I have a specific dataset in my company, for instance, if it's a very private company and the dataset cannot go outside of the intranet of the company, what can you do in that sense? Cohere and OpenAI, for example, it's all outside the intranet. So what can you do if you want to build some kind [00:46:00] of search based chatbot?

Yeah, that's a great question. And that's a very common concern. And we come across it, like, it's one of the biggest areas for companies in the industry, but also specifically enterprise, like large companies, companies working in regulated spaces. And Cohere actually caters to that.

So there is a solution of bringing the models to your own sort of virtual private cloud. So there's this rollout with AWS SageMaker where the model can be deployed in your own cloud. The data does not necessarily go to Cohere's infrastructure. It remains in your own sort of data center, but then it's run through the SageMaker endpoint.

And all of it sort of, you know, remains. And that's one of the use cases where we see a lot of demand. Cohere's focus on enterprise makes it able to focus on use cases like this, where it's, let's say, not specifically, you know, consumer focused, but it's like, you know, what are the big business problems of building the next generation of intelligent applications? And I like that you highlighted it because this [00:47:00] is, you know, commonly asked for. And, you know, we'd love to see more people sort of building with SageMaker.

It's great to know that it's possible. And for the people that maybe have different problems, not large companies, for example, if someone is learning and wants to build an app, what are the main challenges when it comes to building such Cohere or OpenAI based apps, where you basically use the very powerful models that already exist, but want to fine tune them or adapt them to your specific applications, either through a dataset or just specific commands? What are the typical challenges or things that the people who want to create cool things with those need to tackle and just go over?

Yeah. So the challenges don't all have to be [00:48:00] technical challenges. Like, everything in the past remains true in terms of: you still need to find product market fit for your product. You need validation from your users. You need to really solve a problem and not do something that is just nice to have. With the generative models specifically, identifying reliable use cases is one thing that a lot of people need some handholding on. Like, they come across an amazing demo on Twitter or something, but then they don't realize that a lot of the demos are cherry-picked. It's like, yeah, they had to generate, you know, 20 generations to get that one.

Let's say if you're building it as a product, it cannot work just three out of 10 times. It needs to work nine out of 10 times. And so how do you get it to that level of production? That gap between a proof of concept of "this prompt can work, I will take a screenshot of it and I will put it on Twitter" to "this is a reliable system behavior that I know [00:49:00] I could put in front of my users and it'll work always."

Bridging that gap is one of the challenging things that a lot of people have to contend with. And there are solutions and there are playbooks, and we sort of write a lot about them and educate about them, and they include things like, yeah, using search, using embeddings.

Fine tuning is another one. The big models allow you to prototype, and they do amazing behaviors. And once you have the model able to do the behavior that you want, using one example in the prompt or, you know, five examples in the prompt, you can then make it cheaper for you and faster by collecting that dataset and fine tuning a smaller model that is able to do that same task as the larger model.
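The prototype-then-fine-tune path described above can be sketched as follows: a few-shot prompt carries its examples on every call, while the collected fine-tuning records do not. Everything specific here is an assumption for illustration: the review-labeling task, the example texts, the helper names, and the `prompt`/`completion` field names (providers each define their own fine-tuning data format).

```python
import json

# Hypothetical few-shot examples used while prototyping with a large model.
FEW_SHOT_EXAMPLES = [
    ("great product, works as described", "positive"),
    ("arrived broken and support never replied", "negative"),
]

def few_shot_prompt(text):
    """Prompt for the large model: the same examples are repeated on every call."""
    shots = "\n".join(f"Review: {t}\nLabel: {l}" for t, l in FEW_SHOT_EXAMPLES)
    return f"{shots}\nReview: {text}\nLabel:"

def to_finetune_records(logged_pairs):
    """Turn logged (input, large-model output) pairs into JSON Lines without the examples."""
    return "\n".join(
        json.dumps({"prompt": f"Review: {text}\nLabel:", "completion": f" {label}"})
        for text, label in logged_pairs
    )

logged = [("love it", "positive"), ("never again", "negative")]
print(few_shot_prompt("love it"))
print(to_finetune_records(logged))
```

Once a smaller model is fine-tuned on such records, each request only needs the short prompt, which is where the saving in context size comes from.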

That saves on, you know, context size, because you're not sending the same, you know, five few-shot examples with every prompt. And so that, yeah, that is helpful. And then, another part of that is, [00:50:00] like we also talked about, yeah, semantic search and getting the relevant bits and injecting them into the prompt. You know, some people may think that context length will solve all the problems.

If you have a very large context length, that will solve it. And so you will send, I don't know, the documentation of your software to the language model with every question that you ask about that documentation. And you can clearly see that that is wasteful if you were to answer a thousand questions. The model has to process the same documentation thousands and thousands of times, while embedding is really this way of caching that knowledge and retrieving the important bits.
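A minimal sketch of injecting only the retrieved bits into the prompt, instead of sending the whole documentation with every question. It assumes a retrieval step has already returned a few snippets; the function name `build_grounded_prompt`, the prompt wording, and the character budget are all hypothetical choices, not a provider's API.

```python
def build_grounded_prompt(question, snippets, max_chars=500):
    """Assemble a prompt from retrieved snippets, staying within a size budget.

    snippets: list of (source_id, text) pairs already returned by a retrieval step.
    """
    lines = []
    used = 0
    for source_id, text in snippets:
        entry = f"[{source_id}] {text}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer using only the sources below and cite the source id. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "When was it released?",
    [("doc-1", "The film was released in 1999."),
     ("doc-2", "It was directed by the Wachowskis.")],
)
print(prompt)
```

Asking the model to cite a source id is one way to try to keep answers tied to the retrieved text; it limits hallucination rather than ruling it out.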

So yeah, these are a couple of things. Experimenting and thinking about reliable behavior is one of the learning curves that a lot of people have to go through.

What are the skills and material needed to do that? First, can only one person do that? If I want to create an [00:51:00] app, do I need a whole team and do I need a server?

Do I need to, to go through a course beforehand? Like what, what is the, the required skillset and material to get into that? Is it just impossible for one person or can the person listening right now that has an idea can just start and, and, and learn in the process? What's, what's accessible and how accessible is it?

So in software in general, you need a user interface, right? So if you are targeting users, how will they interact with it? Or are you creating, let's say, an API endpoint that other people can just connect to? So there's a bunch of software hurdles that are not necessarily language modeling or prompt engineering, so that, let's say, piping of that information and how your users sort of connect with it.

So if you know Python and JavaScript, one person can go very, very far if they invest in these two things. If you only know Python and [00:52:00] Machine Learning or data science, you can create a proof of concept. You can use something like Streamlit and sort of create an application, a user interface that you can maybe demo to investors to, you know, help you build the next level of it. And more and more you see, yeah, companies like Vercel coming along and making that front-end-to-AI pathway a little bit easier.

So the language models will continue to make it easier for generalists to do many things very well. We're still in the beginning of that, but it's clear that individual people who are productive will become massively more productive, aided by these technologies and what they can do. So yes, smaller groups of people will be able to do a lot more, but for now, yeah, there are these skill sets of how are you gonna build the UI, how are you then gonna put it in front of users?

You can roll it out on the app store, you can [00:53:00] do it through some marketplace. Can you do that individually? Do you know that customer segment? Do you really know the pain that you can solve for them? But yeah, I mean, a lot of people run one or two person companies where it's like, okay, charge credit cards, use these various frameworks, and then put some good looking UI on top of it.

Indeed, you talked about how generalists can now do more things thanks to AI, and that will only increase. And that's really cool because I believe I am somewhat of a generalist. I really like to know about everything, and even though I'm doing a PhD, I'm not completely sure about being super specialized in one thing and forgetting the others.

Like, I really like to learn everything, and that's a very recurrent topic in the podcast. It's funny how basically years ago, a lot of people said that AI will [00:54:00] increase the discrepancy between rich and poor and will just make things even more unfair than what they were. And now I believe, I'm not sure if I have all the data and information, but I believe we see almost the opposite, where in my case at least, and for lots of people I know, AI actually allows people to do things that they couldn't do before, which is quite cool.

It's pretty much the opposite. And it just democratizes lots of stuff, like, for example, building applications. One of my friends is currently doing some kind of challenge where she is trying to learn to use ChatGPT, and she is posting daily for 30 days about ChatGPT and what she does.

And she's like in human resources and she doesn't know any programming. But she still coded an application that is a to-do list [00:55:00] and everything thanks to ChatGPT without any like, python, JavaScript, any notion of coding and I don't know. That's incredible. It's, it's, it's so cool.

[00:55:30] A hundred percent. And like, you know, I predict that we'll start to see not one, but many five-person companies reach, you know, a billion dollars in valuation pretty soon, just aided and augmented by AI. Definitely a lot of opportunities created, but there are also a lot of challenges that, you know, we need to be sort of cautious about.

There's opportunities for misuse, as well as the need for people to keep learning, keep developing their skillsets, you know, use these new technologies in their own workflows to augment them and make, you know, what they do better and better. You can't just rely on what you learned in college when you graduated five years ago.

The world just keeps changing very quickly. And so the quicker you are at learning and adapting and incorporating these tools into what you do, the more of the opportunity you will catch, and the more you'll resist the [00:56:30] challenges.

And speaking of these challenges, one last challenge that I often struggle with using ChatGPT or other models is hallucination. Is there any other way than, for example, using memory retrieval, or is that the only way to solve it? Is there any other way to help with hallucination or just, in general, make those applications safer and, as OpenAI says, more aligned with what you actually want?

Yeah. There's two sides of these questions. Like, obviously you can do that during training, as OpenAI does. But what if you are using an OpenAI product or a Cohere product and you want to make it safer on your end? Is there anything you can do to help mitigate the model hallucination, even if you do not control the training process?

So we already mentioned one of the big ones, which is, you know, actually injecting the [00:57:30] correct information so you're not relying on the model's parametric data. That's one. There are methods that you as an engineer can build systems around. A lot of them were outlined in Google's Minerva model paper, which, you know, solves a lot of very complex mathematics, where we heard about things like chain of thought, where, you know, you ask the model a complex question, and it shouldn't answer it right away, it should output the steps of how it can arrive at that. Then there's another method called majority voting, where the model is supposed to output not just one result, but maybe 10 results. And then from those 10 results, you, you know, choose which ones have occurred more than one time and use those as votes.

That's specifically if you have, let's say, one specific output that you can have at the end. And so that's another sort of way. There's a paper around it, and the method's called majority voting. Close to this is this idea, [00:58:30] this is also a recent idea, of tree of thought, where it's like chain of thought, but like multiple chains of thought.

And then you can sort of reduce the, so that's, let's say, one way of evaluating. If the model generates the answer four or five or 10 times, and it says the same thing over and over again, there's probably a good chance that it knows this. But if there's variance in what it generates across these five or 10 times, that is probably an indication that the model is just being creative. And then there are things like temperature and, let's say, setting the right temperature, setting those sorts of arguments to a certain degree.
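The majority voting idea described above can be sketched in a few lines: sample several answers, take the most common one, and treat low agreement as a warning sign. This sketch assumes the answers have already been generated and are short final outputs that can be compared as strings; `majority_vote` and the sample values are hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return (most common answer, agreement ratio) over several sampled generations."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)

# Pretend these came from asking the model the same question five times.
samples = ["42", "42", "41", "42 ", "42"]
answer, agreement = majority_vote(samples)
print(answer, agreement)  # prints: 42 0.8
```

A low agreement ratio suggests the model is improvising rather than recalling, so an application might fall back to retrieval or decline to answer below some threshold.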

Yeah. It's an easy way to at least mitigate very random answers. But still, it also increases the cost, since you have to generate multiple times. But yeah, it's a very easy way to do that. One last question that I have very specifically about Large Language Models [00:59:30] is, how are, for example, models like ChatGPT that work with many languages built and trained on many languages? Because I know there's definitely a difference between ChatGPT, which works with almost every language, and, for example, a model Facebook released that was trained specifically on French, so it's definitely not the same thing. So what are the differences with ChatGPT? How does it work with any language that you can type in?

So multilingual models are just, you know, done by incorporating other languages in the training dataset and in optimizing the model or, let's say, initializing the tokenizer, which is like a step that comes before, you know, training the model, to, you know, choose how to break down the words. But it's really just a factor of the same [01:00:30] training process. It is language modeling. Predict the next word, except in our dataset we have a lot of other languages.

And that we also use to evaluate the models. How correct is the model on this language and that language? So you use that in your incorporation, because if you are serving these models, so if you're serving a model like Cohere's Command, the one that we put out is not the only model that we've trained.

No. You have to train tens or hundreds of models with a lot of different experiments. Yeah. To really find the best performing model and do a lot of complex evaluations. So if multilingual is one of your focus areas, which for us it is, it is, you know, very much a focus area, there's a lot of this incorporating it in the training data, but also in the evaluations.

We have a lot of focus on multilingual on the embedding side. We have this embedding model that supports over a hundred languages, which is, like, completely geared and focused on search [01:01:30] in the multilingual setting. So you have to pay extra attention when you're building the model to incorporate languages, because it was very easy in the beginning to just focus on English and, you know, not consider that the vast majority of people, you know, also speak other languages. And they need them in their day-to-day, their businesses and their usage.

So it's mainly just trained with even more data. I believe from research, maybe you can confirm, but just like for humans, training on different languages actually improves the results in English as well.

Yes, yes. There are things that are strange like that, kind of like, you know, training on code also enhances generating text with, like, you know, common sense or reasoning capabilities. So yeah, the more you throw in there of, like, high quality data, that always seems to improve the results.

Yeah, it's just like humans. I don't remember exactly what it does to your brain, but learning a music instrument actually [01:02:30] helps you understand other stuff better and just, like, makes you not more intelligent, but it definitely helps you.

It's not irrelevant to learn art or to learn a music instrument. And also with the different languages, that you basically are a different person when you speak another language, that's also super interesting. I wonder if that's the case for Large Language Models, where they act differently in different languages.

But yeah, I mean, the distributions would be different in the different languages. Yeah. And I love how this also extends to multi-modal models, like once you throw audio in there, once you throw images in there, sort of how that also could then improve text generation. Yeah. Really exciting.

How do you see AI evolving over the next, I don't know, it's a classic question, the next five years? But how do you see, in the next few years, AI and Large Language Models evolve? Where do you [01:03:30] see we are going? Like, mostly trying to come back to specific applications with retrieval systems, or building an ever better general model with less hallucination, or anything else? Where are we going?

Yeah, generation and, like, embedding approaches, I feel, are here to stay. Things like, yeah, semantic search or retrieval, some of these are things that you can't really even do with generative models.

There will be a lot of development in the models themselves, so, in the quality of data that can be presented to them and in the quantity and the types of data. So now we are at internet scale text data. Where do you go next? We talked about multimodality. That's another area where the models are improved by being able to look at images or even generate images, by looking at audio or getting other sorts of modalities.

Beyond this, there's this idea of embodiment, where the models [01:04:30] can interact with environments and learn from those interactions. That will also be another source of information and feedback to improve model behavior. And then there's this idea of social interaction: how models can, you know, socially interact with large groups of people, not just one person who gives it a prompt and gets a result back.

These, including the social interactions, are three of five world scopes that this one paper I've discussed on my YouTube channel lays out, this future of where we're gonna get more data now that we've, you know, trained on internet data. So that's on the modeling front and how to make the models better.

Definitely, new architectures and improvements in hardware will all continue to develop, even though right now there's a little bit of a convergence. There haven't been any major steps on the modeling side for a while, but there's still a lot to be done in engineering: rolling these models out, building systems around the capabilities we currently have.

There's so [01:05:30] much that can be done there that will, you know, keep so many people busy for the next two, three years, but also inventing other ways of using other media formats that are now possible if you can generate images or generate text or full stories or full podcasts. So yeah, the world will be a little different, and a lot of people are gonna be very creative in terms of what they deploy. A lot of it's gonna come from the engineering side, but not only from engineering; modeling will follow as well.

On your end, is there one thing that right now AI cannot do and that you would love it to be able to do? Yes. Is there one thing that comes to your mind? Yes.

It's a little random, but one thing I'm really obsessed about: just the nature of learning about intelligence in software, and having software solve problems in intelligent ways, makes me very intrigued about other natural intelligences beyond humans.

So animal intelligence: the [01:06:30] dolphins, the octopus, the ant colony, the apes. There are efforts like this project called Project CETI, which is, you know, trying to throw all the NLP technology that we have at decoding the language and vocalizations of whales, to see, you know, can we start to understand, maybe communicate with, these other forms of intelligent life around us that we don't yet have ways of communicating with.

I'm absolutely passionate about, you know, this language technology being able to allow us to connect better to more intelligent forms of life around us.

Yeah, it's so cool. I've always been drawn to how we understand things, and also how a cat sees the world, and just all the animals and living beings.

It's really cool. Like, [01:07:30] neuroscience and all this is a completely, well, not completely, but it's a different field. And now lots of people come from pure software and become interested in neuroscience and these topics, just thanks to language models and how they make you think about how things understand. It's really cool.

And I'm excited to see where AI, like my field, our field, can help the human race understand other things. It's really cool. I have one last question for you, just because it's a topic that I'm particularly interested in. It's about your end, since you are a blogger and now even a YouTuber.

First, are you using any AI help when creating educational content? Well, not necessarily an LLM, but maybe AI editing, or just generation, or asking questions, [01:08:30] brainstorming. Are you using any AI-powered tools to make your writing process, or just your creative process, better?

Sporadically. Not on, let's say, a daily basis, but yes, sometimes for outlines or idea generation it's very useful.

Or, like, some artwork for thumbnails sometimes; Midjourney has been useful for some of these. But like everybody, I'm just learning how to adapt them into my workflows. Worth the investment.

I didn't see myself using it until very recently, and now I've seen a particular use case for me. It's mainly because I'm French and not a native English speaker, but it's really helpful for improving your formulation and syntax. That's one thing, just because it helps me improve. But another thing is that I'm still currently learning lots of new stuff, and I still try to explain things while learning. When I see a [01:09:30] word that I don't understand, or a topic that seems a bit blurry even if I have the paper and I think I understand it, asking GPT or any other model is quite useful. It actually reformulates, and it can be very useful for quickly getting a high-level understanding of specific topics. That's something I've been using recently, and it took me a while to get into that, which is weird because we are actually explaining how they work, but we don't use them nearly as much.

But now I see a better use case. Like, the more time we have with them, the better we use them, obviously. And yeah, I see it's really promising. It's really cool. But yeah, you definitely have to double-check the outputs and ensure it's not hallucinating or anything else.

It still requires human input, but it's really useful. [01:10:30] And so, it's not really a question, but the last thing I wanted to ask: is there anything you would like to share with the audience? Do you have any projects other than the Large Language Model University, which of course anyone can go through right now with Cohere for free, and learn a lot about transformers and everything we discussed in this interview?

And it's a really good resource. I definitely recommend it. But is there anything else on your end that you are excited to share, or to release soon, or to work on? Yeah.

Yeah. So aside from the LLM University, we have the Cohere Discord, where we answer questions. So if you have questions as you go through the LLM University, join us.

Let us know what you want to learn about. We're happy to help you with your learning and education, and then when you build something, we also welcome you to share it and say, you know, what problems you faced and how you solved them. So it's a big community to learn together, and, you know, we welcome everybody on the [01:11:30] Cohere Discord.

That's awesome. Is there anything coming up on your YouTube or the blog?

So I've been doing a bunch of shorts. I've been digging deeper into these tools that build on top of LLMs, like LangChain, like LlamaIndex. So I've been doing a few of these shorts. So that's a little bit of my focus area now.

But in terms of topics, if I can carve some time to talk about human feedback and RLHF that's high on my list.

Yeah, I'd love to see that. So, perfect. Well, thank you very much for all the time you gave us and the amazing insights. It was a really cool discussion to have with you. I've known you for only two years, unfortunately.

I didn't know your blog before that, but it's just an amazing resource, and likewise for the LLMU. Yeah, I'm really thankful to you and your team for building that, but also to you personally for the YouTube and the blog. It's just really cool that people like you exist. So [01:12:30] thank you, and thank you for joining the podcast.

Thank you so much. That's so kind of you. You know, I'm just a student like any other, and we're just learning together. Thank you so much for having me. And looking forward to, yeah, interacting and speaking together in the future.