We’ve all heard about GPT-3 and have a fairly clear idea of its capabilities. You’ve most certainly seen some applications that exist only because of this model, some of which I covered in a previous video. GPT-3 is a model developed by OpenAI that you can access through a paid API, but you have no access to the model itself.
What makes GPT-3 so strong is both its architecture and its size. It has 175 billion parameters, roughly twice the number of neurons we have in our brains!
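To put that number in perspective, here is a small back-of-the-envelope sketch. It relies on two assumptions that are not from the model's authors: a commonly cited estimate of about 86 billion neurons in the human brain, and weights stored as 16-bit floats (2 bytes per parameter).

```python
# Back-of-the-envelope numbers for a 175-billion-parameter model.
PARAMS = 175e9             # parameter count quoted above
NEURONS = 86e9             # assumption: commonly cited estimate for the human brain
BYTES_PER_PARAM_FP16 = 2   # assumption: weights stored as 16-bit floats


def neuron_ratio(params=PARAMS, neurons=NEURONS):
    """How many parameters the model has per human neuron."""
    return params / neurons


def weights_size_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


print(f"parameters per neuron: {neuron_ratio():.2f}")
print(f"fp16 weights: {weights_size_gb():.0f} GB")
```

Under these assumptions, the weights alone take hundreds of gigabytes, which is why a model of this size does not fit on a single consumer GPU.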
This immense network was pretty much trained on the whole internet to understand how we write, exchange, and understand text. This week, Meta has taken a big step forward for the community. They just released a model that is just as powerful, if not more, and have open-sourced it. How cool is that?
We can now access a GPT-like model and play with it directly, without going through an API with limited access. Meta’s most recent model, OPT, which stands for Open Pre-trained Transformers, is available in multiple sizes with pre-trained weights to play with or use for research work. The largest of these is comparable to GPT-3 and has the best results. That is super cool news for the field and especially for us academic researchers.
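As a rough idea of what "playing with it directly" can look like, here is a minimal sketch. It does not use Meta's own metaseq code. Instead, it assumes the Hugging Face `transformers` library (with PyTorch) is installed and that the smallest checkpoint is available on the Hugging Face Hub under the name `facebook/opt-125m`. The `generate` helper is a hypothetical name introduced only for this illustration.

```python
# Minimal sketch: generate text with a small OPT checkpoint.
# Assumptions: `transformers` and PyTorch are installed, and the checkpoint
# name below is available on the Hugging Face Hub.
DEFAULT_CHECKPOINT = "facebook/opt-125m"


def generate(prompt, model_name=DEFAULT_CHECKPOINT, max_new_tokens=30):
    """Download the checkpoint (on first call) and continue the prompt."""
    # Imported lazily so that defining this helper has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Tokenize the prompt into PyTorch tensors and greedily decode a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Large language models are")` should return the prompt followed by the model's continuation. The smallest checkpoint is convenient for a quick test, but its output will be far weaker than what the 175B model produces.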
OPT, or more precisely OPT-175B, is very similar to GPT-3, so I’d strongly recommend watching my video to better understand how large language models work. GPT-3 and OPT can not only summarize your emails or write a quick essay on a given subject. They can also solve basic math problems, answer questions, and more.
The main difference with GPT-3 is that this one is open-source, which means you have access to its code and even pre-trained models to play with directly. Another significant fun fact is that OPT’s training had 1/7th the carbon footprint of GPT-3’s, which is another step in the right direction. You can say that this new model is very similar to GPT-3 but open-source and better for the environment.
So OPT is a language model using transformers, which I covered in previous videos, that was trained on many different datasets (one could say on the whole internet) to process text and generate more text. To better understand how these models work, I’d again refer you to the video I made covering GPT-3, as they are very similar.
Here, what I really wanted to cover is Meta’s effort to make this kind of model accessible to everyone while putting a lot of effort into sharing its limitations, biases, and risks. For instance, they saw that OPT tends to be repetitive and get stuck in a loop, which rarely happens for us; otherwise, no one would talk with you. Since it was trained on the internet, they also found that OPT has a high propensity to generate toxic language and reinforce harmful stereotypes, basically replicating our general behaviors and biases. It can also produce factually incorrect statements, which is undesirable if you want people to take you seriously. These limitations are some of the most significant reasons these models won’t replace humans anytime soon for important decision-making jobs or even be used safely in commercial products.
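To make the "stuck in a loop" behavior a bit more concrete, here is one simple way you could quantify repetition in generated text. This is only an illustrative sketch, not the analysis used in the paper: the `repeated_ngram_fraction` function is a hypothetical helper that splits on whitespace and measures the share of word n-grams that occur more than once.

```python
from collections import Counter


def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams in `text` that occur more than once.

    Illustrative heuristic only: 0.0 means no n-gram repeats,
    1.0 means every n-gram is part of a repetition.
    """
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Text shorter than n words: nothing to measure.
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


looping = "I am fine " * 5
varied = "the quick brown fox jumps over the lazy dog"
print(repeated_ngram_fraction(looping))  # high score: the text loops
print(repeated_ngram_fraction(varied))   # low score: no repeated trigram
```

A score close to 1 suggests the text is looping over the same phrases, while a score close to 0 suggests little repetition at the chosen n-gram length.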
I invite you to read their paper for an in-depth analysis of the model’s capacity and to better understand their efforts in making this model more environmentally friendly and safe to use. You can also read more about their training process and try it yourself with the publicly available code! All the links are in the references below.
Such open-source contributions with new models, documentation, and code available are really important for the research community to advance science, and I am glad a big company like Meta does that. Thanks to them, researchers from around the world will be able to experiment with state-of-the-art language models instead of smaller versions. I’m excited to see all the upcoming advancements it will create, and I’d love to see what you guys do with it.
I hope you enjoyed this week’s article, which was a bit different from usual, covering this exciting news and the essential efforts to share publicly available research.
I will see you next week with another amazing paper!
►OPT's video: https://www.youtube.com/watch?v=Ejg0OunCi9U
►Zhang, Susan et al. “OPT: Open Pre-trained Transformer Language Models.” https://arxiv.org/abs/2205.01068
►My GPT-3 video for large language models: https://youtu.be/gDDnTZchKec
►Meta’s post: https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/
►Code: https://github.com/facebookresearch/metaseq https://github.com/facebookresearch/metaseq/tree/main/projects/OPT
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/