I Published 42 Shorts on AI Terms in 42 Days

Because Prompting Was Never the Real Problem

If you are reading this, it’s either because you use AI tools like ChatGPT, Gemini, or Claude for just about everything, or because the person who sent you this thinks you do.

Either way, you’ve probably had the same experience most people have with these tools. You ask something simple, you get an answer, and something feels off. Sometimes it’s vague. Sometimes it’s slow. Sometimes it sounds confident, polished, and authoritative, and then you realize it’s simply wrong. 

The frustrating part is not that it fails. It’s that it fails inconsistently. 

You can ask almost the same question twice and get two different levels of quality, as if the tool is brilliant one minute and confused the next.

When that happens, the standard advice kicks in. Prompt better. Rephrase. Be more specific. Add constraints. Try again.

Clear communication helps, but prompting isn’t the real issue. We are all prompt engineers now; that’s just the surface. The deeper problem is that most people are using systems they don’t understand. It’s like trying to ride a bike properly without knowing that pressing the pedals makes the wheels turn.

These tools feel conversational, so we instinctively treat them like a conversational partner. But their intelligence is not human-shaped. It is uneven. Jagged. It spikes in some areas and collapses in others. A model can write a beautiful explanation, help brainstorm a strategy, or draft something that looks like a finished piece of work, and then fail confidently at a simple logical step. It can explain a complex topic in plain language, and a few lines later invent details that never existed.

If you don’t understand what kind of system you’re interacting with, you can’t predict when it will be reliable and when it will surprise you. And that’s the problem I wanted to solve, because it affects everyone, not just engineers. These tools are already part of daily work for a massive number of people, and they are rapidly becoming part of the default workflow for most knowledge jobs.

So I built something that looks almost too simple to matter.

A vocabulary.

I published a free YouTube course called “Introduction to AI in 42 terms”. It’s 42 shorts, each covering one concept that keeps showing up anytime someone tries to explain LLMs and generative AI. The entire series is already live. The point isn’t to make you memorize definitions. The point is to give you a mental model of how these systems operate so you can use them deliberately instead of guessing your way through them.

Start the mini-course here:

Because once you have that mental model, everything changes. The weird behaviour stops feeling random. The strengths stop feeling magical. The failures stop feeling personal. Most importantly, you know what to do next.

Most confusion around AI comes down to missing language. People talk about tokens, embeddings, parameters, hallucinations, fine-tuning, retrieval, agents, alignment, and guardrails as if everyone already knows what those words mean. If those terms are fuzzy, everything built on top of them stays fuzzy too. You can’t tell why an answer failed. You can’t tell why one model feels better than another. You can’t tell why adding a document suddenly makes the output more accurate. You can’t tell whether an AI-powered product is genuinely solid or just a clever wrapper around a chat window.

When people don’t have vocabulary, they end up with superstition. They develop rituals. They collect prompt templates. They copy whatever phrasing worked for someone else in a different tool with different hidden rules. And then they’re surprised when it doesn’t translate.

Spoiler: 99.9% of “prompt templates” are useless.

That’s why I’m not interested in teaching prompting tricks, and never have been. Prompting techniques keep changing as models evolve. The foundations don’t. Models improve every year, but they are still built on the same core ideas, trained in similar ways, and limited by the same underlying constraints that won’t change anytime soon. If you understand those constraints, you’ll keep benefiting even as interfaces and features change.

At the core of all of this is a simple truth that most people still don’t internalize.

A large language model is a system that predicts what comes next.

When you type a question into ChatGPT, it doesn’t look up an answer in a database. It doesn’t “know” things in the human sense. It takes the text you gave it and repeatedly asks one question: given everything I have seen so far, what is the most likely next token? That token might be a word, part of a word, a number, or punctuation. It predicts one, adds it to the sequence, then predicts the next, and repeats until a response appears. You are watching a next-token prediction machine run in real time.
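If you want to see that loop in concrete form, here is a minimal sketch of it, assuming the open-source Hugging Face transformers library and the small GPT-2 model. ChatGPT’s internals are not public, but the generation loop has the same shape.

```python
# A minimal sketch of the next-token loop, assuming the Hugging Face
# `transformers` library and the small GPT-2 model are installed.
# This only illustrates the shape of the loop, not ChatGPT itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

for _ in range(10):                                # predict 10 tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits           # a score for every possible next token
    next_id = logits[0, -1].argmax()               # keep the single most likely one (greedy)
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))              # the prompt plus ten predicted tokens
```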

This explains why rephrasing matters. A small change at the start changes the probability landscape of what comes next, and that can push the model down a completely different path. It also explains why these systems can sound correct when they’re not. Fluency and truth are not the same thing. The model is optimized to produce plausible text, not to verify facts.
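You can watch that landscape shift yourself. The sketch below, again assuming GPT-2 via transformers, prints the five most likely next tokens for two near-identical prompts; the rankings differ because the wording differs.

```python
# A small sketch (assuming `transformers` and GPT-2 again) showing how a tiny
# rewording changes the probabilities for the very next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

for prompt in ["The best way to learn AI is", "A quick way to learn AI is"]:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
    top = torch.topk(probs, 5)                     # the five most likely next tokens
    print(prompt)
    for p, i in zip(top.values, top.indices):
        print(f"  {tokenizer.decode([i.item()]):>10}  {p.item():.2%}")
```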

Once you understand that, the other concepts click into place. Tokens matter because everything is measured in tokens, including limits, speed, and cost. Embeddings matter because the model cannot process words as symbols; it processes numbers that represent patterns of usage. The context window matters because the model only “remembers” what fits inside it in that moment. If something falls outside the window, it doesn’t exist for the model, no matter how important it was to you. This is why conversations degrade, why constraints drift, and why you sometimes feel like the model “forgot” something obvious. It didn’t forget. It just doesn’t have access to it anymore. (We define all these in the free course!)
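To make the token idea tangible, the snippet below uses OpenAI’s open-source tiktoken tokenizer (an assumption for illustration; each model family splits text its own way) to show how a sentence is actually counted.

```python
# A small sketch of tokenization, assuming OpenAI's open-source `tiktoken`
# library. Other models split text differently, but the principle is the same:
# limits, speed, and cost are all counted in tokens, not words.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Hallucinations aren't lies; they're confident predictions."
token_ids = enc.encode(text)

print(len(text.split()), "words")            # how a human would count it
print(len(token_ids), "tokens")              # how the model counts it
print([enc.decode([t]) for t in token_ids])  # the actual pieces the model sees
```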

From there, the course moves into why assistants behave the way they do. A base model is trained to continue text. That’s it. To get something like ChatGPT, you train it further so it follows instructions and answers in a way humans prefer. That’s where instruction tuning and techniques like RLHF come in. They don’t magically make the model understand truth. They shape behavior, tone, and helpfulness. They teach it what humans tend to reward in an answer.

Then you hit the practical reality that matters for anyone using these tools in the real world: models fail, and you need ways to reduce those failures. Hallucination is not a bug you patch with a better prompt. It’s a consequence of how the system works. And we cannot change that. If the model doesn’t have reliable information in context, it will still produce something, because that’s what it’s designed to do. So the solution is often not “ask nicer”. The solution is to ground it in a source of truth. That’s where retrieval-augmented generation (RAG) comes in. You retrieve relevant information from documents or the web, inject it into the prompt, and force the model to generate based on that evidence rather than its internal patterns alone.
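Here is the RAG idea reduced to a toy sketch. Real systems retrieve with embeddings and a vector database; the keyword-overlap retriever and the ask_llm placeholder below are simplifications for illustration, not a production recipe.

```python
# A toy sketch of retrieval-augmented generation. The keyword-overlap
# "retriever" and the final model call are stand-ins: real systems retrieve
# with embeddings and a vector store, and `ask_llm` would be your chat API.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium plans include priority support and a 99.9% uptime SLA.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question (toy retriever)."""
    words = set(question.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def build_prompt(question: str, evidence: list[str]) -> str:
    """Inject the retrieved evidence and force the model to stick to it."""
    context = "\n".join(f"- {doc}" for doc in evidence)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

question = "How long do I have to return a product?"
grounded_prompt = build_prompt(question, retrieve(question, documents))
print(grounded_prompt)  # this is what you send to the model, e.g. ask_llm(grounded_prompt)
```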

This is also where agent systems appear, and where things get dangerous fast if you don’t know what you’re doing. The more you let a model browse, retrieve, and act (on your behalf), the more you have to treat everything it reads as untrusted input. Prompt injection is basically social engineering for machines, and if you’re building systems that pull in external text, you need real defenses, not vibes.

And then there’s the piece that almost everyone ignores until something breaks: evaluation. People love model scores and leaderboards. But benchmarks don’t tell you whether your application will behave well. That’s why builders measure metrics like faithfulness and relevance, and increasingly use techniques like LLM-as-judge to evaluate outputs at scale. When reliability matters, you don’t just hope the model behaves. You measure it. You layer mitigations. You design for failure modes.
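As a sketch of what “measuring it” can look like, here is a minimal LLM-as-judge check for faithfulness. The prompt wording, the 1-to-5 scale, and the ask_llm callable are assumptions for illustration, not a standard.

```python
# A minimal sketch of the LLM-as-judge pattern: one model call grades another
# model's answer for faithfulness to its sources. `ask_llm` is a hypothetical
# placeholder for whatever chat API you use; the 1-5 rubric is an assumption.
JUDGE_PROMPT = """You are grading an AI answer for faithfulness.

Source documents:
{sources}

Answer to grade:
{answer}

Rate from 1 (contradicts or invents facts) to 5 (fully supported by the sources).
Reply with the number only."""

def judge_faithfulness(answer: str, sources: str, ask_llm) -> int:
    """Return a 1-5 faithfulness score produced by the judge model."""
    reply = ask_llm(JUDGE_PROMPT.format(sources=sources, answer=answer))
    return int(reply.strip())

# In practice you run this over many answers and track the average,
# so a regression shows up as a number instead of a user complaint.
```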

This is the central idea behind the entire series. AI doesn’t become reliable by magic. It becomes reliable by design.

If you’re a user, not a builder, that design mindset still matters because it changes how you delegate tasks. You start seeing where LLMs shine and where they shouldn’t be involved at all. You learn when to ask for creativity and when to demand grounding. You learn when to use a model, when to use a tool like a calculator, and when the safest move is to not use AI in the loop.

That’s what I want people to walk away with after finishing the 42 terms (about 1 to 1.5 hours). Not excitement. Not fear. Judgment.

The best outcome isn’t that you can get better answers from ChatGPT. The best outcome is that you understand the difference between its intelligence and yours, and you stop giving it jobs it was never designed to do.

The side effect will be better answers, greater trust, and fewer fears about AGI.

So yes, the series is a free YouTube course. But the deeper purpose is to make AI feel less like a risky shortcut and more like a tool you can use deliberately and confidently.

If you want to go through it, you can start anywhere from the list of 42 shorts here, but I recommend starting at the beginning because the early concepts remove the most confusion.

One last thing.

If you’ve been around AI long enough, you probably remember a term you kept hearing that nobody explained clearly. A word that was used as if it were obvious, even though it wasn’t.

What was that term for you, so I can add it to the course?

Check out the 42 videos for free here: