How to Really Build on Top of LLMs
(full training session: the typical path for companies)

Here's our free second session on building applications with Large Language Models (LLMs) from our recent 10-hour video course.
During this session, we explored how to overcome the limitations of LLMs and harness their power through techniques such as RAG, fine-tuning, and structured outputs, following the typical path a company should take.
If you missed it (or if you want to watch it again), check it out on YouTube here:
If you don’t have 2 hours to spare… here are 10 key takeaways from this session, focused on building and customizing with LLMs:
- Before diving into fine-tuning (often costly and complex), prioritize prompt optimization, then RAG (Retrieval-Augmented Generation) to address LLM challenges like hallucinations or knowledge gaps.
- Start with “zero-shot” and “few-shot” prompting. If that isn’t enough, move on to basic RAG, then explore advanced RAG techniques before considering more targeted fine-tuning of the retriever (encoder) or the generator.
- RAG is essential for injecting external information into your LLMs. Advanced RAG techniques improve relevance through smart data chunking, the use of metadata, integrating vector databases for efficient semantic search, and hybrid search strategies.
- Beyond Retrieval: Once information is retrieved, refine it by re-ordering, ranking by relevance, removing duplicates, and summarizing or enriching it if necessary. Also consider verification loops, rewriting user queries for clarity, and integrating multiple modalities (images, audio, etc.).
- Fine-Tuning: When and Why? Fine-tuning is a powerful option, but use it judiciously: for highly specific tasks where a substantial dataset already exists, to squeeze out significant performance gains, or to adapt the LLM to new knowledge domains. Reinforcement learning-based fine-tuning (as in RLHF, Reinforcement Learning from Human Feedback) is particularly suited to refining model behavior based on user feedback and improving the overall experience.
- Make Responses Reliable with Structured Outputs: Given the unpredictability of raw LLM answers, structured outputs (e.g., JSON, XML) are a key approach. They guarantee consistent formatting, simplify automated processing by eliminating manual parsing, and increase reliability when integrating with other systems.
- Implementing Structured Outputs—Methods and Tools: Two main approaches exist: prompting (and optionally training) the model to follow an explicit schema, or using grammar-based constrained decoding (e.g., a context-free grammar) that restricts the tokens the model can generate at each step. Tools like Pydantic (Python), Zod (JS/TS), or libraries such as Outlines greatly simplify this implementation.
- Choice Strategy (Selecting the Right Model and Approach): Your specific needs should guide the choice of LLM and customization technique. Choose models like Gemini for very long contexts, reasoning-oriented models (such as OpenAI’s o3) for complex multi-step queries, or lighter models (GPT-4.1 mini, Gemini Flash) for simpler tasks. RAG is often a good choice for interacting with large databases while controlling costs.
- Cost and Latency Optimization with CAG (Context Caching): For use cases involving repeated analysis of long documents (codebases, reports), “Context Caching” (offered by Gemini and OpenAI) is an effective technique. It stores already processed tokens to reduce costs and latency for subsequent queries.
- Evaluation (The Pillar of Any LLM Implementation): Whether you use advanced prompting, RAG, fine-tuning, or structured outputs, the key to success lies in rigorous, continuous evaluation. Quantitatively measure each step and component of your pipeline to identify areas for improvement and iterate effectively.
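To make the very first step on that path concrete, here is a minimal sketch of few-shot prompting using the common chat-message convention: a few labeled examples are prepended to the real query so the model can infer the task format. The classification task, the examples, and the function name are illustrative, not from the session.

```python
# Few-shot prompting sketch: prepend labeled (input, output) example pairs
# as prior user/assistant turns, then append the real query last.
def build_few_shot_messages(examples, query, system="You are a sentiment classifier."):
    """Assemble a chat prompt: system turn, then example pairs, then the query."""
    messages = [{"role": "system", "content": system}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("The onboarding flow was effortless.", "positive"),
    ("Support never answered my ticket.", "negative"),
]
messages = build_few_shot_messages(examples, "Great docs, but the API is slow.")
```

The resulting `messages` list can be passed as-is to most chat-completion APIs; if zero-shot (an empty `examples` list) already works well, you can stop there.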
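And here is a toy sketch of the basic RAG retrieval step described above: chunk the source text, embed each chunk, and return the chunks most similar to the query. A bag-of-words `Counter` stands in for a real embedding model and vector database, and all data here is illustrative; production systems chunk by tokens or sentences, attach metadata, and combine this with hybrid search.

```python
# Toy RAG retrieval: chunk -> embed -> rank by cosine similarity to the query.
import math
from collections import Counter

def chunk(text, size=40):
    """Fixed-size character chunks (real systems split on tokens/sentences)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Stand-in embedding: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = ("Refunds are issued within 14 days. Shipping takes 3 business days. "
        "Accounts can be deleted in settings.")
top = retrieve("how long do refunds take", chunk(docs))
```

The retrieved `top` chunks would then be injected into the LLM prompt; the post-retrieval refinements mentioned above (re-ranking, deduplication, summarization) operate on exactly this list.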
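Finally, a dependency-free sketch of the validation side of structured outputs, i.e., the step that libraries like Pydantic automate: parse the model's JSON answer and check it against a typed schema before any downstream code touches it. The `Ticket` schema and the raw answer are made up for illustration.

```python
# Structured-output validation sketch: parse the model's JSON reply and
# reject it if fields are missing or have the wrong type.
import json
from dataclasses import dataclass, fields

@dataclass
class Ticket:
    category: str
    priority: int   # e.g., 1 (low) to 3 (high)
    summary: str

def parse_ticket(raw: str) -> Ticket:
    """Turn a raw model answer into a validated Ticket."""
    data = json.loads(raw)           # raises on malformed JSON
    ticket = Ticket(**data)          # raises TypeError on missing/extra keys
    for f in fields(Ticket):
        if not isinstance(getattr(ticket, f.name), f.type):
            raise TypeError(f"field {f.name!r} is not {f.type.__name__}")
    return ticket

raw = '{"category": "billing", "priority": 2, "summary": "Charged twice"}'
ticket = parse_ticket(raw)
```

When validation fails, a common pattern is to feed the error message back to the model and re-prompt; grammar-constrained decoding (as with Outlines) avoids the retry loop entirely by making invalid output impossible to generate.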
That’s it! Watch the full video to gain even more insights on these points and much more ;)
P.S. We also shared the first session of our course for free last week! Start with that one if you also missed it:
P.S.2. If you enjoyed these two sessions, you’ll love the four others in the full course. Sign up here for only $199 and use code “tai_community” to get 15% off right now: https://academy.towardsai.net/courses/llm-primer?ref=1f9b29