How to Build a Memory Your AI Agents Can Actually Reuse
The useful part is not giving agents more context. It is making your research, notes, and sources available again in the next session.
Most agent workflows still have the same annoying problem.
You teach the agent something useful, it works well for one session, and then the next time you start again from zero.
Same links.
Same PDFs.
Same notes.
Same explanations.
Same “please remember my style and constraints.”
Very productive. Very 2026. haha.
This is the problem Paul Iusztin and I focused on in our AI Engineer World’s Fair Online Track talk, which was selected as a keynote. Super cool surprise, but more importantly, I think the topic is one of the most useful things AI engineers need to understand right now.
You can watch the full keynote here:
The main idea is simple: the bottleneck is not giving the model more information.
The bottleneck is reusing that information later.
I have notes in Obsidian, Readwise highlights, GitHub repos, meeting recaps, old video research, articles, and random saved links I definitely planned to revisit. Paul has thousands of notes too.
But when I start a new project, I do not want to paste my whole second brain into Claude Code, Cursor, or Codex. That would be expensive, slow, and honestly pretty bad.
The better pattern is to keep memory outside the model.
So we built an AI Research OS based on plain files and references. No vector database required. No knowledge graph to maintain. Just a structure agents can inspect, query, and update.
It has three layers.
First, the raw layer.
Every source stays untouched. If the agent ingests an article, repo, transcript, note, or highlight, the original version is preserved. This matters because summaries lose details. When accuracy matters, the agent can always go back to the source.
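Here is a minimal sketch of what that raw layer could look like in Python. The function name ingest_raw, the raw/ folder, and the hash-prefixed file names are assumptions for illustration, not the exact layout from the talk. The only point is that the original bytes are copied once and never rewritten.

```python
import hashlib
import tempfile
from pathlib import Path


def ingest_raw(source: Path, raw_dir: Path) -> Path:
    """Copy a source file into the raw layer without modifying it.

    Hypothetical helper: the file name is prefixed with a content hash so the
    same source ingested twice maps to the same stored file.
    """
    data = source.read_bytes()
    digest = hashlib.sha256(data).hexdigest()[:12]
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / f"{digest}-{source.name}"
    if not target.exists():  # never overwrite an already preserved source
        target.write_bytes(data)
    return target


# Small demo in a temporary directory so nothing touches real notes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    article = root / "article.md"
    article.write_text("Original article text.", encoding="utf-8")
    stored = ingest_raw(article, root / "raw")
    print(stored.name)
```

Summaries and notes can then point at the stored path, so the agent can always reopen the exact original when a detail matters.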
Second, the index.
This is the map. The agent reads an index.yaml file that tells it what exists, where it lives, what each source is about, and which derived notes point back to it.
Instead of loading everything, the agent starts from the catalog, reads summaries, then opens only the files it needs.
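To make the "catalog first, files second" idea concrete, here is a rough Python sketch. The entry fields (path, summary, tags, derived_notes), the sample entries, and the keyword scoring are all assumptions; the real index.yaml may look different, and in practice the entries would be loaded from that file with a YAML parser rather than hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One catalog entry (hypothetical schema for illustration)."""
    path: str
    summary: str
    tags: list[str] = field(default_factory=list)
    derived_notes: list[str] = field(default_factory=list)


# Stand-in for the parsed contents of index.yaml.
INDEX = [
    IndexEntry(
        "raw/agent-a-readme.md",
        "Coding agent that asks for permission before each tool call",
        ["permissions", "tools"],
        ["wiki/concepts/permission-flows.md"],
    ),
    IndexEntry(
        "raw/agent-b-architecture.md",
        "Design notes on sandboxing and sub-agents",
        ["sandboxing", "sub-agents"],
    ),
    IndexEntry(
        "raw/meeting-recap.md",
        "Meeting recap about the course launch",
        ["meeting"],
    ),
]


def select_sources(index, query_terms, limit=2):
    """Return the paths whose summary or tags share the most query terms.

    Only the small catalog is scanned here; the files themselves are opened
    later, and only for the paths this function returns.
    """
    terms = {t.lower() for t in query_terms}
    scored = []
    for entry in index:
        haystack = set(entry.summary.lower().split()) | {t.lower() for t in entry.tags}
        score = len(terms & haystack)
        if score:
            scored.append((score, entry.path))
    # Highest score first, ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for _, path in scored[:limit]]


print(select_sources(INDEX, ["sandboxing", "permissions"]))
```

An agent would normally do this selection by reading the summaries itself, so the keyword match is just a stand-in for that judgment.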
Third, the wiki.
This is where the memory starts becoming useful. The system turns sources into concepts, comparisons, entities, notes, and open questions.
For example, if you ingest a few open-source coding agent repos, it can create notes about permission flows, sandboxing, tool registries, sub-agents, and memory systems, then compare the architectures across repos.
And when you ask new questions, the wiki can grow. New concepts, new comparisons, new notes.
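A growing wiki like this can be as simple as appending to plain files. The sketch below is one possible shape, assuming a hypothetical upsert_concept_note helper, one Markdown file per concept, and made-up source paths. The important bit is that every added line records which source it came from.

```python
import tempfile
from pathlib import Path


def upsert_concept_note(wiki_dir: Path, concept: str, insight: str, source: str) -> Path:
    """Create a concept note if needed, then append one sourced insight to it."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    slug = concept.lower().replace(" ", "-")
    note = wiki_dir / f"{slug}.md"
    if not note.exists():
        note.write_text(f"# {concept}\n\n", encoding="utf-8")
    with note.open("a", encoding="utf-8") as handle:
        # Keep the pointer to the source so a claim can be checked later.
        handle.write(f"- {insight} (source: {source})\n")
    return note


# Demo: the same concept grows as a second repo is ingested.
with tempfile.TemporaryDirectory() as tmp:
    wiki = Path(tmp) / "wiki" / "concepts"
    upsert_concept_note(
        wiki, "Permission flows",
        "Asks the user before each tool call", "raw/agent-a-readme.md",
    )
    note_path = upsert_concept_note(
        wiki, "Permission flows",
        "Uses an allowlist of pre-approved commands", "raw/agent-b-architecture.md",
    )
    print(note_path.read_text(encoding="utf-8"))
```

Because the note is just text, fixing a bad summary or removing a weak source is a normal file edit.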
That is the part I care about most.
A normal research document is static. Useful once, then slowly stale.
A file-based wiki can keep compounding as you work.
Of course, vector databases and RAG systems still make sense for production products. We use them at Towards AI, especially for our AI tutor. But for a personal research workflow, I want something I can inspect by hand.
I want to open the folder, read the files, edit a bad summary, remove a weak source, and understand why the agent answered something.
That is much harder when everything is hidden behind embeddings and database calls.
So the useful lesson from the keynote is this:
Do not make the model remember everything.
Make the agent know where to look (this is typically done through what people call skills, or here, a wiki).
That same idea is also what we’ll cover in more depth during my in-person AI Engineer World’s Fair workshop in San Francisco on June 29 with Omar and Samridhi from Towards AI.
That workshop is about context management and memory for long-running agents, using the open-source AI tutor we built for Towards AI Academy.
The tutor has to answer based on our course content, the student’s current lesson, the student profile, long debugging sessions, and code. And it still has to be fast enough to use.
That means every context decision matters: what to retrieve, what to keep, what to remove, when to compact, when summarization helps, when caching makes summarization a bad idea, and how to avoid old tool outputs quietly filling the entire context window.
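As one small illustration of the last point, here is a sketch of stubbing out old tool outputs once the history goes over a budget. This is not the tutor's implementation. The Message type, the four-characters-per-token estimate, and the stub text are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


STUB = "[tool output removed, re-run the tool if needed]"


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def compact_tool_outputs(messages, budget, keep_recent=2):
    """Replace the oldest tool outputs with a short stub until the history fits.

    The most recent `keep_recent` tool outputs are never touched, and the
    input list is left unmodified.
    """
    compacted = list(messages)
    total = sum(estimate_tokens(m.content) for m in compacted)
    tool_positions = [i for i, m in enumerate(compacted) if m.role == "tool"]
    protected = set(tool_positions[-keep_recent:]) if keep_recent else set()
    for i in tool_positions:  # oldest first
        if total <= budget:
            break
        original_cost = estimate_tokens(compacted[i].content)
        stub_cost = estimate_tokens(STUB)
        if i in protected or stub_cost >= original_cost:
            continue
        compacted[i] = Message("tool", STUB)
        total += stub_cost - original_cost
    return compacted


history = [
    Message("system", "You are a tutor."),
    Message("tool", "x" * 4000),   # old, large tool output
    Message("user", "Why does my loop fail?"),
    Message("tool", "y" * 4000),
    Message("tool", "z" * 400),
]
trimmed = compact_tool_outputs(history, budget=1200)
print([len(m.content) for m in trimmed])
```

Note the trade-off: rewriting earlier messages changes the start of the prompt, which can invalidate a provider's prompt cache. That is one reason compaction is not always a win.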
I recorded my part of that workshop early for paid subscribers. You can subscribe here: https://louisbouchard.substack.com/subscribe
The short version is: agents do not need infinite context.
They need better memory, better retrieval, better compression, and better systems around them.
That is where the real engineering is.
As a teaser, here’s the experiment website for this upcoming workshop: huggingface.co/spaces/towardsai-tutors/context-engineering-experiments