Giving an AI Agent a Memory: Building a Self-Curating Knowledge Base for Game Development
Why I stopped letting my AI assistant re-learn the whole codebase every morning — and built it a memory that curates itself.
The problem I kept hitting
On a fast-moving commercial Unity mobile-game project, the hardest part of plugging in an AI assistant wasn't the model — it was the context. Every session, the assistant started from zero: re-reading the codebase, re-deriving how a feature worked, often landing on a slightly different answer than yesterday. Meanwhile the actual knowledge was scattered across three places that quietly contradicted each other — daily commits, code-review notes, and ticket specs — and the game designers, who couldn't read C#, still needed to know "what does this feature touch, and is this request even feasible?"
I didn't want a smarter prompt. I wanted the system to have a memory that stayed current on its own.
The bet: three systems, one knowledge flywheel
Instead of one monolith, I split the work across three cooperating systems:
- Game Source — the game client. It only produces material: a daily code review that emits pure "code does X + file:line" facts, with no opinions.
- Game Agent — the brain. It curates that raw material into a knowledge base, answers feasibility questions, and is deliberately kept outside the game repo so the whole mechanism is portable to the next project.
- Task — a self-built ticketing system. Requests the agent works out flow into it as tickets; completed tickets flow back into curation.
That last loop matters: specs feed in, completed work feeds back, and the knowledge base gets stronger the longer the project runs. A flywheel, not a snapshot.
The non-obvious trade-offs
A few decisions went against my first instinct, and those are the ones worth writing down.
I didn't use RAG. The reflexive move is to chunk everything, embed it, and retrieve by similarity. I tried that on a predecessor system and it was a poor fit for game-dev data: structured specs got flattened into soup, superseded discussion got confused with final decisions, and "who decided this" became guesswork. So I moved the hard work from query-time to write-time. When the agent curates, it preserves structure and writes clean, human-readable knowledge pages — each with machine-readable frontmatter, a plain-prose reading layer, and footnotes citing exact file.cs:line. One file, three audiences, no forking.
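To make the three-layer page concrete, here is a minimal Python sketch that splits one page into its frontmatter, prose, and citations. The field names (id, tier, status), the footnote syntax, and the file path are assumptions made up for the example, not the project's real schema.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical page: field names, footnote syntax, and path are illustrative only.
SAMPLE_PAGE = """---
id: shop-daily-offer
tier: 1
status: final
---
The daily offer refreshes when the player opens the shop after the reset time.[^1]

[^1]: Assets/Scripts/Shop/DailyOffer.cs:42
"""

# Matches a footnote definition such as "[^1]: Some/File.cs:42".
FOOTNOTE = re.compile(r"^\[\^(\w+)\]:\s*(\S+\.cs):(\d+)\s*$")


@dataclass
class KnowledgePage:
    frontmatter: dict[str, str]             # machine-readable layer
    prose: str                              # plain-prose reading layer
    citations: dict[str, tuple[str, int]]   # footnote id -> (file, line)


def parse_page(text: str) -> KnowledgePage:
    # The page starts with a frontmatter block delimited by '---' lines.
    _, header, body = text.split("---\n", 2)

    frontmatter = {}
    for line in header.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            frontmatter[key.strip()] = value.strip()

    # Footnote definitions become citations; everything else is prose.
    prose_lines, citations = [], {}
    for line in body.splitlines():
        match = FOOTNOTE.match(line)
        if match:
            citations[match.group(1)] = (match.group(2), int(match.group(3)))
        else:
            prose_lines.append(line)

    return KnowledgePage(frontmatter, "\n".join(prose_lines).strip(), citations)
```

Calling parse_page(SAMPLE_PAGE) gives tooling the frontmatter, gives a designer the prose, and gives a reviewer the file and line to check, all from the same file.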
The LLM never infers authority. Knowledge from a real code review (Tier 1) always outranks a ticket spec (Tier 2), regardless of date — a two-month-old ticket should never overwrite a fact I just read from live code. But the key discipline is that humans and rules decide the tiers and the "final decision" status; the model only executes and writes structured content correctly. It never guesses which source wins. That makes the whole pipeline predictable and auditable — when something's wrong, it's a rule that's wrong, not the model's mood that day.
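The tier rule is small enough to express as plain code with no model in the loop. The sketch below is one possible shape, under two assumptions of mine: a lower tier number means higher authority, and recency only breaks ties between claims in the same tier. The Claim type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    topic: str       # what the claim is about (hypothetical key format)
    statement: str   # the curated fact
    tier: int        # assumption: 1 = code review, 2 = ticket spec
    observed: date   # when the source was produced


def _rank(claim: Claim) -> tuple[int, int]:
    # Smaller tuple wins: tier is compared first, so date can never
    # lift a Tier 2 claim above a Tier 1 claim. Newer dates sort first
    # only among claims that share a tier.
    return (claim.tier, -claim.observed.toordinal())


def resolve(claims: list[Claim]) -> dict[str, Claim]:
    """Pick one winning claim per topic using the fixed tier rule."""
    winners: dict[str, Claim] = {}
    for claim in claims:
        current = winners.get(claim.topic)
        if current is None or _rank(claim) < _rank(current):
            winners[claim.topic] = claim
    return winners
```

Because the ordering is a pure function of tier and date, a wrong winner can be traced to a wrong tier assignment or a wrong rule, which is the auditability the paragraph above is after.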
I built my own ticketing instead of reaching for Jira or Trello. Game-content tickets are highly custom and need to plug into the agent flow. I kept the ticketing system intentionally minimal and fully decoupled — it holds no LLM logic at all. All the intelligence stays on the agent side; the ticketing layer just persists and tracks work.
There's also a routing index the agent uses for retrieval. I made sure hand edits to it can't survive: it's regenerated from each page's frontmatter, and CI fails if the committed copy drifts from what the frontmatter produces. A routing table that can rot silently is worse than no routing table.
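A drift check of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the starter kit's actual script: the JSON index format, the id and keywords fields, and the function names are all hypothetical.

```python
import json
import sys


def build_index(frontmatters: dict) -> str:
    """Render the routing index deterministically from page frontmatter.

    `frontmatters` maps a page path to its parsed frontmatter dict
    (hypothetical fields: "id" and an optional "keywords" list).
    """
    entries = sorted(
        (
            {"id": fm["id"], "path": path, "keywords": sorted(fm.get("keywords", []))}
            for path, fm in frontmatters.items()
        ),
        key=lambda entry: entry["id"],
    )
    # Sorted keys and sorted entries make the output byte-for-byte stable,
    # so any difference from the committed file is real drift.
    return json.dumps(entries, indent=2, sort_keys=True) + "\n"


def ci_check(committed_index: str, frontmatters: dict) -> int:
    """Return a process exit code: 0 when in sync, 1 when the index drifted."""
    if committed_index != build_index(frontmatters):
        print("routing index is stale; regenerate it, do not edit it by hand",
              file=sys.stderr)
        return 1
    return 0
```

In a CI job, the script would read the committed index from disk, parse the frontmatter of every page, and pass the result of ci_check to sys.exit so that a hand-edited or stale index fails the build.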
Making it runnable
Design notes are easy to wave at; running code is harder to fake. So alongside the write-up, I packaged a de-identified starter kit — three folders mirroring the three systems, with the portable pieces actually runnable: regenerate the routing index from sample knowledge pages, dry-run the config deploy, run the daily code-review collector. The content is entirely synthetic; the mechanism is real.
What I'd take to the next project
The lesson that generalized best wasn't a clever prompt or a framework. It was this: the scarce resource in an agent system is the signal-to-noise ratio of what enters the knowledge base. I once planned to auto-convert design spreadsheets into AI-readable docs, measured that only about 30% of the converted output was useful, and cut the whole feature. Knowing when not to add a source turned out to matter as much as any algorithm.
Give your agent a memory — but be ruthless about what you let it remember.
The design showcase lives at GamePlusAIAgent (de-identified). A runnable starter kit mirrors the three systems in a separate repo.
