Vector embeddings are the wrong default for AI agent memory

Three concrete failure modes from production agents that drove me to stop reaching for a vector DB first: drift hallucinations, awkward writes, and using 1536-dim floats to index 200 bytes of state. Plus when structured KV with an MCP memory server actually fits, and when a vector DB does.

memnode · 8 min read
Tags: architecture, vector-db, memory, agents, mcp

The default answer to "how do I give my LLM agent memory" has, for about two years now, been some variant of "set up a vector DB and embed conversation chunks." I've shipped that pattern across half a dozen production agents and watched it underperform every single time. This post is about why structured key-value recall plus MCP-style memory servers usually beats vector embeddings for the actual jobs agents need memory for, and the specific failure modes that drove me to stop reaching for a vector DB first.

The thesis is narrower than it sounds. Vector embeddings work well for what they were designed for: semantic retrieval over large unstructured corpora. They are routinely misapplied to agent memory because the tooling is mature and the conference-talk version of agent architecture says "RAG everything." What agents actually need most of the time is something simpler and harder: structured persistent state with reliable, consistent recall.

Three concrete failure modes I've hit across deployments.

Failure 1: high-recall queries that drift

The classic vector-DB memory pattern is: embed every conversation message, retrieve the top-k most similar to the current turn, stuff them into the system prompt. This works in toy demos. In production it produces what I think of as "drift hallucinations": the agent confidently cites facts from a vector hit that's only loosely related to the current turn.

Concrete example. A customer-support agent had vector memory of past tickets. User opens a new ticket about "I can't log in." Vector search returns three past tickets containing "login", including one from a year ago about a different feature that was renamed. The agent confidently tells the user to "check the workflow flag, which solved this in your previous ticket." The flag had been deprecated for nine months. The user spent twenty minutes looking for a UI element that no longer existed.

The vector DB did exactly what it was designed for. It retrieved semantically-similar items. The problem is that "this user's last login attempt was five days ago" and "this user's deprecated workflow flag was relevant in February 2024" are both valid retrievals, and the agent has no way to distinguish "current relevant fact" from "historical artifact."
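To make the drift concrete, here is a toy sketch of top-k retrieval. Everything in it is an illustrative assumption: the bag-of-words `embed` function stands in for a real embedding model, and the ticket texts and dates are invented. The point it shows is only that the ranking looks at similarity and never at the date.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Invented tickets; the first one describes a fix that is no longer valid.
tickets = [
    {"date": "2024-02", "text": "login fails until workflow flag enabled"},
    {"date": "2025-01", "text": "billing page shows wrong currency"},
    {"date": "2025-01", "text": "cannot login after password reset"},
]


def top_k(query: str, k: int = 2) -> list:
    q = embed(query)
    # Ranking uses similarity only; "date" never enters the score.
    ranked = sorted(tickets, key=lambda t: cosine(q, embed(t["text"])), reverse=True)
    return ranked[:k]


hits = top_k("i cannot login")
print([h["date"] for h in hits])  # ['2025-01', '2024-02']
```

The stale 2024-02 ticket lands in the top 2 simply because it shares the word "login". Nothing in the retrieval step tells the agent that one hit is current and the other is an artifact.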

A structured KV store with explicit recency markers and an explicit fact lifecycle ("this fact was true in 2024-02, this fact is current") doesn't have this failure mode. You query by key, not by similarity. If the fact isn't current, it isn't there.
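A minimal sketch of what "explicit fact lifecycle" could look like, assuming an in-memory store. The names `Fact`, `FactStore`, `retired_on`, and the example key are hypothetical, chosen for illustration rather than taken from any particular library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    value: str
    valid_from: date
    retired_on: Optional[date] = None  # None means the fact is still current


class FactStore:
    """Toy in-memory store: one list of versions per key, newest last."""

    def __init__(self) -> None:
        self._facts: dict[str, list[Fact]] = {}

    def set(self, key: str, value: str, today: date) -> None:
        versions = self._facts.setdefault(key, [])
        if versions and versions[-1].retired_on is None:
            versions[-1].retired_on = today  # close out the previous version
        versions.append(Fact(value, valid_from=today))

    def retire(self, key: str, today: date) -> None:
        versions = self._facts.get(key, [])
        if versions and versions[-1].retired_on is None:
            versions[-1].retired_on = today

    def recall(self, key: str) -> Optional[str]:
        versions = self._facts.get(key, [])
        if versions and versions[-1].retired_on is None:
            return versions[-1].value
        return None  # no current fact: the agent gets nothing, not a stale hit

    def history(self, key: str) -> list[Fact]:
        return list(self._facts.get(key, []))


store = FactStore()
store.set("user:42:login_fix", "check the workflow flag", date(2024, 2, 1))
store.retire("user:42:login_fix", date(2024, 5, 1))  # illustrative retirement date
print(store.recall("user:42:login_fix"))  # None
```

Once the fact is retired, `recall` returns nothing, while `history` still holds the old version for anyone who asks for it explicitly.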

Failure 2: small, high-frequency state

Most agent memory fits in a few hundred bytes per user: the user's name, current preference settings, the last few interaction summaries, ongoing project context. A vector DB is the wrong tool for data at this scale. You're storing 1,536-dimension float vectors to index 200 bytes of structured state, paying roughly 30x the storage and 100x the query latency to retrieve information that a single key lookup would return in microseconds.
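The storage ratio is easy to sanity-check. The arithmetic below assumes each dimension is stored as an unquantized float32 and ignores index overhead. It only covers the storage side, not the latency figure.

```python
DIMENSIONS = 1536        # embedding width cited above
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as float32
STATE_BYTES = 200        # structured state size cited above

vector_bytes = DIMENSIONS * BYTES_PER_FLOAT32
ratio = vector_bytes / STATE_BYTES

print(vector_bytes)      # 6144
print(round(ratio, 1))   # 30.7
```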

This sounds obvious and yet I keep finding production agents where the single most-queried memory fact ("what is the user's preferred response format?") is sitting behind a vector search instead of behind a key.

Vector DB defenders will say: "well, you'd just store that as metadata on the embedding." Sure. Now you have a key-value store with extra steps and a vector index you don't need.

Failure 3: writes are messier than reads

Agents need to write to memory often: every time the user states a preference, every time a fact changes, every time a session wraps up. Vector DBs handle writes, but the lifecycle is awkward.

Take a simple case: the user says "actually, my email is jane@example.com, not jane@example.org as I said before." A vector DB will store both as embeddings. Future retrieval will return both. The agent now has two contradictory facts and no first-class way to mark one as current and the other as historical. You can engineer around this (add a "superseded_by" field, filter on it at retrieval, write your own conflict resolution logic), but at that point you've reinvented a structured database with a vector index awkwardly bolted on.

A structured KV store handles this naturally. user:jane:email gets overwritten. There's only one current value. Old values are explicit history, not retrieval noise.
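As a rough sketch of that overwrite-plus-history behaviour, here is one way to do it with SQLite from the Python standard library. The table names `memory` and `memory_history` and the helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE memory (key TEXT PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE memory_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        key TEXT NOT NULL,
        value TEXT NOT NULL
    );
""")


def set_fact(key: str, value: str) -> None:
    """Overwrite the current value, copying the old one into history first."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO memory_history (key, value) "
            "SELECT key, value FROM memory WHERE key = ?",
            (key,),
        )
        conn.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)",
            (key, value),
        )


def recall(key: str):
    row = conn.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def history(key: str) -> list:
    rows = conn.execute(
        "SELECT value FROM memory_history WHERE key = ? ORDER BY id", (key,)
    ).fetchall()
    return [r[0] for r in rows]


set_fact("user:jane:email", "jane@example.org")
set_fact("user:jane:email", "jane@example.com")
print(recall("user:jane:email"))   # jane@example.com
print(history("user:jane:email"))  # ['jane@example.org']
```

The read path only ever touches `memory`, so the superseded address cannot leak into recall. It is still available through `history` when someone explicitly wants the audit trail.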

What "agent memory" actually means

The phrase covers four distinct workloads, and they have different optimal storage:

  • Episodic recall ("What did the user say in the last conversation?"): sequential and recency-weighted, often best handled by a sliding context window or a structured session log, not retrieval at all.
  • Semantic recall over a knowledge base ("What does our product do?"): this is where vector DBs shine. RAG over docs, code, tickets. Big corpus, fuzzy queries, semantic similarity is the right primitive.
  • Structured state ("What are the user's current preferences?"): explicit keys, current values, write-overrides-old. KV territory.
  • Cross-session continuity ("What did the user and I work on together over the past six weeks?"): this is the genuinely hard one. Some episodic, some semantic, some structured. Different facts age differently. The naive "embed every turn" pattern fails this hardest.

The mistake is treating all four as one workload and reaching for a vector DB. In practice you want different stores for different memory types, with the agent's tool layer doing the routing.

The MCP-server pattern

The Model Context Protocol gave us a clean way to expose memory to agents without coupling it to a specific store. A memory MCP server is a thin tool layer the agent calls (memory.recall, memory.set, memory.list_recent) backed by whatever storage actually fits the data: KV for structured, vector for semantic, append-only log for episodic.

This pattern has two practical benefits over "embed-and-retrieve":

First, the agent has explicit affordances. Instead of hoping the right embedding gets retrieved, the agent decides "I need to look up the user's stored preferences" and calls memory.recall("preferences"). The decision is in the agent's reasoning, not in the embedding-similarity black box.

Second, you can mix storage backends without changing the agent. The MCP layer hides the implementation. We've shipped one project where structured user state was a Postgres table, episodic session summaries were a Redis sorted set, and semantic doc retrieval was a Qdrant index, all behind one MCP memory server. The agent didn't care.
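The shape of that tool layer can be sketched in plain Python. This is not the MCP SDK: the `call` dispatcher only imitates a tool invocation, and in a real server the three functions would be registered as MCP tools. The tool names come from the text above. The `KeyValueStore` and `EpisodeLog` protocols, the in-memory backends, and the choice to have `memory.set` record each write in the log are assumptions made for the example.

```python
from typing import Any, Callable, Optional, Protocol


class KeyValueStore(Protocol):
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class EpisodeLog(Protocol):
    def append(self, entry: str) -> None: ...
    def tail(self, n: int) -> list[str]: ...


class DictStore:
    """In-memory stand-in for a Postgres or SQLite table."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class ListLog:
    """In-memory stand-in for an append-only session log."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def append(self, entry: str) -> None:
        self._entries.append(entry)

    def tail(self, n: int) -> list[str]:
        return self._entries[-n:] if n > 0 else []


class MemoryServer:
    """Tool layer: the agent sees tool names, never the backends."""

    def __init__(self, state: KeyValueStore, episodes: EpisodeLog) -> None:
        self._state = state
        self._episodes = episodes
        self._tools: dict[str, Callable[..., Any]] = {
            "memory.set": self._set,
            "memory.recall": self._recall,
            "memory.list_recent": self._list_recent,
        }

    def call(self, tool: str, **args: Any) -> Any:
        if tool not in self._tools:
            raise ValueError(f"unknown tool: {tool}")
        return self._tools[tool](**args)

    def _set(self, key: str, value: str) -> str:
        self._state.put(key, value)
        self._episodes.append(f"set {key}")  # assumption: writes are logged
        return "ok"

    def _recall(self, key: str) -> Optional[str]:
        return self._state.get(key)

    def _list_recent(self, n: int = 10) -> list[str]:
        return self._episodes.tail(n)


server = MemoryServer(DictStore(), ListLog())
server.call("memory.set", key="preferences", value="concise, bullet points")
print(server.call("memory.recall", key="preferences"))  # concise, bullet points
print(server.call("memory.list_recent", n=5))           # ['set preferences']
```

Swapping `DictStore` for a database-backed class that satisfies the same protocol would change nothing on the agent's side, which is the property the pattern is after.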

Memnode (full disclosure, my project) is one such server, focused specifically on the structured + episodic side. There are others. The point isn't memnode specifically; the point is that the MCP-server pattern lets you pick the right storage per workload instead of forcing everything through a vector DB.

When vector DBs are the right tool

To be clear: I'm not saying don't use vector DBs. I'm saying don't make them the default for agent memory. The use cases where vector retrieval is genuinely the best primitive:

  • Document RAG: thousands of docs, fuzzy semantic queries, "what does the policy say about X?"
  • Code search across a large repo: when symbol-name search isn't enough and you want similarity over implementation patterns
  • Long-term semantic memory over a conversational corpus: when you genuinely want to "find conversations across 6 months that touched on a similar concept." Use it only when the agent's reasoning explicitly justifies that retrieval, not as a default first-stop lookup

The pattern is: vector DB when the corpus is large, the queries are fuzzy, and exact key lookup wouldn't work because keys aren't known in advance.

If your agent's memory is "small structured state per user" or "the last 50 messages of a conversation," that's not what vector DBs are for.

Concrete: how to know which you need

A diagnostic question that's served me well: write down the five most common memory-related questions your agent asks itself.

If they look like:

  • "What is the user's name?"
  • "What was their last action?"
  • "Which document are we looking at?"
  • "What was the last error message?"
  • "What is the current task state?"

→ KV-style structured memory. Don't use a vector DB.

If they look like:

  • "Find policy text relevant to this question"
  • "Surface code patterns similar to this snippet"
  • "Pull up tickets that resemble this issue"

→ Vector retrieval. Use a vector DB.

If they look like:

  • "What did we discuss about the migration plan three sessions ago?"
  • "Summarize my context across the last week"

→ Hybrid: episodic log + on-demand summarization, possibly with a small vector index over the summaries (not the raw turns). Reach for the MCP-server pattern so you can tune the backend per query type.

What I do now

For new agent projects, my default stack is:

  • Structured user state in Postgres or SQLite, exposed via an MCP memory server with explicit set / recall / delete semantics
  • Episodic session memory as an append-only log, also in the MCP server, with recency-windowed retrieval
  • Vector RAG only when there's a large external corpus and the queries are genuinely semantic
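For the episodic piece of that stack, recency-windowed retrieval over an append-only log might look like the sketch below. `Episode`, `EpisodicLog`, and the `window_seconds` and `limit` parameters are hypothetical names. A production version would live in a database table rather than a Python list.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Episode:
    timestamp: float  # seconds since the epoch
    session_id: str
    summary: str


class EpisodicLog:
    """Append-only: entries are added and read, never updated in place."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def append(
        self, session_id: str, summary: str, timestamp: Optional[float] = None
    ) -> None:
        ts = time.time() if timestamp is None else timestamp
        self._episodes.append(Episode(ts, session_id, summary))

    def recent(
        self, window_seconds: float, now: Optional[float] = None, limit: int = 20
    ) -> list[Episode]:
        """Return up to `limit` episodes from the window, oldest first."""
        now = time.time() if now is None else now
        cutoff = now - window_seconds
        hits = [e for e in self._episodes if e.timestamp >= cutoff]
        hits.sort(key=lambda e: e.timestamp)
        return hits[-limit:] if limit > 0 else []


DAY = 86_400
log = EpisodicLog()
log.append("s1", "discussed migration plan", timestamp=0 * DAY)
log.append("s2", "reviewed rollback steps", timestamp=9 * DAY)
log.append("s3", "agreed on cutover date", timestamp=10 * DAY)

last_week = log.recent(window_seconds=7 * DAY, now=10 * DAY)
print([e.session_id for e in last_week])  # ['s2', 's3']
```

Retrieval here is a filter on time, not a similarity search, so what comes back is predictable and easy to inspect when debugging.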

The cost has been lower (no vector DB to operate), the failure modes have been more predictable (agents no longer cite long-deprecated workflow flags), and the agents have been easier to debug (you can read the structured memory; you can't read embedding vectors).

If you're starting an agent project today, I'd recommend skipping the vector-DB-first reflex and starting with structured KV + an MCP memory layer. Add vector retrieval when you have a specific workload that justifies it, not as the default architecture.

The default has been wrong for a while. The conference talks haven't caught up.

If you want to see the structured-memory side concretely, the Claude Code memory demo shows the install / record / recall / lineage loop in four steps. If you only need plain key→value lookup without MCP semantics (cache, feature flags, session keys), a generic KV like basekv is enough.

And if you do adopt a managed memory layer instead, go in knowing its ceiling: even the best ones recall the right memory only about two times in three. We break down why memory layers recall the wrong thing and the structural fix.

For the engineering view of what beats top-k similarity, see spreading activation over a typed memory graph and the memnode design series on how a memory engine that does this is actually built.