An agent without memory starts every conversation from scratch. Production agents typically need three memory tiers: short-term (the current conversation buffer), long-term (a vector store of past interactions), and structured (entities and facts in a graph or relational database). Each serves a different recall pattern.
Short-term: conversation buffer
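The last N turns of the current conversation, included verbatim in the prompt and bounded by the context window. Once the buffer fills, summarize the older turns and keep the summary plus the most recent raw turns (for example, the last 5). LangChain's ConversationSummaryBufferMemory implements this pattern, though it decides what to summarize by token budget rather than by a fixed turn count.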
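A minimal sketch of the summarize-and-keep-recent pattern, not LangChain's implementation. The names SummaryBuffer, keep_raw, max_raw, and naive_summarize are hypothetical; max_raw stands in for a real token budget, and naive_summarize is a placeholder for what would normally be an LLM summarization call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def naive_summarize(previous_summary: str, turns: List[Turn]) -> str:
    """Placeholder summarizer: a real agent would call an LLM here."""
    lines = [f"{role}: {text}" for role, text in turns]
    return " | ".join(([previous_summary] if previous_summary else []) + lines)


@dataclass
class SummaryBuffer:
    keep_raw: int = 5    # raw turns kept verbatim after compaction
    max_raw: int = 10    # assumption: turn count as a proxy for the context budget
    summarize: Callable[[str, List[Turn]], str] = naive_summarize
    summary: str = ""
    turns: List[Turn] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        if len(self.turns) > self.max_raw:
            # Fold everything except the newest keep_raw turns into the summary.
            overflow = self.turns[:-self.keep_raw]
            self.summary = self.summarize(self.summary, overflow)
            self.turns = self.turns[-self.keep_raw:]

    def render(self) -> str:
        """Build the memory portion of the prompt: summary first, then raw turns."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(f"{role}: {text}" for role, text in self.turns)
        return "\n".join(parts)
```

Compaction here runs only when the buffer overflows, so the summarizer is called once per batch of old turns rather than on every message. A production version would likely count tokens instead of turns.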
Long-term: vector store
Embed each conversation turn or extracted fact and store it in a vector DB keyed by user_id. At query time, retrieve the top-K most relevant past memories for that user. Use case: recalling that the user said 'I'm vegetarian' three weeks ago when planning a recipe today.
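The store-then-retrieve flow can be sketched without any external service. Everything here is illustrative: VectorMemory is a hypothetical in-memory class, and embed is a toy bag-of-words function standing in for a real embedding model, so it matches on shared words only and does not capture semantic similarity.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. Stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorMemory:
    """In-memory stand-in for a vector DB, partitioned by user_id."""

    def __init__(self) -> None:
        self._by_user: Dict[str, List[Tuple[Counter, str]]] = defaultdict(list)

    def add(self, user_id: str, text: str) -> None:
        self._by_user[user_id].append((embed(text), text))

    def search(self, user_id: str, query: str, k: int = 3) -> List[str]:
        # Only this user's memories are scored; other users are never visible.
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self._by_user.get(user_id, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


mem = VectorMemory()
mem.add("u1", "I am vegetarian and avoid meat")
mem.add("u1", "My favorite editor is vim")
print(mem.search("u1", "suggest a vegetarian recipe", k=1))
```

Scoping the search to one user's partition keeps retrieval from leaking memories across users and makes per-user deletion straightforward.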
Structured: entity memory
Extract named entities (people, places, projects) into a graph or relational store. Each entity has properties that accumulate over time. Use case: 'what are all the projects user X has mentioned, and which ones have deadlines coming up soon?' A vector store can't answer that reliably, because similarity search returns the top-K nearest matches rather than a complete, filterable set; a structured store can.
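To illustrate why that query suits a structured store, here is a small sketch using SQLite from the Python standard library. The schema (a single entity table with a deadline column) and the function names are assumptions made for the example; a real system might use a graph database or a richer property model.

```python
import sqlite3
from datetime import date, timedelta
from typing import List, Optional


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE entity (
               user_id  TEXT NOT NULL,
               name     TEXT NOT NULL,
               kind     TEXT NOT NULL,
               deadline TEXT,            -- ISO date, NULL if unknown
               PRIMARY KEY (user_id, name))"""
    )
    return db


def upsert_project(db: sqlite3.Connection, user_id: str, name: str,
                   deadline: Optional[date] = None) -> None:
    """Create the project on first mention; fill in properties as they appear."""
    db.execute(
        "INSERT OR IGNORE INTO entity (user_id, name, kind) VALUES (?, ?, 'project')",
        (user_id, name),
    )
    if deadline is not None:
        db.execute(
            "UPDATE entity SET deadline = ? WHERE user_id = ? AND name = ?",
            (deadline.isoformat(), user_id, name),
        )


def all_projects(db: sqlite3.Connection, user_id: str) -> List[str]:
    rows = db.execute(
        "SELECT name FROM entity WHERE user_id = ? AND kind = 'project' ORDER BY name",
        (user_id,),
    )
    return [r[0] for r in rows]


def projects_due_within(db: sqlite3.Connection, user_id: str,
                        today: date, days: int) -> List[str]:
    cutoff = today + timedelta(days=days)
    rows = db.execute(
        "SELECT name FROM entity WHERE user_id = ? AND kind = 'project' "
        "AND deadline BETWEEN ? AND ? ORDER BY deadline",
        (user_id, today.isoformat(), cutoff.isoformat()),
    )
    return [r[0] for r in rows]
```

Because ISO dates sort lexicographically, the BETWEEN filter works on plain text columns. Both queries return exhaustive answers, which a top-K similarity search cannot guarantee.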
Memory consolidation
Periodically (at the end of a session or in a nightly batch), an offline process reads recent interactions, extracts new entities and facts, and updates the long-term and structured stores. Without this step, vector memory degrades into a noisy log of raw turns and structured memory stays empty.
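A consolidation pass might look roughly like the following. This is a sketch under heavy assumptions: extract_facts is a hypothetical regex-based extractor standing in for an LLM extraction call, and Stores uses plain dicts in place of a real vector DB and graph store.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, value)


def extract_facts(text: str) -> List[Fact]:
    """Hypothetical rule-based extractor; a real pipeline would likely use an LLM."""
    facts: List[Fact] = []
    m = re.search(r"\bI(?:'m| am) (?:a |an )?(\w+)", text)
    if m:
        facts.append(("user", "is", m.group(1).lower()))
    for name in re.findall(r"\bproject (\w+)", text, flags=re.IGNORECASE):
        facts.append((name, "type", "project"))
    return facts


@dataclass
class Stores:
    # user_id -> fact sentences (these would be embedded into the vector store)
    long_term: Dict[str, List[str]] = field(default_factory=dict)
    # user_id -> set of structured facts (these would go to the graph or DB)
    structured: Dict[str, Set[Fact]] = field(default_factory=dict)


def consolidate(session_log: List[Tuple[str, str]], stores: Stores) -> int:
    """Process (user_id, text) pairs offline; return the number of new facts written."""
    written = 0
    for user_id, text in session_log:
        known = stores.structured.setdefault(user_id, set())
        for fact in extract_facts(text):
            if fact in known:
                continue  # already consolidated; avoid duplicate memories
            known.add(fact)
            stores.long_term.setdefault(user_id, []).append(" ".join(fact))
            written += 1
    return written
```

Deduplicating against known facts is what keeps the long-term store from turning into a raw log, and it makes the job safe to re-run over the same interactions.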
Privacy
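All memory must support deletion by user_id to honor erasure requests (for example, the GDPR right to erasure). For the vector store, this means partitioning the index by user with namespace isolation, so that a user's vectors can be dropped as a unit. For graph memory, it means cascade-delete on user removal. Bake this in from day one.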
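As a rough illustration of both deletion paths, the sketch below pairs a toy namespaced vector store with a SQLite schema that uses ON DELETE CASCADE. NamespacedVectorStore, open_graph, and forget_user are hypothetical names; real vector databases expose their own namespace or filter-based delete operations, which are not shown here.

```python
import sqlite3
from typing import Dict, List


class NamespacedVectorStore:
    """Toy store: one namespace per user, so erasure is a single drop."""

    def __init__(self) -> None:
        self._ns: Dict[str, List[str]] = {}

    def add(self, user_id: str, item: str) -> None:
        self._ns.setdefault(user_id, []).append(item)

    def count(self, user_id: str) -> int:
        return len(self._ns.get(user_id, []))

    def drop_namespace(self, user_id: str) -> None:
        self._ns.pop(user_id, None)


def open_graph() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY)")
    db.execute(
        """CREATE TABLE entity (
               id      INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
               name    TEXT NOT NULL)"""
    )
    return db


def forget_user(user_id: str, vectors: NamespacedVectorStore,
                db: sqlite3.Connection) -> None:
    """Remove every memory tied to user_id from both stores."""
    vectors.drop_namespace(user_id)
    with db:  # commits on success, rolls back on error
        # Deleting the user row cascades to all of that user's entities.
        db.execute("DELETE FROM users WHERE user_id = ?", (user_id,))
```

Hanging every memory row off a user record means a single delete covers the structured tier, with no risk of missing a table added later as long as it declares the same cascading foreign key.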