Conversation state across turns has to live somewhere. Where you put it shapes latency, cost, scalability, and capability. Three options (in-process memory, a fast KV store, an append-only event log) each fit different requirements. Getting this wrong creates lag, lost context, or runaway memory.

In-memory per process

Simplest. Conversation lives in agent runtime memory. Works for single-server, small-scale deployments. Breaks on restart (state is lost) and under horizontal scaling (server affinity needed, so every turn of a conversation reaches the same process). Fine for hackathons; not production.

Fast KV (Redis, DynamoDB)

Conversation state is serialized and stored under a conversation key. One read and one write per turn (roughly 5-20ms added latency). Survives restarts. Scales horizontally, since any server can load any conversation. TTL handles expiry. The production default for chat agents.
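
A minimal sketch of the per-turn read/write cycle with TTL expiry. The `InMemoryKV` class is a hypothetical stand-in so the example runs without a server; in practice a Redis or DynamoDB client would take its place. The key prefix, the state shape, and the 30-minute TTL are all assumptions for illustration.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 30 * 60  # assumed expiry window, not a recommendation


class InMemoryKV:
    """Hypothetical stand-in for a real KV store with per-key TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value


def load_state(kv: InMemoryKV, conversation_id: str) -> dict:
    """Read and deserialize state; start fresh if missing or expired."""
    raw = kv.get(f"conv:{conversation_id}")
    return json.loads(raw) if raw is not None else {"turns": []}


def save_state(kv: InMemoryKV, conversation_id: str, state: dict) -> None:
    """Serialize state and write it back, refreshing the TTL."""
    kv.set(f"conv:{conversation_id}", json.dumps(state), SESSION_TTL_SECONDS)


def handle_turn(kv: InMemoryKV, conversation_id: str, user_msg: str, reply: str) -> dict:
    """One read and one write per turn, as described above."""
    state = load_state(kv, conversation_id)
    state["turns"].append({"user": user_msg, "agent": reply})
    save_state(kv, conversation_id, state)
    return state
```

Because each write refreshes the TTL, an active conversation stays alive while an abandoned one expires on its own.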

Append-only event log

Each turn is an event written to a log (Kafka, EventStore). Current state = fold of events. Auditable, replayable, debuggable. Higher write cost; lower for queries. Right for compliance-sensitive domains.
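
To make "current state = fold of events" concrete, here is a small sketch that uses a plain Python list in place of a real log. The event kinds (`user_message`, `agent_message`, `intent_set`) and the `State` fields are hypothetical names chosen for illustration; a Kafka or EventStore consumer would feed the same reducer.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str      # assumed kinds: "user_message", "agent_message", "intent_set"
    payload: str


@dataclass(frozen=True)
class State:
    transcript: Tuple[Tuple[str, str], ...] = ()
    intent: Optional[str] = None


def apply_event(state: State, event: Event) -> State:
    """Pure reducer: returns a new state, never mutates the old one."""
    if event.kind in ("user_message", "agent_message"):
        return State(state.transcript + ((event.kind, event.payload),), state.intent)
    if event.kind == "intent_set":
        return State(state.transcript, event.payload)
    return state  # unknown event kinds are ignored in this sketch


def replay(events: Iterable[Event]) -> State:
    """Current state is the left fold of all events over an empty state."""
    return reduce(apply_event, events, State())


log = [
    Event("user_message", "I want my money back"),
    Event("intent_set", "refund"),
    Event("agent_message", "I can help with that refund."),
]
current = replay(log)
state_after_first_turn = replay(log[:1])  # replay a prefix to debug any past moment
```

Replaying a prefix of the log reconstructs the state at that point in the conversation, which is what makes this layout auditable and debuggable.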

What to store

Full transcript: simple but expensive (long conversations blow up storage and token cost). Summarized older turns + verbatim last N: balanced. Structured state (extracted facts, current intent) + transcript reference: clean separation but more code. Pick by how much the agent needs to recall.
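
The middle option can be sketched as follows. The `summarize` argument is a hypothetical callable (in practice probably a model call); here it is passed in so the example stays self-contained, and the summary prefix string is an arbitrary choice.

```python
from typing import Callable, List


def build_context(
    turns: List[str],
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return a summary of older turns followed by the last N turns verbatim."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    older, recent = turns[:-keep_last], turns[-keep_last:]
    context: List[str] = []
    if older:
        # Only pay for summarization when something falls outside the window.
        context.append("Summary of earlier conversation: " + summarize(older))
    context.extend(recent)
    return context


def count_summarizer(older: List[str]) -> str:
    """Placeholder summarizer for the example; a real one would condense content."""
    return f"{len(older)} earlier turns"


turns = ["t1", "t2", "t3", "t4", "t5"]
context = build_context(turns, keep_last=2, summarize=count_summarizer)
```

Short conversations pass through untouched, so the summarizer cost only appears once a conversation outgrows the verbatim window.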

Cleanup policy

Conversations end. Define end-of-session explicitly (idle > 30 min, explicit end, user logout). Archive ended conversations to cold storage if you need long-term retention. Don't let ended sessions pile up in the active store forever; costs and discoverability both suffer.
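
One way such a policy could look in code, as a sketch only: the `Session` fields, the dict-based stores, and the `sweep` function are hypothetical, and the 30-minute threshold is taken from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List

IDLE_TIMEOUT_SECONDS = 30 * 60  # "idle > 30 min" from the text


@dataclass
class Session:
    conversation_id: str
    last_activity: float          # timestamp of the most recent turn, in seconds
    explicitly_ended: bool = False
    user_logged_out: bool = False


def is_ended(session: Session, now: float) -> bool:
    """A session ends on explicit end, logout, or exceeding the idle timeout."""
    idle = now - session.last_activity
    return session.explicitly_ended or session.user_logged_out or idle > IDLE_TIMEOUT_SECONDS


def sweep(active: Dict[str, Session], archive: Dict[str, Session], now: float) -> List[str]:
    """Move ended sessions from the active store to the archive; return their IDs."""
    ended = [cid for cid, session in active.items() if is_ended(session, now)]
    for cid in ended:
        archive[cid] = active.pop(cid)
    return ended


active = {
    "a": Session("a", last_activity=0.0),                            # idle too long
    "b": Session("b", last_activity=3000.0),                         # still active
    "c": Session("c", last_activity=3000.0, explicitly_ended=True),  # ended by user
}
archive: Dict[str, Session] = {}
moved = sweep(active, archive, now=3600.0)
```

Running a sweep like this on a schedule keeps the active store limited to live conversations, while the archive holds whatever retention requires.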

Fast KV for default. Event log for compliance. Summarized + verbatim N for token budget. Explicit cleanup policy.