Vector databases are how RAG systems retrieve relevant chunks. The market is crowded: dedicated engines such as Pinecone, Weaviate, Qdrant, Chroma, and Milvus, plus vector search added on to existing systems such as Postgres (pgvector) and Elasticsearch. The differences start to matter once you move past the prototype stage.

Just-vectors vs full-text-augmented

Pure vector DBs (Pinecone, Chroma, Qdrant) excel at semantic search. Hybrid systems (Weaviate, Elasticsearch with kNN, OpenSearch) combine vector similarity with BM25 full-text scoring, which is often more accurate for technical queries containing exact-match terms such as error codes or function names. For most RAG use cases, hybrid is the right default.
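To make the idea of combining two rankings concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a vector result list with a BM25 result list. It is an illustration only, not how any particular engine implements hybrid scoring. The function name, the document IDs, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for a single query.
vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic neighbours
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # exact-term matches

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists (doc_a, doc_c) rise above documents found by only one retriever, which is the behaviour that helps queries mixing semantic intent with exact-match terms.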

Filter performance

Real queries combine vector similarity with a metadata filter (e.g., 'similar docs WHERE author = X AND year > 2024'). Filter performance varies widely between engines. Pinecone struggles with high-cardinality filters; Qdrant and Weaviate are designed for filtered search; pgvector inherits whatever index quality your Postgres setup provides.
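A brute-force sketch shows why the order of filtering and ranking matters. It compares applying the metadata filter before ranking with applying it after taking the nearest neighbours. Everything here (function names, the toy documents, the 2-dimensional vectors) is hypothetical, and real engines use filter-aware indexes rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pre_filter_search(query, docs, predicate, top_k=2):
    """Apply the metadata predicate first, then rank the survivors."""
    candidates = [d for d in docs if predicate(d["meta"])]
    candidates.sort(key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]


def post_filter_search(query, docs, predicate, top_k=2):
    """Take the top_k nearest docs first, then drop those failing the predicate."""
    ranked = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:top_k] if predicate(d["meta"])]


def wanted(meta):
    # Mirrors the example filter: author = X AND year > 2024.
    return meta["author"] == "X" and meta["year"] > 2024


docs = [
    {"id": 1, "vec": [1.0, 0.0], "meta": {"author": "Y", "year": 2025}},
    {"id": 2, "vec": [0.9, 0.1], "meta": {"author": "Y", "year": 2025}},
    {"id": 3, "vec": [0.7, 0.3], "meta": {"author": "X", "year": 2025}},
    {"id": 4, "vec": [0.5, 0.5], "meta": {"author": "X", "year": 2023}},
    {"id": 5, "vec": [0.2, 0.8], "meta": {"author": "X", "year": 2025}},
]
query = [1.0, 0.0]

print(pre_filter_search(query, docs, wanted))   # [3, 5]
print(post_filter_search(query, docs, wanted))  # []
```

Post-filtering returns nothing here because the two nearest documents both fail the filter. An engine that cannot push the filter into the index has to over-fetch to compensate, and that is where latency and recall suffer.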

Operational footprint

DB | Self-host | Hosted | Best when
Pinecone | No | Yes | Want zero ops, low scale
Qdrant | Yes (Rust) | Yes | Filter-heavy queries
Weaviate | Yes | Yes | Hybrid search built in
Chroma | Yes (Python) | Beta | Prototyping, embedded
pgvector | Yes (extension) | RDS/Supabase | Already have Postgres

Index choices

HNSW: approximately O(log N) lookup, at the cost of roughly 3-5× memory overhead. IVF: cluster-based, with a faster build and slightly slower queries. Most production deployments choose HNSW for its query latency. Reasonable starting points are ef_construction around 200 and M between 16 and 32; tune from there by measuring recall@10 on your own data.
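Measuring recall@10 means comparing what the approximate index returns against exact (brute-force) nearest neighbours for the same queries. The sketch below assumes you already have both result sets as lists of IDs; the function names and the sample IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the ANN index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Hypothetical results for two queries: exact search vs. an HNSW index.
exact = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
]
approx = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 99],             # missed one true neighbour
    [11, 12, 13, 14, 15, 16, 17, 18, 98, 97],    # missed two
]

print(f"{mean_recall_at_k(approx, exact):.2f}")  # 0.85
```

Run this over a few hundred representative queries each time you change M, ef_construction, or the query-time search width, and keep the cheapest setting that still meets your recall target.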

Cost reality

A managed Pinecone pod costs about $70/mo for ~1M vectors. Self-hosted Qdrant on a $50/mo VM handles ~10M vectors. That puts the hosted markup at roughly 5-20× per vector, which is fine for prototypes and painful at scale. Plan a migration path early.
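The markup is easiest to see as cost per million vectors. This small calculation uses only the two price points quoted above; the helper name is hypothetical, and the self-hosted figure ignores ops time, backups, and any extra storage.

```python
def cost_per_million(monthly_usd, vectors):
    """Monthly cost in USD per one million stored vectors."""
    return monthly_usd / (vectors / 1_000_000)


hosted = cost_per_million(70, 1_000_000)        # managed Pinecone pod
self_hosted = cost_per_million(50, 10_000_000)  # Qdrant on a $50/mo VM

print(hosted)                # 70.0
print(self_hosted)           # 5.0
print(hosted / self_hosted)  # 14.0
```

A 14× difference sits inside the 5-20× range; where you land in that range depends on vector dimensionality, replication, and how hard you can pack the VM.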

Prototype on Chroma or Pinecone. For production, self-host Qdrant or pgvector unless you have zero ops capacity.