pgvector vs Pinecone vs Qdrant vs Weaviate in 2026

May 2, 2026
11 min read

I’ve now had the “do we really need a separate vector database?” conversation enough times that I can recite both sides in my sleep. Two years ago the answer was usually “yes, get Pinecone, move on.” In 2026 it’s the opposite for most teams I talk to, and the reason is boring: pgvector got good enough, and Postgres 18 made the underlying engine quietly faster.

That’s the headline. The rest of this post is the messy version — what the benchmarks actually say, where pgvector falls over, when Qdrant earns its keep, and why Pinecone keeps showing up on bills nobody questioned.

The 2026 landscape, in one paragraph

pgvector 0.9 shipped earlier this year with sparse vector support, IVFFlat improvements, and another round of HNSW speedups. Postgres 18 has been generally available since late 2025, and its async I/O subsystem can roughly triple read throughput on the kinds of indexed lookups vector search produces. Qdrant still wins single-query latency at small-to-medium scale and tops the public ANN-Benchmarks at around 1,840 QPS on 1M vectors. Pinecone went all-in on serverless and now starts at a $50/month Standard minimum but climbs fast under real load. Weaviate keeps pitching the hybrid vector + keyword + knowledge-graph angle, which is actually useful if you need it and overkill if you don’t.

That’s the whole landscape. Now the interesting part.

pgvector 0.9 + Postgres 18: the new default for most apps

If you’re already running Postgres — and almost everyone is — the marginal cost of adding pgvector is CREATE EXTENSION vector; and a few hundred MB of disk. There’s no new service to monitor, no new IAM model, no separate backup story, no second on-call surface. That alone wins arguments.
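To make that concrete, here is a minimal sketch of the setup. The table name `documents`, the `body` column, and the 1536 dimension are placeholders chosen for illustration, not anything from a real schema. The SQL uses pgvector’s documented `vector` type, `hnsw` index method, and `<=>` cosine-distance operator. The Python wrapper only builds the statements, so you would run them through whatever Postgres client you already use; the `%s` placeholder assumes a psycopg-style driver.

```python
def pgvector_setup_sql(table: str = "documents", dim: int = 1536) -> list[str]:
    """Build the SQL to enable pgvector and index one embedding column.

    `table` and `dim` are illustrative placeholders; match `dim` to the
    output size of your embedding model.
    """
    return [
        # Enable the extension once per database.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # An ordinary table with an embedding column next to the regular data.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id bigserial PRIMARY KEY, body text, embedding vector({dim}));",
        # HNSW index for approximate nearest-neighbour search by cosine distance.
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw ON {table} "
        f"USING hnsw (embedding vector_cosine_ops);",
    ]


def nearest_neighbours_sql(table: str = "documents", k: int = 5) -> str:
    """Top-k query; %s is the client-side placeholder for the query vector."""
    return (
        f"SELECT id, body FROM {table} "
        f"ORDER BY embedding <=> %s::vector LIMIT {k};"
    )


if __name__ == "__main__":
    for statement in pgvector_setup_sql():
        print(statement)
    print(nearest_neighbours_sql())
```

Everything lives in the same database as the rest of the application data, which is the whole operational argument.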

The technical case caught up to the operational case in 2026. HNSW indexes have been in pgvector since 0.5; 0.6 added parallel index builds; 0.7 brought scalar and binary quantization; 0.9 added sparse vectors, which matters for hybrid search and for some sparse-embedding models that were previously a pain to host alongside Postgres. Combined with Postgres 18’s async I/O, the throughput you can get on a single mid-size RDS or self-hosted box is genuinely surprising the first time you measure it.

And then there’s pgvectorscale (Timescale’s extension layered on pgvector). At 50M vectors and 99% recall, pgvectorscale benchmarks at roughly 471 QPS versus Qdrant’s 41 QPS on the same dataset, about an 11x difference. At 90% recall on the same 50M embeddings, it’s around 1,589 QPS for Postgres versus 360 for Qdrant, a bit over 4x. Those numbers caught a lot of people off guard, including me. If you’d told me in 2024 that Postgres would be the throughput leader at 50M vectors I’d have laughed.

The honest caveat: pgvectorscale is doing serious work to make this happen (StreamingDiskANN, statistical binary quantization), and the comparison only holds on workloads where its index strategy fits. Vanilla pgvector without pgvectorscale is fine for most apps under 10M vectors but doesn’t scale the same way past that.

Where pgvector still hurts

  • P99 latency under heavy concurrent writes. Postgres write contention shows up sooner than it does in dedicated vector DBs.
  • Multi-tenant isolation. Doable with row-level security or schema-per-tenant, but neither is as clean as Pinecone’s namespaces or Qdrant’s collections.
  • Operational ceiling. Once you cross ~50M vectors with strict latency requirements, you’re either reaching for pgvectorscale or admitting Postgres isn’t the right tool.
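For the multi-tenant point, a rough sketch of the row-level security route. It assumes a hypothetical `documents` table that already has a `tenant_id bigint` column, and a custom session setting named `app.tenant_id`; both names are illustrative. The statements are standard Postgres (`ENABLE ROW LEVEL SECURITY`, `CREATE POLICY`, `current_setting`, `set_config`), built here as strings for a psycopg-style client.

```python
def tenant_rls_sql(table: str = "documents", setting: str = "app.tenant_id") -> list[str]:
    """One-time DDL that restricts every query on `table` to the current tenant.

    Assumes `table` already has a `tenant_id bigint` column (hypothetical schema).
    """
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # Rows are visible only when tenant_id matches the session setting.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}')::bigint);",
    ]


def set_tenant_sql(setting: str = "app.tenant_id") -> str:
    """Per-request statement; run it inside the transaction that does the search.

    The third argument (true) scopes the setting to the current transaction.
    """
    return f"SELECT set_config('{setting}', %s, true);"


if __name__ == "__main__":
    for statement in tenant_rls_sql():
        print(statement)
    print(set_tenant_sql())
```

Superusers and roles with `BYPASSRLS` skip these policies, so the application should connect as an ordinary role. That extra care is part of why this feels less clean than namespaces or collections.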

Qdrant: still the QPS king for the small stuff

Qdrant remains the fastest vector database by single-query latency on small-to-medium workloads, and its throughput is strong too: ANN-Benchmarks puts it at ~1,840 QPS on 1M vectors. In like-for-like latency comparisons it comes in at about 30.75ms p50, with p95 around 39% better than the next contender and p99 roughly 48% better. That tail-latency win matters more than a lot of people realize, because RAG endpoints almost always live under a user-facing latency budget.

It’s also written in Rust, which has been a reliability boon: I’ve run Qdrant in production for two different projects without an unexpected restart. Filtering performance is genuinely good (it’s not bolted on the side like some others), the gRPC API is fast, and the self-hosted option is a single binary. Qdrant Cloud sits in the $600-1,200/month range for serious workloads, which slots in neatly between “free pgvector on the DB I already have” and “Pinecone on a corporate card.”

The thing Qdrant isn’t great at: it’s another database. Backups, replication, schema migrations, version upgrades — all your problem. The managed cloud handles most of it, but you’re paying for that.

When I’d reach for Qdrant: the application has user-facing latency requirements under ~50ms p99, the dataset sits in the 1M-50M range, you need fast filtered search with rich payload conditions, and you’re okay running another piece of infrastructure.

Pinecone: paying for someone else’s pager

Pinecone’s pitch is and always has been “we run it, you don’t think about it.” That’s worth real money for some teams. The 2026 reality is that they fully committed to serverless — no idle compute charges, billed in read units, write units, and storage. The Standard plan starts at $50/month minimum, with storage at $0.33/GB/month, $4 per million write units, and $16 per million read units.

That sounds reasonable until you actually model real traffic. A medium-sized RAG app with 1M vectors and modest query volume runs $50-100/month, which is fine. A production app with multi-tenant data, frequent reindexing, and a few hundred QPS climbs into four figures fast. I’ve seen teams hit $1,500-3,000/month on what they assumed would be a “couple hundred bucks” workload.
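Modeling that traffic takes only a few lines. The sketch below uses the Standard-plan rates quoted above. Two things are assumptions rather than facts from Pinecone’s pricing page: that the $50 minimum behaves as a floor on the metered total, and the number of read units a single query consumes, which is left as an input because it varies by workload.

```python
from dataclasses import dataclass

# Rates quoted in this post for the Standard plan (USD).
MONTHLY_MINIMUM = 50.0
STORAGE_PER_GB = 0.33
PER_MILLION_WRITE_UNITS = 4.0
PER_MILLION_READ_UNITS = 16.0


@dataclass
class MonthlyUsage:
    storage_gb: float
    write_units: float
    read_units: float


def estimate_monthly_cost(usage: MonthlyUsage) -> float:
    """Metered cost for one month, never below the plan minimum."""
    metered = (
        usage.storage_gb * STORAGE_PER_GB
        + usage.write_units / 1_000_000 * PER_MILLION_WRITE_UNITS
        + usage.read_units / 1_000_000 * PER_MILLION_READ_UNITS
    )
    # Assumption: the $50 minimum acts as a floor on the metered total.
    return max(MONTHLY_MINIMUM, metered)


def read_units_for(queries_per_second: float,
                   read_units_per_query: float,
                   days: int = 30) -> float:
    """Read units consumed by a sustained average query rate.

    `read_units_per_query` is a hypothetical input; measure it on your workload.
    """
    return queries_per_second * 86_400 * days * read_units_per_query


if __name__ == "__main__":
    usage = MonthlyUsage(
        storage_gb=10,
        write_units=5_000_000,
        read_units=read_units_for(queries_per_second=50, read_units_per_query=1.0),
    )
    print(f"${estimate_monthly_cost(usage):,.2f} per month")
```

Read units dominate the total at any sustained query rate, so the per-query read-unit figure is the number worth measuring before trusting an estimate.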

The lock-in is also real. Pinecone’s API isn’t standard. Migrating off means rewriting your retrieval layer. That’s not catastrophic but it’s not free either. And the serverless model makes cost forecasting harder than the old pod-based pricing did, because read-unit consumption depends on filter complexity in ways that are hard to predict before you ship.

When Pinecone is genuinely the right answer: a small team that does not want to operate any vector infrastructure, has bursty traffic that benefits from serverless scaling, and isn’t price-sensitive. That’s a smaller set of teams than Pinecone’s marketing implies, but it exists.

Weaviate: when you actually need hybrid

Weaviate’s positioning has always been “more than just vectors.” Hybrid search (BM25 + vector), built-in modules for embeddings and generation, the knowledge-graph-ish references between objects. If you genuinely need that — for example, a search system where keyword precision matters as much as semantic similarity, or where you’re modeling structured relationships across entities — Weaviate earns its complexity.

If you don’t, you’re paying a complexity tax. Schemas, modules, references, classes — there’s a lot to learn before you’re productive, and most of it is unnecessary for a basic RAG pipeline. Weaviate Cloud pricing is roughly comparable to Qdrant Cloud for similar workloads, with the standalone open-source version free to self-host.

I’ve been on two projects that picked Weaviate. One of them genuinely needed hybrid search and the team was happy. The other picked it because of marketing and quietly migrated to pgvector six months later. Pattern recognition, etc.

The benchmarks people actually quote

A grain of salt first: vendor benchmarks are vendor benchmarks. Even ANN-Benchmarks runs vary by hardware, dataset, recall target, and filter complexity. With that out of the way, the rough 2026 picture:

  • Qdrant: ~1,840 QPS at 1M vectors. Best single-query latency at small-to-medium scale. p99 around 38ms on standard 1M benchmarks.
  • pgvectorscale: ~471 QPS at 50M vectors / 99% recall vs Qdrant’s ~41 QPS on the same workload. ~1,589 QPS at 50M / 90% recall vs Qdrant’s ~360.
  • Pinecone: harder to benchmark fairly because it’s serverless. Real-world latencies are competitive at small scale but get noisier under load.
  • Weaviate: usually middle of the pack on raw QPS, competitive on hybrid search where the comparison gets messy.

Take any specific QPS number with skepticism. Run your own benchmark on your actual data with your actual filter patterns. The headline ranking — Qdrant fastest at small scale, pgvector + pgvectorscale strongest throughput at large scale — tends to hold across configurations.
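If you do run your own benchmark, the two numbers worth computing yourself are recall against an exact search and tail latency. A minimal, database-agnostic sketch follows; `run_query` stands in for whatever client call you are measuring, and all function names are illustrative.

```python
import math
import time
from typing import Callable, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k that the approximate search returned.

    Assumes `exact_ids` holds at least k ids from an exact (brute-force) search.
    """
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / k


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def time_queries(run_query: Callable[[object], object],
                 queries: Sequence[object]) -> list[float]:
    """Run each query once and return per-query latency in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


if __name__ == "__main__":
    # Stand-in workload: replace the lambda with a real search call.
    latencies = time_queries(lambda q: sum(range(10_000)), list(range(200)))
    print("p50 ms:", percentile(latencies, 50))
    print("p99 ms:", percentile(latencies, 99))
```

Comparing systems at the same measured recall matters more than the raw QPS figure, since an index tuned for lower recall will always look faster.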

RAG vs AI agents: different workload, different priorities

A standard RAG pipeline does mostly read-heavy similarity search with a relatively low query rate per user. For that, almost anything works.

AI agents are different. Agent memory means writes happen continuously as conversations progress. Tool selection means many small, filtered queries per agent step. Long-running agents need their memory indexed in close-to-real-time. The “MCP server for vectors” pattern that emerged in 2025 leans heavily on filter performance and write throughput.

For agent workloads I’d weight things differently:

  • Filter performance matters more than raw QPS. Every agent query is filtered by user, conversation, tool category.
  • Write latency matters more than for RAG. New memories should be queryable within seconds, not minutes.
  • Cost predictability matters because agents do many more queries per session than chatbots.

That math nudges away from per-read-unit pricing models like Pinecone and toward predictable infrastructure costs (pgvector on existing Postgres, Qdrant on a fixed-size VM). It’s not a hard rule but it’s a real pattern.
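The break-even point is easy to estimate. This sketch compares a fixed monthly infrastructure cost against the $16 per million read units quoted earlier, ignoring storage and write charges for simplicity. The read units per query and the session figures are hypothetical inputs, not measured values.

```python
def breakeven_queries_per_month(fixed_monthly_cost: float,
                                read_units_per_query: float,
                                price_per_million_read_units: float = 16.0) -> float:
    """Monthly query count at which metered reads cost as much as a fixed box."""
    cost_per_query = read_units_per_query * price_per_million_read_units / 1_000_000
    return fixed_monthly_cost / cost_per_query


def queries_per_month(sessions_per_day: float,
                      queries_per_session: float,
                      days: int = 30) -> float:
    """Total vector queries generated by agent sessions in a month."""
    return sessions_per_day * queries_per_session * days


if __name__ == "__main__":
    # Hypothetical inputs: a $150/month VM and 1 read unit per query.
    limit = breakeven_queries_per_month(150, read_units_per_query=1.0)
    # Hypothetical agent traffic: 5,000 sessions a day, 80 queries each.
    load = queries_per_month(sessions_per_day=5_000, queries_per_session=80)
    print(f"break-even: {limit:,.0f} queries/month, expected load: {load:,.0f}")
```

Because agents multiply queries per session, the expected load crosses the break-even line much sooner than it would for a chatbot with the same number of users.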

Pricing math, with the asterisks

Rough monthly cost for a 5M-vector production app, ~50 QPS average:

  • pgvector: $0 marginal if you’re already running Postgres at the right size; otherwise add the cost of upsizing your DB instance — call it $200-400/month delta on RDS.
  • Qdrant Cloud: roughly $600-1,200/month depending on cluster size and replication.
  • Pinecone Serverless: probably $300-800/month on the Standard plan, but it’s load-dependent and I’ve seen the same nominal workload come in 2x apart in different months.
  • Weaviate Cloud: similar order of magnitude to Qdrant Cloud.
  • Self-hosted Qdrant or Weaviate on a VM: a single $80-150/month box handles a lot before you need to scale.

The pgvector zero-marginal-cost story is the strongest argument in favor of “just use Postgres” for most teams.

Operational realities nobody talks about

The benchmarks and pricing pages make this look like a database choice. It’s actually an operational choice.

  • Backups: pgvector inherits whatever your Postgres backup story is (probably good). Pinecone handles it for you. Qdrant and Weaviate self-hosted need their own snapshot strategy.
  • Replication: same pattern. pgvector is solved if you’ve already solved Postgres replication. Managed offerings handle it. Self-hosted means you’re configuring it.
  • Index rebuilds: HNSW index builds on big tables are not fast. Plan for hours, not minutes, on multi-million-row datasets. This bites teams that don’t think about it.
  • Schema migrations: pgvector wins by being inside Postgres — you migrate vector schema with the same tools you migrate everything else. Specialized vector DBs each have their own story.
  • On-call surface: every dedicated vector database is one more thing that can page you at 3 AM.
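On the index-rebuild point, pgvector’s build time responds to two standard Postgres settings, and a concurrent build avoids blocking writes while it runs. A sketch of the statements, with a hypothetical `documents` table; the memory and worker values are illustrative and need sizing to your instance, while `m = 16` and `ef_construction = 64` are pgvector’s defaults.

```python
def hnsw_build_sql(table: str = "documents",
                   column: str = "embedding",
                   maintenance_work_mem: str = "8GB",
                   parallel_workers: int = 7) -> list[str]:
    """Session settings plus a non-blocking HNSW build.

    CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the
    client needs autocommit enabled for the last statement.
    """
    return [
        # More memory lets the graph be built in RAM instead of spilling.
        f"SET maintenance_work_mem = '{maintenance_work_mem}';",
        # Parallel index builds are available since pgvector 0.6.
        f"SET max_parallel_maintenance_workers = {parallel_workers};",
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = 16, ef_construction = 64);",
    ]


# Run from a second session to watch the build progress.
PROGRESS_SQL = (
    "SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct "
    "FROM pg_stat_progress_create_index;"
)


if __name__ == "__main__":
    for statement in hnsw_build_sql():
        print(statement)
    print(PROGRESS_SQL)
```

Even with these settings, a build on a multi-million-row table is something to schedule, not something to discover during a deploy.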

The value of “one less service to operate” is the most underrated factor in this entire decision.

The decision framework I’d actually use

  • Solo builder, side project, or MVP: pgvector. Even if you don’t have Postgres yet, install it. The blast radius if you’re wrong is tiny.
  • Series A startup, ≤10M vectors, RAG-heavy: pgvector unless you have a specific reason against it.
  • Series A startup, AI agents with continuous memory writes: pgvector or self-hosted Qdrant. Skip Pinecone for cost predictability.
  • Multi-tenant SaaS, ≤50M vectors total: pgvector with row-level security if you trust your isolation; Qdrant Cloud with collections-per-tenant if you want clean separation.
  • 100M+ vectors with sub-10ms p99: pgvectorscale, or Qdrant at the high end of its hardware. This is the zone where the specialized tools earn their keep.
  • You explicitly need hybrid (BM25 + vector) or knowledge-graph features: Weaviate.
  • No engineering bandwidth for any infrastructure work: Pinecone, eyes open about cost.
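The same list can be written down as a small function, which makes the order of the checks explicit. This is a sketch under assumptions: the field names are hypothetical, the precedence between rules is a choice made here (the list itself doesn’t rank them), the latency condition on the 100M+ rule is dropped for brevity, and workloads the list doesn’t mention get an explicit “not covered” answer.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    vectors: int
    can_run_infrastructure: bool = True
    needs_hybrid_or_graph: bool = False
    agent_memory_writes: bool = False
    multi_tenant: bool = False
    wants_clean_tenant_separation: bool = False


def recommend(w: Workload) -> str:
    """Map a workload to the recommendation from the list above."""
    if not w.can_run_infrastructure:
        return "Pinecone (eyes open about cost)"
    if w.needs_hybrid_or_graph:
        return "Weaviate"
    if w.vectors >= 100_000_000:
        return "pgvectorscale or high-end Qdrant"
    if w.multi_tenant and w.vectors <= 50_000_000:
        if w.wants_clean_tenant_separation:
            return "Qdrant Cloud, collection per tenant"
        return "pgvector with row-level security"
    if w.agent_memory_writes:
        return "pgvector or self-hosted Qdrant"
    if w.vectors <= 10_000_000:
        return "pgvector"
    # The list has no rule for this case.
    return "not covered by the list; benchmark first"


if __name__ == "__main__":
    print(recommend(Workload(vectors=2_000_000)))
    print(recommend(Workload(vectors=20_000_000, multi_tenant=True)))
```

Most inputs end up at pgvector unless a specific constraint pushes them elsewhere, which matches the argument of the post.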

What I’d actually pick

If someone asked me today and I had to pick once, for a typical 2026 production AI app: pgvector on Postgres 18, with pgvectorscale ready to add when you cross ~25M vectors. Qdrant if filter-heavy or latency-strict. Pinecone only if the team has explicitly said “we will not run any infrastructure,” and even then, knowing the bill will surprise you eventually.

The vector database market spent 2023 and 2024 looking like it would consolidate around dedicated services. 2026 looks like it’s consolidating in the other direction: toward Postgres extensions for the long tail and dedicated tools at the demanding end. That’s a healthier outcome than what the early hype suggested.

If you want to test this for yourself: spin up a Postgres 18 container, run CREATE EXTENSION vector; to enable pgvector, load 100K of your real embeddings, and run your actual query patterns against it. Half the time the experiment ends right there.
