Overview
pgvector adds vector similarity search to PostgreSQL, letting you store embeddings from OpenAI, Cohere, Hugging Face, or any other model alongside your relational data.
pgvector is enabled by default on the appdb database created with your cluster. For additional databases, add the pgvector extension through the dashboard UI or Terraform.
Verify the extension
Confirm pgvector is active on your database:
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
Store embeddings
Create a table with a vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding VECTOR(1536) -- OpenAI ada-002 dimensions
);
The VECTOR(n) type stores n-dimensional vectors. Common dimension sizes:
| Model | Dimensions |
|---|
| OpenAI text-embedding-ada-002 | 1536 |
| OpenAI text-embedding-3-small | 1536 |
| OpenAI text-embedding-3-large | 3072 |
| Cohere embed-english-v3.0 | 1024 |
| BGE-small-en | 384 |
Insert vectors
INSERT INTO documents (content, embedding)
VALUES (
'Rivestack is a managed PostgreSQL service',
'[0.1, 0.2, 0.3, ...]' -- your embedding vector
);
From Python with OpenAI
import psycopg2
import openai
# Generate embedding
response = openai.embeddings.create(
model="text-embedding-ada-002",
input="Rivestack is a managed PostgreSQL service"
)
embedding = response.data[0].embedding
# Store in Rivestack
conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
cur = conn.cursor()
cur.execute(
"INSERT INTO documents (content, embedding) VALUES (%s, %s)",
("Rivestack is a managed PostgreSQL service", str(embedding))
)
conn.commit()
Query vectors
Similarity search
Find the 5 most similar documents using cosine distance:
SELECT id, content, embedding <=> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
Distance operators
| Operator | Distance Metric | Use Case |
|---|
<-> | L2 (Euclidean) | General purpose |
<=> | Cosine | Text embeddings (most common) |
<#> | Inner product | When vectors are normalized |
Create an index
For tables with more than a few thousand rows, create an index to speed up queries:
HNSW index (recommended)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
HNSW provides fast, approximate nearest-neighbor search with good recall. Tuning parameters:
| Parameter | Default | Description |
|---|
m | 16 | Max connections per node. Higher = better recall, more memory. |
ef_construction | 64 | Build-time search width. Higher = better recall, slower builds. |
Set the search parameter at query time. Higher ef_search means better recall and higher latency (the default is 40):
-- Per query — works on every port, including the pooled :6432 (see warning):
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;
-- Or set a persistent default for the whole database (applies to new connections):
ALTER DATABASE appdb SET hnsw.ef_search = 100;
On Solo, a plain session SET hnsw.ef_search is silently ignored. Solo databases connect through PgBouncer in transaction-pooling mode on port 6432, so a bare SET hnsw.ef_search = 100; runs in its own transaction and is reset before your next query — you’ll quietly get the default (40) instead. Always either use SET LOCAL inside the same transaction as the query, or set a persistent default with ALTER DATABASE appdb SET hnsw.ef_search = … (or ALTER ROLE appdb SET …).HA, Growth, and Scale databases use the load balancer on port 5432 (not transaction pooling), so a session-level SET does persist there — but SET LOCAL / ALTER DATABASE is the portable approach that works everywhere.
IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
IVFFlat requires a training step and works best when you set lists to rows / 1000 for tables up to 1M rows.
SET ivfflat.probes = 10; -- default is 1, higher = better recall
Which index to use?
| HNSW | IVFFlat |
|---|
| Recall | Higher | Lower |
| Build speed | Slower | Faster |
| Query speed | Fast | Fast |
| Memory | More | Less |
| Recommendation | Default choice | Large datasets with limited memory |
RAG example
A complete retrieval-augmented generation (RAG) pattern:
import psycopg2
import openai
def search(query: str, top_k: int = 5):
# 1. Generate embedding for the query
response = openai.embeddings.create(
model="text-embedding-ada-002",
input=query
)
query_embedding = response.data[0].embedding
# 2. Search Rivestack for similar documents
conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
cur = conn.cursor()
cur.execute("""
SELECT content, embedding <=> %s::vector AS distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
""", (str(query_embedding), str(query_embedding), top_k))
results = cur.fetchall()
conn.close()
return results
# 3. Use results as context for your LLM
results = search("How does Rivestack handle backups?")
context = "\n".join([row[0] for row in results])
Benchmarks
Rivestack pgvector performance on one Starter node (2 vCPU, 4 GB RAM, local NVMe) with 250,000 vectors (1536 dimensions, HNSW m=16 / ef_construction=64, cosine), measured from a same-region app over PgBouncer (:6432) and TLS:
| Metric | Result |
|---|
| Throughput | ~1,000 QPS at recall@10 0.93 |
| p50 latency | 3.7 ms (4 clients) |
| Scales to | ~1,570 QPS at p50 10 ms (16 clients) |
| Throughput ceiling | ~1,980 QPS at recall@10 0.75 (ef_search=10, 16 clients, p50 8 ms) |
| Network floor | 0.38 ms per same-region round-trip |
Per-tier capacity and throughput
Measured across tiers (1536-dim, HNSW m=16 / ef_construction=64, recall@10 vs exact KNN). Each tier’s honest fast-search capacity is sized so the index fits hot in RAM:
| Tier | RAM | Hot capacity | Throughput (recall@10) |
|---|
| Solo / Starter | 4 GB | ~300K | ~1,000 QPS at 0.93, p50 3.7 ms (4 clients) |
| Growth | 8 GB | ~600K | ~1,716 QPS at 0.93 (16 clients) |
| Scale | 16 GB | ~1M | builds + serves 1M × 1536 at ~2,500 QPS, p50 ~6 ms (16 clients) |
QPS and p50 latency trade off across the concurrency curve — they are not simultaneous best-cases, so each figure carries its recall and client count. recall@10 0.93 uses
ef_search=80; recall@10 0.75 uses
ef_search=10. Pooling and TLS add only ~0.4 ms over the raw same-region network floor (~0.38 ms on Solo via the
:6432 PgBouncer; ~0.91 ms on Growth/Scale via the
:5432 load balancer). A 1M × 1536 index is ~6 GB and does not fit or build on a 4 GB node — use a Scale (16 GB) node for datasets that large, where 1M × 1536 builds in ~30 min and serves ~2,500 QPS at p50 ~6 ms. Full methodology and reproduce-it steps:
pgvector NVMe vs cloud SSD benchmarks.
- Always create an index for tables over a few thousand rows. Without an index, pgvector scans every row.
- Use HNSW as your default index type. It offers the best recall/speed trade-off.
- Increase
ef_search if recall is too low. Start with 100 and increase as needed — but on Solo’s pooled :6432 port, set it with SET LOCAL inside the query’s transaction or via ALTER DATABASE … SET (a plain session SET is dropped by transaction pooling). The same applies to ivfflat.probes.
- Batch inserts for loading large datasets. Use
COPY or multi-row INSERT for best throughput.
- Choose the right distance operator for your use case. Cosine (
<=>) is the most common for text embeddings.