Skip to main content

Overview

pgvector adds vector similarity search to PostgreSQL, letting you store embeddings from OpenAI, Cohere, Hugging Face, or any other model alongside your relational data. pgvector is enabled by default on the appdb database created with your cluster. For additional databases, add the pgvector extension through the dashboard UI or Terraform.

Verify the extension

Confirm pgvector is active on your database:
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

Store embeddings

Create a table with a vector column

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536)  -- OpenAI ada-002 dimensions
);
The VECTOR(n) type stores n-dimensional vectors. Common dimension sizes:
ModelDimensions
OpenAI text-embedding-ada-0021536
OpenAI text-embedding-3-small1536
OpenAI text-embedding-3-large3072
Cohere embed-english-v3.01024
BGE-small-en384

Insert vectors

INSERT INTO documents (content, embedding)
VALUES (
    'Rivestack is a managed PostgreSQL service',
    '[0.1, 0.2, 0.3, ...]'  -- your embedding vector
);

From Python with OpenAI

import psycopg2
import openai

# Generate embedding
response = openai.embeddings.create(
    model="text-embedding-ada-002",
    input="Rivestack is a managed PostgreSQL service"
)
embedding = response.data[0].embedding

# Store in Rivestack
conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
cur = conn.cursor()
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    ("Rivestack is a managed PostgreSQL service", str(embedding))
)
conn.commit()

Query vectors

Find the 5 most similar documents using cosine distance:
SELECT id, content, embedding <=> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;

Distance operators

OperatorDistance MetricUse Case
<->L2 (Euclidean)General purpose
<=>CosineText embeddings (most common)
<#>Inner productWhen vectors are normalized

Create an index

For tables with more than a few thousand rows, create an index to speed up queries:
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
HNSW provides fast, approximate nearest-neighbor search with good recall. Tuning parameters:
ParameterDefaultDescription
m16Max connections per node. Higher = better recall, more memory.
ef_construction64Build-time search width. Higher = better recall, slower builds.
Set the search parameter at query time. Higher ef_search means better recall and higher latency (the default is 40):
-- Per query — works on every port, including the pooled :6432 (see warning):
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Or set a persistent default for the whole database (applies to new connections):
ALTER DATABASE appdb SET hnsw.ef_search = 100;
On Solo, a plain session SET hnsw.ef_search is silently ignored. Solo databases connect through PgBouncer in transaction-pooling mode on port 6432, so a bare SET hnsw.ef_search = 100; runs in its own transaction and is reset before your next query — you’ll quietly get the default (40) instead. Always either use SET LOCAL inside the same transaction as the query, or set a persistent default with ALTER DATABASE appdb SET hnsw.ef_search = … (or ALTER ROLE appdb SET …).HA, Growth, and Scale databases use the load balancer on port 5432 (not transaction pooling), so a session-level SET does persist there — but SET LOCAL / ALTER DATABASE is the portable approach that works everywhere.

IVFFlat index

CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
IVFFlat requires a training step and works best when you set lists to rows / 1000 for tables up to 1M rows.
SET ivfflat.probes = 10;  -- default is 1, higher = better recall

Which index to use?

HNSWIVFFlat
RecallHigherLower
Build speedSlowerFaster
Query speedFastFast
MemoryMoreLess
RecommendationDefault choiceLarge datasets with limited memory

RAG example

A complete retrieval-augmented generation (RAG) pattern:
import psycopg2
import openai

def search(query: str, top_k: int = 5):
    # 1. Generate embedding for the query
    response = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_embedding = response.data[0].embedding

    # 2. Search Rivestack for similar documents
    conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
    cur = conn.cursor()
    cur.execute("""
        SELECT content, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (str(query_embedding), str(query_embedding), top_k))

    results = cur.fetchall()
    conn.close()
    return results

# 3. Use results as context for your LLM
results = search("How does Rivestack handle backups?")
context = "\n".join([row[0] for row in results])

Benchmarks

Rivestack pgvector performance on one Starter node (2 vCPU, 4 GB RAM, local NVMe) with 250,000 vectors (1536 dimensions, HNSW m=16 / ef_construction=64, cosine), measured from a same-region app over PgBouncer (:6432) and TLS:
MetricResult
Throughput~1,000 QPS at recall@10 0.93
p50 latency3.7 ms (4 clients)
Scales to~1,570 QPS at p50 10 ms (16 clients)
Throughput ceiling~1,980 QPS at recall@10 0.75 (ef_search=10, 16 clients, p50 8 ms)
Network floor0.38 ms per same-region round-trip

Per-tier capacity and throughput

Measured across tiers (1536-dim, HNSW m=16 / ef_construction=64, recall@10 vs exact KNN). Each tier’s honest fast-search capacity is sized so the index fits hot in RAM:
TierRAMHot capacityThroughput (recall@10)
Solo / Starter4 GB~300K~1,000 QPS at 0.93, p50 3.7 ms (4 clients)
Growth8 GB~600K~1,716 QPS at 0.93 (16 clients)
Scale16 GB~1Mbuilds + serves 1M × 1536 at ~2,500 QPS, p50 ~6 ms (16 clients)
QPS and p50 latency trade off across the concurrency curve — they are not simultaneous best-cases, so each figure carries its recall and client count. recall@10 0.93 uses ef_search=80; recall@10 0.75 uses ef_search=10. Pooling and TLS add only ~0.4 ms over the raw same-region network floor (~0.38 ms on Solo via the :6432 PgBouncer; ~0.91 ms on Growth/Scale via the :5432 load balancer). A 1M × 1536 index is ~6 GB and does not fit or build on a 4 GB node — use a Scale (16 GB) node for datasets that large, where 1M × 1536 builds in ~30 min and serves ~2,500 QPS at p50 ~6 ms. Full methodology and reproduce-it steps: pgvector NVMe vs cloud SSD benchmarks.

Performance tips

  • Always create an index for tables over a few thousand rows. Without an index, pgvector scans every row.
  • Use HNSW as your default index type. It offers the best recall/speed trade-off.
  • Increase ef_search if recall is too low. Start with 100 and increase as needed — but on Solo’s pooled :6432 port, set it with SET LOCAL inside the query’s transaction or via ALTER DATABASE … SET (a plain session SET is dropped by transaction pooling). The same applies to ivfflat.probes.
  • Batch inserts for loading large datasets. Use COPY or multi-row INSERT for best throughput.
  • Choose the right distance operator for your use case. Cosine (<=>) is the most common for text embeddings.