> ## Documentation Index
> Fetch the complete documentation index at: https://docs.rivestack.io/llms.txt
> Use this file to discover all available pages before exploring further.

# pgvector Setup

> Store and query vector embeddings for AI applications

## Overview

pgvector adds vector similarity search to PostgreSQL, letting you store embeddings from OpenAI, Cohere, Hugging Face, or any other model alongside your relational data.

pgvector is enabled by default on the `appdb` database created with your cluster. For additional databases, add the pgvector extension through the dashboard UI or Terraform.

## Verify the extension

Confirm pgvector is active on your database:

```sql theme={null}
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```

## Store embeddings

### Create a table with a vector column

```sql theme={null}
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536)  -- OpenAI ada-002 dimensions
);
```

The `VECTOR(n)` type stores n-dimensional vectors. Common dimension sizes:

| Model                         | Dimensions |
| ----------------------------- | ---------- |
| OpenAI text-embedding-ada-002 | 1536       |
| OpenAI text-embedding-3-small | 1536       |
| OpenAI text-embedding-3-large | 3072       |
| Cohere embed-english-v3.0     | 1024       |
| BGE-small-en                  | 384        |

### Insert vectors

```sql theme={null}
INSERT INTO documents (content, embedding)
VALUES (
    'Rivestack is a managed PostgreSQL service',
    '[0.1, 0.2, 0.3, ...]'  -- your embedding vector
);
```

### From Python with OpenAI

```python theme={null}
import psycopg2
import openai

# Generate embedding
response = openai.embeddings.create(
    model="text-embedding-ada-002",
    input="Rivestack is a managed PostgreSQL service"
)
embedding = response.data[0].embedding

# Store in Rivestack
conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
cur = conn.cursor()
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    ("Rivestack is a managed PostgreSQL service", str(embedding))
)
conn.commit()
```

## Query vectors

### Similarity search

Find the 5 most similar documents using cosine distance:

```sql theme={null}
SELECT id, content, embedding <=> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
```

### Distance operators

| Operator | Distance Metric | Use Case                      |
| -------- | --------------- | ----------------------------- |
| `<->`    | L2 (Euclidean)  | General purpose               |
| `<=>`    | Cosine          | Text embeddings (most common) |
| `<#>`    | Inner product   | When vectors are normalized   |

## Create an index

For tables with more than a few thousand rows, create an index to speed up queries:

### HNSW index (recommended)

```sql theme={null}
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
```

HNSW provides fast, approximate nearest-neighbor search with good recall. Tuning parameters:

| Parameter         | Default | Description                                                     |
| ----------------- | ------- | --------------------------------------------------------------- |
| `m`               | 16      | Max connections per node. Higher = better recall, more memory.  |
| `ef_construction` | 64      | Build-time search width. Higher = better recall, slower builds. |

Set the search parameter at query time. Higher `ef_search` means better recall and higher latency (the default is `40`):

```sql theme={null}
-- Per query — works on every port, including the pooled :6432 (see warning):
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Or set a persistent default for the whole database (applies to new connections):
ALTER DATABASE appdb SET hnsw.ef_search = 100;
```

<Warning>
  **On Solo, a plain session `SET hnsw.ef_search` is silently ignored.** Solo databases connect through PgBouncer in **transaction-pooling** mode on port `6432`, so a bare `SET hnsw.ef_search = 100;` runs in its own transaction and is reset before your next query — you'll quietly get the default (`40`) instead. Always either use `SET LOCAL` *inside the same transaction* as the query, or set a persistent default with `ALTER DATABASE appdb SET hnsw.ef_search = …` (or `ALTER ROLE appdb SET …`).

  HA, Growth, and Scale databases use the load balancer on port `5432` (not transaction pooling), so a session-level `SET` does persist there — but `SET LOCAL` / `ALTER DATABASE` is the portable approach that works everywhere.
</Warning>

### IVFFlat index

```sql theme={null}
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

IVFFlat requires a training step and works best when you set `lists` to `rows / 1000` for tables up to 1M rows.

```sql theme={null}
SET ivfflat.probes = 10;  -- default is 1, higher = better recall
```

### Which index to use?

|                | HNSW           | IVFFlat                            |
| -------------- | -------------- | ---------------------------------- |
| Recall         | Higher         | Lower                              |
| Build speed    | Slower         | Faster                             |
| Query speed    | Fast           | Fast                               |
| Memory         | More           | Less                               |
| Recommendation | Default choice | Large datasets with limited memory |

## RAG example

A complete retrieval-augmented generation (RAG) pattern:

```python theme={null}
import psycopg2
import openai

def search(query: str, top_k: int = 5):
    # 1. Generate embedding for the query
    response = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_embedding = response.data[0].embedding

    # 2. Search Rivestack for similar documents
    conn = psycopg2.connect("postgresql://user:pass@host:5432/mydb?sslmode=require")
    cur = conn.cursor()
    cur.execute("""
        SELECT content, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (str(query_embedding), str(query_embedding), top_k))

    results = cur.fetchall()
    conn.close()
    return results

# 3. Use results as context for your LLM
results = search("How does Rivestack handle backups?")
context = "\n".join([row[0] for row in results])
```

## Benchmarks

Rivestack pgvector performance on one **Starter** node (2 vCPU, 4 GB RAM, local NVMe) with 250,000 vectors (1536 dimensions, HNSW `m=16` / `ef_construction=64`, cosine), measured from a same-region app over PgBouncer (`:6432`) and TLS:

| Metric                 | Result                                                               |
| ---------------------- | -------------------------------------------------------------------- |
| **Throughput**         | \~1,000 QPS at recall\@10 0.93                                       |
| **p50 latency**        | 3.7 ms (4 clients)                                                   |
| **Scales to**          | \~1,570 QPS at p50 10 ms (16 clients)                                |
| **Throughput ceiling** | \~1,980 QPS at recall\@10 0.75 (ef\_search=10, 16 clients, p50 8 ms) |
| **Network floor**      | 0.38 ms per same-region round-trip                                   |

### Per-tier capacity and throughput

Measured across tiers (1536-dim, HNSW `m=16` / `ef_construction=64`, recall\@10 vs exact KNN). Each tier's honest fast-search capacity is sized so the index fits hot in RAM:

| Tier               | RAM   | Hot capacity | Throughput (recall\@10)                                           |
| ------------------ | ----- | ------------ | ----------------------------------------------------------------- |
| **Solo / Starter** | 4 GB  | \~300K       | \~1,000 QPS at 0.93, p50 3.7 ms (4 clients)                       |
| **Growth**         | 8 GB  | \~600K       | \~1,716 QPS at 0.93 (16 clients)                                  |
| **Scale**          | 16 GB | \~1M         | builds + serves 1M × 1536 at \~2,500 QPS, p50 \~6 ms (16 clients) |

<Info>QPS and p50 latency trade off across the concurrency curve — they are not simultaneous best-cases, so each figure carries its recall and client count. recall\@10 0.93 uses `ef_search=80`; recall\@10 0.75 uses `ef_search=10`. Pooling and TLS add only \~0.4 ms over the raw same-region network floor (\~0.38 ms on Solo via the `:6432` PgBouncer; \~0.91 ms on Growth/Scale via the `:5432` load balancer). A 1M × 1536 index is \~6 GB and does not fit or build on a 4 GB node — use a Scale (16 GB) node for datasets that large, where 1M × 1536 builds in \~30 min and serves \~2,500 QPS at p50 \~6 ms. Full methodology and reproduce-it steps: [pgvector NVMe vs cloud SSD benchmarks](https://rivestack.io/blog/pgvector-performance-nvme-vs-cloud-ssd-benchmarks).</Info>

## Performance tips

* **Always create an index** for tables over a few thousand rows. Without an index, pgvector scans every row.
* **Use HNSW** as your default index type. It offers the best recall/speed trade-off.
* **Increase `ef_search`** if recall is too low. Start with 100 and increase as needed — but on Solo's pooled `:6432` port, set it with `SET LOCAL` inside the query's transaction or via `ALTER DATABASE … SET` (a plain session `SET` is dropped by transaction pooling). The same applies to `ivfflat.probes`.
* **Batch inserts** for loading large datasets. Use `COPY` or multi-row `INSERT` for best throughput.
* **Choose the right distance operator** for your use case. Cosine (`<=>`) is the most common for text embeddings.
