How to set up Qdrant vector database for RAG applications
Install Qdrant with Docker to store the high-dimensional vectors behind a Retrieval-Augmented Generation (RAG) pipeline. The steps target Ubuntu 24.04 or Debian 12 and use the official Qdrant image. You will create a collection, load vectors, and query them with the Python client.
Prerequisites
- Ubuntu 24.04 or Debian 12 with Docker Engine 24.0.5 or newer installed.
- At least 4 GB RAM available for the Qdrant container and Python runtime.
- Python 3.10 or newer with pip, plus curl and git.
- Internet access to pull the qdrant/qdrant image from Docker Hub.
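The list above can be checked partly from Python. The sketch below is a minimal example under stated assumptions: the function name check_prerequisites is hypothetical, and it only tests the interpreter version and whether docker and curl are on the PATH. It does not check the Docker Engine version or available RAM.

```python
import shutil
import sys


def check_prerequisites():
    """Return a mapping of prerequisite name to whether it looks satisfied."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "docker": shutil.which("docker") is not None,  # found on PATH only
        "curl": shutil.which("curl") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```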
Step 1: Pull the Qdrant Docker image
Download the official Qdrant image from the Docker Hub registry. This image includes the Qdrant server and all necessary dependencies for running vector operations.
docker pull qdrant/qdrant:latest
Verify that the pull completed by listing the image.
docker images qdrant/qdrant
Expected output shows the repository, tag, image ID, age, and size. The ID, age, and size below are illustrative and vary by release.
REPOSITORY      TAG      IMAGE ID       CREATED       SIZE
qdrant/qdrant   latest   abc123def456   2 hours ago   602MB
Step 2: Start the Qdrant container
Run the Qdrant server in detached mode and map container port 6333 to the host so you can reach the REST API. The -m 2g flag caps the container at 2 GB of memory. No volume is mounted, so stored data lives inside the container and is lost when the container is removed.
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -m 2g \
  qdrant/qdrant:latest
Check that the container is running.
docker ps --filter name=qdrant --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}"
Expected output is similar to the following: the container is named qdrant and its status begins with Up. The ID and uptime will differ.
CONTAINER ID   NAMES    STATUS         PORTS
abc123def456   qdrant   Up 5 seconds   0.0.0.0:6333->6333/tcp
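A container that shows Up may still need a moment before the REST API answers. The following sketch polls the server until it responds. It is an illustration rather than an official readiness check: the helper names qdrant_responds and wait_until_ready are hypothetical, and it assumes the GET /collections endpoint on port 6333 returns HTTP 200 once the server is ready.

```python
import time
import urllib.error
import urllib.request


def qdrant_responds(base_url="http://localhost:6333"):
    """Return True if GET /collections answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe, timeout_s=30.0, interval_s=1.0):
    """Call probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready(qdrant_responds) after docker run returns True as soon as the API is reachable and False if it stays unreachable for 30 seconds. Passing the probe as an argument keeps the retry loop independent of the network call.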
Step 3: Create a collection for embeddings
Use the Qdrant REST API to create a collection. Set the vector size to 768, the output dimension of many BERT-base embedding models, and the distance metric to Cosine for semantic similarity search. The size must match the embedding model you use later.
curl -X PUT "http://localhost:6333/collections/my_rag_collection" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    },
    "shard_number": 1,
    "replication_factor": 1
  }'
Expected output is an HTTP 200 response whose body reports success, similar to the following. The time value varies.
{"result":true,"status":"ok","time":0.031}
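The same request can be built from Python with only the standard library, which is convenient when collection setup belongs in a script. This is a sketch: the function name build_create_collection_request is hypothetical, and the body mirrors the curl example above.

```python
import json
import urllib.request


def build_create_collection_request(
    name, size=768, distance="Cosine", base_url="http://localhost:6333"
):
    """Build (but do not send) the PUT request that creates a collection."""
    body = {
        "vectors": {"size": size, "distance": distance},
        "shard_number": 1,
        "replication_factor": 1,
    }
    return urllib.request.Request(
        url=f"{base_url}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_collection_request("my_rag_collection")
# To send it against a running server:
#     with urllib.request.urlopen(request) as resp:
#         print(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect or log the payload before it reaches the server.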
Step 4: Install the Python client
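Install the official Qdrant Python client with pip. The package provides a high-level interface for managing collections and running queries. On Ubuntu 24.04 and Debian 12, run pip inside a virtual environment (created with python3 -m venv), because pip refuses system-wide installs on these releases.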
pip install qdrant-client
Verify the installation by importing the client in a Python shell.
python -c "from qdrant_client import QdrantClient; print('Client imported successfully')"
Expected output prints the success message without errors.
Client imported successfully
Step 5: Load sample vectors into the collection
Create a Python script that inserts sample vectors into the collection. Each point is a PointStruct with an id, a vector of 768 floats standing in for a real embedding, and a payload that holds the source text. The placeholder vectors point in different directions because Cosine distance ignores magnitude: [0.1] * 768 and [0.2] * 768 would score identically.
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(host="localhost", port=6333)

# Placeholder embeddings with distinct directions; replace with real model output.
points = [
    PointStruct(id=1, vector=[0.1] * 768, payload={"text": "What is machine learning?"}),
    PointStruct(id=2, vector=[0.2] * 384 + [-0.2] * 384, payload={"text": "How to build a RAG system?"}),
    PointStruct(id=3, vector=[0.3, -0.3] * 384, payload={"text": "Explain vector databases"}),
]
client.upsert(collection_name="my_rag_collection", points=points)
print("Vectors inserted successfully")
Expected output confirms the upsert operation completed.
Vectors inserted successfully
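Because the collection uses the Cosine metric, only the direction of each vector matters. Vectors that are scalar multiples of each other are indistinguishable at query time. The plain-Python sketch below illustrates this; the function cosine_similarity is written here for demonstration and is not part of the Qdrant client.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([0.1] * 768, [0.2] * 768)
different_direction = cosine_similarity([0.1] * 768, [0.2] * 384 + [-0.2] * 384)
print(f"scaled copy: {same_direction:.3f}")
print(f"half the components negated: {different_direction:.3f}")
```

The first pair scores approximately 1 even though the values differ, and the second pair scores approximately 0. Placeholder vectors used for testing should therefore differ in direction, not only in scale.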
Verify the installation
Run a similarity search to confirm the collection works. Query with the same vector as the first sample point and check the returned payload. The example uses query_points, which replaces the deprecated search method in qdrant-client 1.10 and later.
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)
query_vector = [0.1] * 768
response = client.query_points(
    collection_name="my_rag_collection",
    query=query_vector,
    limit=1,
)
print(f"Top result: {response.points[0].payload['text']}")
Expected output displays the matching text payload.
Top result: What is machine learning?
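In a RAG pipeline, the payload text returned by the search becomes context for the language model. The sketch below shows one possible way to assemble that context into a prompt. The function build_prompt, the prompt wording, and the max_chunks parameter are all illustrative assumptions, not part of Qdrant or its client.

```python
def build_prompt(question, retrieved_texts, max_chunks=3):
    """Combine the top retrieved payload texts with the user question."""
    context = "\n".join(f"- {text}" for text in retrieved_texts[:max_chunks])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# retrieved_texts would normally come from the payloads of the search results.
prompt = build_prompt(
    "What is machine learning?",
    ["What is machine learning?", "Explain vector databases"],
)
print(prompt)
```

Limiting the number of chunks keeps the prompt within the model's context window; the right limit depends on the model and the chunk size.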
Troubleshooting
Check container logs for errors if the container fails to start. Use the docker logs command to inspect the last 50 lines of output.
docker logs qdrant --tail 50
Look for messages indicating port conflicts or memory allocation failures. If port 6333 is already in use, change the host port mapping in the docker run command.
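A port conflict can be confirmed before starting the container. This sketch uses the standard socket module to test whether something already accepts connections on a port; the helper name port_in_use is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(6333) returns True before the container starts, map a different host port, for example -p 6335:6333 (6335 is an arbitrary choice), and pass that port to QdrantClient.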
Resolve memory issues by raising the -m limit or storing fewer vectors in the collection. If the host itself is short on RAM, free memory or add swap on the host, because the -m flag cannot grant more memory than the host has.
Verify connectivity by calling the REST API directly, for example curl http://localhost:6333/collections. A refused connection means the container is not running or the port is not mapped. A 503 Service Unavailable response means the server is up but cannot serve the request, so check the container logs for the cause.
Recreate the container if the data is corrupted or the configuration is invalid. Run docker rm -f qdrant followed by the docker run command from Step 2. This deletes every collection, because the container was started without a volume.
Keep the Python client version close to the Qdrant server version to avoid compatibility errors. Upgrade the client package with pip install --upgrade qdrant-client if a client method raises AttributeError.
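A version comparison can be scripted. The sketch below assumes a compatibility rule of equal major versions and minor versions at most one apart, which is an assumption to verify against the client's release notes. The helper names are hypothetical. The client version can be read with importlib.metadata.version("qdrant-client"), and the server is assumed to report its version in the JSON returned by GET / on port 6333.

```python
def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.13.2'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def versions_compatible(client_version, server_version):
    """Assumed rule: same major version, minor versions at most 1 apart."""
    c_major, c_minor = parse_major_minor(client_version)
    s_major, s_minor = parse_major_minor(server_version)
    return c_major == s_major and abs(c_minor - s_minor) <= 1


print(versions_compatible("1.12.0", "1.13.2"))
print(versions_compatible("1.10.0", "1.13.0"))
```

The first call prints True and the second prints False under the assumed rule.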
Validate that vector dimensions match the collection definition. Inserting a vector whose length differs from the configured size of 768 fails with a 400 Bad Request error.
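Dimension mismatches are easier to diagnose before the upsert than from the server error. The helper below is a minimal sketch; the name find_dimension_mismatches is hypothetical, and it assumes the vectors are available as a mapping from point id to a list of floats.

```python
def find_dimension_mismatches(vectors_by_id, expected_size=768):
    """Return the ids whose vector length differs from expected_size."""
    return [
        point_id
        for point_id, vector in vectors_by_id.items()
        if len(vector) != expected_size
    ]


bad_ids = find_dimension_mismatches({1: [0.1] * 768, 2: [0.2] * 767})
print(f"Points with the wrong dimension: {bad_ids}")
```

Here point 2 is reported because its vector has 767 components instead of 768.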