AI & Machine Learning

How to set up a RAG pipeline with LangChain and Weaviate

Build a retrieval-augmented generation system using Python, LangChain, and Weaviate vector database. Install dependencies, configure the vector store, and run a query test.

Arjun M.
Updated 1h ago

Build a retrieval-augmented generation system that fetches relevant data from a vector store and sends it to an LLM. These steps target Python 3.10+, LangChain 0.2+, Weaviate 1.24+, and an active Weaviate cloud instance or local Docker container.

Prerequisites

  • Ubuntu 24.04 LTS or macOS 14+
  • Python 3.10 or newer installed via system package manager or pyenv
  • Git installed (optional, for cloning repositories)
  • Access to a Weaviate cloud instance or a running local Docker container with Weaviate 1.24+
  • A valid API key for Weaviate cloud or Docker access to the local instance
  • At least 4 GB of available RAM for local Weaviate containers
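Since every step below assumes Python 3.10+, it is worth failing fast from inside the interpreter before installing anything; a quick sanity check:

```python
import sys

# The tutorial targets Python 3.10+; warn early if the interpreter is older
version = ".".join(str(p) for p in sys.version_info[:3])
if sys.version_info < (3, 10):
    print(f"Python {version} is too old; install 3.10 or newer")
else:
    print(f"Python {version} OK")
```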

Step 1: Install Python dependencies

Create a virtual environment to isolate project dependencies and activate it before installing packages. This prevents conflicts with system Python packages and ensures a clean environment for the RAG pipeline.

python3 -m venv rag-env
source rag-env/bin/activate

Install the required Python packages using pip. This includes LangChain, the Weaviate client, and the embedding model provider. Note that the PyPI package for the client is weaviate-client, and the langchain-community Weaviate integration targets the v3 client API, so pin the major version.

pip install langchain langchain-community langchain-core langchain-text-splitters "weaviate-client>=3,<4" python-dotenv sentence-transformers

Verify the installation by checking the version of the Weaviate client library. The client library's version number (3.x here) is separate from the Weaviate server version (1.24+).

python -c "import weaviate; print(weaviate.__version__)"

Step 2: Configure environment variables

Create a .env file to store sensitive configuration data like API keys and connection strings. Do not commit this file to version control to protect your credentials. The OpenAI key is used by the LLM in Step 6; replace the Weaviate URL with your own cluster endpoint.

WEAVIATE_API_KEY=your_api_key_here
WEAVIATE_URL=https://your-cluster.weaviate.network
OPENAI_API_KEY=your_openai_key_here

For local setups, update the URL to your Docker host address, typically http://localhost:8080.

WEAVIATE_URL=http://localhost:8080
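Misconfigured URLs, especially trailing slashes, are a common source of connection errors later on. A small sketch of reading and normalizing the value, with a hypothetical fallback for local setups:

```python
import os

# Fall back to the local Docker address when no URL is configured
url = os.environ.get("WEAVIATE_URL", "http://localhost:8080")

# Trailing slashes produce malformed request paths, so strip them up front
url = url.rstrip("/")
print(url)
```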

Step 3: Create the vector store class

Initialize a Weaviate client using your credentials and connect to the specified URL. Create a class named 'Documents' (Weaviate capitalizes class names) to store your text chunks, then wrap it in a LangChain vector store backed by a local sentence-transformers embedding model.

from langchain_community.vectorstores import Weaviate
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import TextLoader

import os
import weaviate
from dotenv import load_dotenv

load_dotenv()

# Low-level client for connection and schema management (weaviate-client v3 API)
client = weaviate.Client(
    url=os.getenv("WEAVIATE_URL"),
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY")),
)

# Define the class that holds the text chunks and their metadata
client.schema.create_class({
    "class": "Documents",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["text"]},
    ],
})

# LangChain wrapper with a local embedding model; by_text=False makes
# searches use these embeddings instead of a server-side vectorizer
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Weaviate(
    client=client,
    index_name="Documents",
    text_key="content",
    embedding=embeddings,
    attributes=["source"],
    by_text=False,
)

This code creates the schema in the database, defining the fields you will use to store your data, and builds the vector store object used in the following steps.

Step 4: Load and split the data

Load a text file into memory and split it into smaller chunks for better retrieval accuracy. Use a character text splitter to break the document into manageable pieces. Note that CharacterTextSplitter counts characters, not tokens, so chunk_size=500 yields chunks of roughly 500 characters.

loader = TextLoader("data/sample.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
splits = text_splitter.split_documents(documents)

Print the number of splits to confirm the data was processed correctly.

print(f"Number of splits: {len(splits)}")
Number of splits: 12
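To see how chunk_size and chunk_overlap interact, here is a plain-Python sketch of the sliding-window splitting the splitter approximates (character counts, synthetic text):

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    # Each window starts chunk_size - chunk_overlap characters after the last,
    # so consecutive chunks share chunk_overlap characters of context
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 1200)
print(len(chunks))       # 3 windows cover 1200 characters
print(len(chunks[0]))    # each full window holds 500 characters
```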

Step 5: Add data to the vector store

Insert the split documents into the Weaviate vector store. This operation computes an embedding for each chunk with the configured embedding model and stores it alongside the text for fast retrieval.

vectorstore.add_documents(documents=splits)

The call blocks until the insertion completes and returns the list of object IDs for the added documents.
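Conceptually, add_documents embeds each chunk and writes an object holding the vector, the text, and its metadata. An in-memory sketch with a stand-in embedding function (fake_embed is illustrative only, not a real model):

```python
def fake_embed(text):
    # Stand-in for a real model: a tiny vector of character statistics
    return [len(text), text.count(" ")]

store = []
for chunk in ["first chunk of text", "second chunk"]:
    store.append({
        "vector": fake_embed(chunk),
        "content": chunk,
        "source": "sample.txt",
    })

print(len(store))            # 2 objects indexed
print(store[0]["vector"])    # [19, 3]
```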

Step 6: Build the retrieval chain

Create a LangChain retrieval chain that fetches relevant chunks from the vector store based on a user query. Configure the retriever to return the top 3 most relevant chunks for the LLM to process; the OpenAI LLM reads OPENAI_API_KEY from the environment.

from langchain.chains import RetrievalQA
from langchain_community.llms import OpenAI

# Requires OPENAI_API_KEY in the environment (loaded from .env)
llm = OpenAI(temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

This chain combines the retriever and the LLM into a single object you can call with questions.
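Under the hood, the k=3 retrieval is a nearest-neighbor ranking of stored vectors against the query embedding, typically by cosine similarity. A toy version with hypothetical 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by both vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors standing in for real embeddings
vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Rank documents by similarity to the query and keep the top 2
top_k = sorted(vectors, key=lambda d: cosine(vectors[d], query), reverse=True)[:2]
print(top_k)
```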

Verify the pipeline

Test the RAG pipeline by asking a question that requires retrieving information from your uploaded documents. The system should return the answer along with the source documents used to generate it.

response = qa_chain.invoke({"query": "What is the main topic of the sample text?"})
print(response["result"])

The printed answer depends on the content of your sample document; response["source_documents"] holds the retrieved chunks used to generate it.

Check the response time and ensure the retrieved documents are relevant to the query. A typical response time should be under 2 seconds for small datasets.
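To check the response time, wrap the chain call in a wall-clock timer. A generic sketch (the lambda stands in for qa_chain.invoke, which is not available outside the pipeline):

```python
import time

def timed(fn, *args):
    # Return the call's result together with elapsed wall-clock seconds
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

result, elapsed = timed(lambda q: f"echo: {q}", "What is the main topic?")
print(f"{elapsed:.3f}s")
```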

Troubleshooting

Check the Weaviate logs for errors if the connection fails or queries return empty results. Ensure your API key is valid and the URL is correctly formatted without trailing slashes.

curl "https://your-cluster.weaviate.network/v1/.well-known/ready" \
  -H "Authorization: Bearer YOUR_API_KEY"

If the health check fails, verify your network connectivity and firewall settings. For local Docker setups, ensure port 8080 is not blocked.

docker ps | grep weaviate

Restart the container if it is not running.

docker restart weaviate

Re-run the health check to confirm the service is back online. If the vector store returns no results, check that the class name matches the one defined in the code and that the documents were successfully added.

docs = vectorstore.similarity_search("What is the main topic?", k=3)
print(len(docs))

Ensure your embedding model matches the one used during the initial data loading step. Mismatched models will produce incompatible embeddings and fail to retrieve relevant documents.
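A quick way to catch a model mismatch is to compare vector dimensions: for instance, all-MiniLM-L6-v2 produces 384-dimensional vectors, while many other sentence-transformers models produce 768. A dimension mismatch proves the models differ (though matching dimensions do not guarantee the same model); a sketch:

```python
STORED_DIM = 384                 # dimension of vectors already in the store
query_vec = [0.0] * 768          # e.g., output of a different embedding model

if len(query_vec) != STORED_DIM:
    print(f"dimension mismatch: query {len(query_vec)} vs stored {STORED_DIM}")
```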


Tags: RAG, LangChain, Python, LLM, Weaviate