Deploy OpenRAG with self-managed services

To manage your own OpenRAG services, deploy OpenRAG with Docker or Podman.

Use this installation method if you don't want to use the Terminal User Interface (TUI), or if you need to run OpenRAG in an environment where using the TUI isn't feasible.

Prerequisites

For Microsoft Windows, you must use the Windows Subsystem for Linux (WSL). See Install OpenRAG on Windows before proceeding.

Install Python version 3.12 or later.

Install uv.

Install Podman or Docker.

The OpenRAG team recommends, at minimum, 8 GB of RAM for container VMs. However, if you plan to upload large files regularly, more RAM is recommended. For more information, see Troubleshoot OpenRAG.

A Docker or Podman VM must be running before you start OpenRAG.
Install podman-compose or Docker Compose. To use Docker Compose with Podman, you must alias Docker Compose commands to Podman commands.

Gather the credentials and connection details for one or more supported model providers:
- OpenAI: Create an OpenAI API key.
- Anthropic: Create an Anthropic API key. Anthropic provides language models only; you must select an additional provider for embeddings.
- IBM watsonx.ai: Get your watsonx.ai API endpoint, IBM project ID, and IBM API key from your watsonx deployment.
- Ollama: Deploy an Ollama instance and models locally, in the cloud, or on a remote server. Then, get your Ollama server's base URL and the names of the models that you want to use.
OpenRAG requires at least one language model and one embedding model. If a provider offers both types of models, then you can use the same provider for both models. If a provider offers only one type, then you must configure two providers.

Language models must support tool calling to be compatible with OpenRAG.

For more information, see Complete the application onboarding process.

Optional: Install GPU support with an NVIDIA GPU, CUDA support, and compatible NVIDIA drivers on the OpenRAG host machine. If you don't have GPU capabilities, OpenRAG provides an alternate CPU-only deployment that is suitable for most use cases. The default CPU-only deployment doesn't prevent you from using GPU acceleration in external services, such as Ollama servers.
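
As a rough way to spot-check the tool prerequisites above, a short shell sketch like the following reports whether Python 3.12 or later, uv, and a container runtime are available on the PATH. The function names `have_cmd` and `version_at_least` are illustrative and aren't part of OpenRAG. The sketch doesn't check Compose, available RAM, or GPU support.

```shell
#!/bin/sh
# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if version $1 (such as "3.12.4") is at least major.minor $2 (such as "3.12").
version_at_least() {
  have_major=${1%%.*}
  rest=${1#*.}
  have_minor=${rest%%.*}
  want_major=${2%%.*}
  want_minor=${2#*.}
  if [ "$have_major" -gt "$want_major" ]; then
    return 0
  fi
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Report on each prerequisite without stopping at the first missing one.
if have_cmd python3; then
  py_version=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
  if version_at_least "$py_version" 3.12; then
    echo "python3 $py_version: ok"
  else
    echo "python3 $py_version: older than 3.12"
  fi
else
  echo "python3: not found"
fi

if have_cmd uv; then
  echo "uv: ok"
else
  echo "uv: not found"
fi

if have_cmd docker || have_cmd podman; then
  echo "container runtime: ok"
else
  echo "container runtime: neither docker nor podman found"
fi
```

Finding a `docker` or `podman` binary only shows that the runtime is installed. It doesn't confirm that the Docker or Podman VM is running.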

Prepare your deployment

Clone the OpenRAG repository:

```
git clone https://github.com/langflow-ai/openrag.git
```

Change to the root of the cloned repository:
```
cd openrag
```
Install dependencies:
```
uv sync
```
Create a .env file at the root of the cloned repository.

You can create an empty file or copy the repository's .env.example file. The example file contains some of the OpenRAG environment variables to get you started with configuring your deployment.
```
cp .env.example .env
```
Edit the .env file to configure your deployment using OpenRAG environment variables. The OpenRAG Docker Compose files pull values from your .env file to configure the OpenRAG containers. The following variables are required or recommended:
- OPENSEARCH_PASSWORD (Required): Sets the OpenSearch administrator password. It must adhere to the OpenSearch password complexity requirements.
- LANGFLOW_SUPERUSER: The username for the Langflow administrator user. If LANGFLOW_SUPERUSER isn't set, then the default value is admin.
- LANGFLOW_SUPERUSER_PASSWORD (Strongly recommended): Sets the Langflow administrator password, and determines the Langflow server's default authentication mode. If LANGFLOW_SUPERUSER_PASSWORD isn't set, then the Langflow server starts without authentication enabled. For more information, see Langflow settings.
- LANGFLOW_SECRET_KEY (Strongly recommended): A secret encryption key for internal Langflow operations. It is recommended to generate your own Langflow secret key. If LANGFLOW_SECRET_KEY isn't set, then Langflow generates a secret key automatically.
- Model provider credentials: Provide credentials for your preferred model providers. If none of these are set in the .env file, you must configure at least one provider during the application onboarding process.
  - OPENAI_API_KEY
  - ANTHROPIC_API_KEY
  - OLLAMA_ENDPOINT
  - WATSONX_API_KEY
  - WATSONX_ENDPOINT
  - WATSONX_PROJECT_ID
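
Before you start the containers, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch, not an OpenRAG tool: the function names `env_has` and `check_env_file` are hypothetical, and the check only looks for plain `KEY=value` lines.

```shell
# Succeeds if KEY ($1) is assigned a non-empty value in the env file ($2).
# A quoted empty value such as KEY="" still counts as set in this simple check.
env_has() {
  grep -Eq "^$1=.+" "$2"
}

# Prints a report for the file and fails only if the required variable is missing.
check_env_file() {
  env_file=$1
  result=0
  if ! env_has OPENSEARCH_PASSWORD "$env_file"; then
    echo "missing (required): OPENSEARCH_PASSWORD"
    result=1
  fi
  for key in LANGFLOW_SUPERUSER_PASSWORD LANGFLOW_SECRET_KEY; do
    if ! env_has "$key" "$env_file"; then
      echo "not set (strongly recommended): $key"
    fi
  done
  provider_found=no
  for key in OPENAI_API_KEY ANTHROPIC_API_KEY OLLAMA_ENDPOINT WATSONX_API_KEY; do
    if env_has "$key" "$env_file"; then
      provider_found=yes
    fi
  done
  if [ "$provider_found" = no ]; then
    echo "no model provider credentials set; configure a provider during onboarding"
  fi
  return "$result"
}

# Check the .env file in the current directory, if there is one.
if [ -f .env ]; then
  if check_env_file .env; then
    echo ".env: required variables are set"
  fi
fi
```

The sketch doesn't validate the values themselves. For example, it doesn't test whether the OpenSearch password meets the complexity requirements.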
To enable OAuth mode or cloud storage connectors, do the following:
1. Register OpenRAG as an OAuth application in your cloud provider, and then obtain the app's OAuth credentials, such as a client ID and secret key. To enable multiple connectors, you must register an app and generate credentials for each provider.
2. In your .env file, set the OAuth environment variables for the providers that you want to use:
```
GOOGLE_OAUTH_CLIENT_ID=
GOOGLE_OAUTH_CLIENT_SECRET=

MICROSOFT_GRAPH_OAUTH_CLIENT_ID=
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```
  - Google: Enter your Google OAuth Client ID and Google OAuth Client Secret. You can generate these in the Google Cloud Console. For more information, see the Google OAuth client documentation.
    
    Providing these Google credentials enables OAuth mode and the Google Drive cloud storage connector.
    
    warning
    Google is the only supported OAuth provider for OpenRAG.
    You must enter Google credentials if you want to enable OAuth mode.
    The Microsoft and Amazon credentials are used only to authorize the cloud storage connectors. OpenRAG doesn't offer OAuth provider integrations for Microsoft or Amazon.
  - Microsoft: For the Microsoft OAuth Client ID and Microsoft OAuth Client Secret, enter Azure application registration credentials for SharePoint and OneDrive. For more information, see the Microsoft Graph OAuth client documentation.
  - Amazon: Enter your AWS Access Key ID and AWS Secret Access Key with access to your S3 instance. For more information, see the AWS documentation on Configuring access to AWS applications.
3. For each connector, you must register the OpenRAG redirect URIs in your OAuth apps:
  - Local deployments: http://localhost:3000/auth/callback
  - Production deployments: https://your-domain.com/auth/callback
  The redirect URIs are used for the cloud storage connector webhooks. For Google, the redirect URIs are also used to redirect users back to OpenRAG after they sign in.
4. Optional: Set WEBHOOK_BASE_URL to the base address for your OAuth connector endpoints. If set, the OAuth connector webhook URLs are constructed as WEBHOOK_BASE_URL/connectors/${provider}/webhook. Although this variable is optional in general, it is required if you want to enable automatic ingestion from cloud storage.
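
To illustrate the webhook URL pattern described in the last step, the following sketch joins a base URL and a provider identifier. The `webhook_url` function is hypothetical, and `example_provider` is a placeholder rather than a real OpenRAG provider identifier.

```shell
# Prints the webhook URL for a provider, following the pattern
# WEBHOOK_BASE_URL/connectors/${provider}/webhook.
webhook_url() {
  printf '%s/connectors/%s/webhook\n' "$1" "$2"
}

# "example_provider" is a placeholder, not a real OpenRAG provider identifier.
WEBHOOK_BASE_URL="https://your-domain.com"
webhook_url "$WEBHOOK_BASE_URL" "example_provider"
# prints: https://your-domain.com/connectors/example_provider/webhook
```

This sketch joins the parts as written, so a base URL with a trailing slash would produce a double slash in the result.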
Optional: To enable the Langflow integration with Langfuse, set the following variables:
- LANGFUSE_SECRET_KEY: A secret key for your Langfuse project.
- LANGFUSE_PUBLIC_KEY: A public key for your Langfuse project.
- LANGFUSE_HOST: Required for self-hosted Langfuse deployments. Leave empty for Langfuse Cloud.
Save your .env file.

Start services

To use the default Docling Serve implementation, start docling serve on port 5001 on the host machine using the included script:
```
uv run python scripts/docling_ctl.py start --port 5001
```
Docling cannot run inside a Docker container due to system-level dependencies, so you must manage it as a separate service on the host machine. For more information, see Stop, start, and inspect native services.

Port 5001 is required to deploy OpenRAG successfully; don't use a different port. Additionally, running Docling Serve on the host machine enables the MLX framework for accelerated performance on Apple Silicon Mac machines.

tip
If you don't want to use the default Docling Serve implementation, see Select a Docling implementation.
Confirm docling serve is running.

The following command checks the status of the default Docling Serve implementation:
```
uv run python scripts/docling_ctl.py status
```
If docling serve is running, the output includes the status, address, and process ID (PID):
```
Status: running
Endpoint: http://127.0.0.1:5001
Docs: http://127.0.0.1:5001/docs
PID: 27746
```
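
If you want to script this check, a sketch like the following reads the status text and looks for the `Status: running` line. It assumes the output format shown above. The helper names `docling_running` and `docling_endpoint` are illustrative, and the sample text stands in for the real command output.

```shell
# Succeeds if the status text on stdin contains a "Status: running" line.
docling_running() {
  grep -q '^Status: running'
}

# Prints the value of the "Endpoint:" line from the status text on stdin.
docling_endpoint() {
  sed -n 's/^Endpoint: //p'
}

# Sample text copied from the example output; in practice, capture it with:
#   status_text=$(uv run python scripts/docling_ctl.py status)
status_text='Status: running
Endpoint: http://127.0.0.1:5001
Docs: http://127.0.0.1:5001/docs
PID: 27746'

if printf '%s\n' "$status_text" | docling_running; then
  echo "docling serve is up at $(printf '%s\n' "$status_text" | docling_endpoint)"
else
  echo "docling serve is not running"
fi
```

The sketch treats any output without a `Status: running` line as not running. If the script's real output includes extra formatting, the patterns might need adjusting.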
Deploy the OpenRAG containers locally using the appropriate Docker Compose configuration for your environment:
- CPU-only deployment (Default and recommended): If your host machine doesn't have NVIDIA GPU support, use the base docker-compose.yml file:
  Docker
```
docker compose up -d
```
  Podman
```
podman compose up -d
```
- GPU-accelerated deployment: If your host machine has an NVIDIA GPU with CUDA support and compatible NVIDIA drivers, use the base docker-compose.yml file with the docker-compose.gpu.yml override:
  Docker
```
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```
  Podman
```
podman compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```
tip
GPU acceleration isn't required for most use cases. OpenRAG's CPU-only deployment doesn't prevent you from using GPU acceleration in external services, such as Ollama servers.

GPU acceleration is required only for specific use cases, typically involving customization of the ingestion flows or ingestion logic. For example, writing alternate ingest logic in OpenRAG that uses GPUs directly in the container, or customizing the ingestion flows to use Langflow's Docling component with GPU acceleration instead of OpenRAG's Docling Serve service.
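
The four commands above differ only in the container runtime and in whether the GPU override file is included. The following sketch captures that choice in one hypothetical helper, `compose_up_command`, which prints the command instead of running it.

```shell
# Prints the compose command for a runtime ("docker" or "podman").
# Pass "gpu" as the second argument to add the GPU override file.
compose_up_command() {
  if [ "${2:-}" = "gpu" ]; then
    echo "$1 compose -f docker-compose.yml -f docker-compose.gpu.yml up -d"
  else
    echo "$1 compose up -d"
  fi
}

compose_up_command docker
# prints: docker compose up -d
compose_up_command podman gpu
# prints: podman compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

Printing the command first lets you review it before you run it from the root of the cloned repository.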

Wait for the OpenRAG containers to start, and then confirm that all containers are running:

Docker
```
docker compose ps
```

Podman
```
podman compose ps
```

The OpenRAG Docker Compose files deploy the following containers:

Container Name	Default address	Purpose
OpenRAG Backend	http://localhost:8000	FastAPI server and core functionality.
OpenRAG Frontend	http://localhost:3000	React web interface for user interaction.
Langflow	http://localhost:7860	AI workflow engine.
OpenSearch	http://localhost:9200	Datastore for knowledge.
OpenSearch Dashboards	http://localhost:5601	OpenSearch database administration interface.
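
Containers can take some time to accept connections after they start. As one possible approach, the following sketch defines a generic `retry` helper (a hypothetical name, not part of OpenRAG) that you could pair with an HTTP client to poll one of the addresses in the table.

```shell
# Runs a command up to $1 times, sleeping $2 seconds between failed attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example (not run here): poll the frontend for up to about a minute.
# Assumes curl is installed and that the frontend returns a success status.
#   retry 30 2 curl -fsS -o /dev/null http://localhost:3000
```

The commented example targets only the frontend address. Other services, such as OpenSearch, might require credentials before they return a successful response.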

When the containers are running, you can access your OpenRAG services at their addresses.

Access the OpenRAG frontend at http://localhost:3000, and then continue with the application onboarding process.

If you provided Google OAuth credentials, you must sign in with Google before you are redirected to your OpenRAG instance.

Complete the application onboarding process

The first time you start the OpenRAG application, you must complete the application onboarding process to select language and embedding models that are essential for OpenRAG features like the Chat.

To complete onboarding, you must configure at least one language model and one embedding model.

You can use different providers for your language model and embedding model, such as Anthropic for the language model and OpenAI for the embedding model. Additionally, you can select multiple embedding models.

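Follow the instructions for your language model provider. The providers are covered in the following order: Anthropic, IBM watsonx.ai, Ollama, and OpenAI (default).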

Anthropic

info

Anthropic doesn't provide embedding models. If you select Anthropic for your language model, you must select a different provider for the embedding model.

Enter your Anthropic API key, or enable Use environment API key to pull the key from your OpenRAG .env file.
Under Advanced settings, select the language model that you want to use.

Language models must support tool calling to be compatible with OpenRAG. Incompatible models aren't listed.
Click Complete.
Select a provider for embeddings, provide the required information, and then select the embedding model you want to use. For information about another provider's credentials and settings, see the instructions for that provider.
Click Complete.

After you configure the embedding model, OpenRAG uses your credentials and models to ingest some initial documents. This tests the connection, and it allows you to ask OpenRAG about itself in the Chat. If there is a problem with the model configuration, an error occurs and you are redirected back to the application onboarding screen. Verify that the credential is valid and has access to the selected model, and then click Complete to retry ingestion.
Continue through the overview slides for a brief introduction to OpenRAG, or click Skip overview. The overview demonstrates some basic functionality that is covered in the quickstart and in other parts of the OpenRAG documentation.

IBM watsonx.ai

info

OpenRAG isn't guaranteed to be compatible with all models that are available through IBM watsonx.ai.

Language models must support tool calling to be compatible with OpenRAG. Incompatible models aren't listed in OpenRAG's settings or onboarding.

Additionally, models must be able to handle the agentic reasoning tasks required by OpenRAG. Models that are too small or not designed for agentic RAG tasks can return low quality, incorrect, or improperly formatted responses. For more information, see Chat issues.

You can submit an OpenRAG GitHub issue to request support for specific models.

For watsonx.ai API Endpoint, select the base URL for your watsonx.ai model deployment.
Enter your watsonx.ai deployment's project ID and API key.

You can enable Use environment API key to pull the key from your OpenRAG .env file.
Under Advanced settings, select the language model that you want to use.

Language models must support tool calling to be compatible with OpenRAG. Incompatible models aren't listed.
Click Complete.
Select a provider for embeddings, provide the required information, and then select the embedding model you want to use. For information about another provider's credentials and settings, see the instructions for that provider.
Click Complete.

After you configure the embedding model, OpenRAG uses your credentials and models to ingest some initial documents. This tests the connection, and it allows you to ask OpenRAG about itself in the Chat. If there is a problem with the model configuration, an error occurs and you are redirected back to the application onboarding screen. Verify that the credentials are valid and have access to the selected model, and then click Complete to retry ingestion.
Continue through the overview slides for a brief introduction to OpenRAG, or click Skip overview. The overview demonstrates some basic functionality that is covered in the quickstart and in other parts of the OpenRAG documentation.

Ollama

Using Ollama as your language and embedding model provider offers greater flexibility and configuration options for hosting models. However, it requires additional setup because Ollama isn't included with OpenRAG. You must deploy Ollama separately if you want to use Ollama as a model provider.

info

OpenRAG isn't guaranteed to be compatible with all models that are available through Ollama. Some models might produce unexpected results, such as JSON-formatted output instead of natural language responses, and some models aren't appropriate for the types of tasks that OpenRAG performs, such as those that generate media.

Language models: Ollama-hosted language models must support tool calling to be compatible with OpenRAG. The OpenRAG team recommends gpt-oss:20b or mistral-nemo:12b. If you choose gpt-oss:20b, consider using Ollama Cloud or running Ollama on a remote machine because this model requires at least 16 GB of RAM.
Embedding models: The OpenRAG team recommends nomic-embed-text:latest, mxbai-embed-large:latest, or embeddinggemma:latest.

You can experiment with other models, but if you encounter issues that you are unable to resolve through other RAG best practices (like context filters and prompt engineering), try switching to one of the recommended models. You can submit an OpenRAG GitHub issue to request support for specific models.

Install Ollama locally or on a remote server, or run models in Ollama Cloud.

If you are running a remote server, it must be accessible from your OpenRAG deployment.
In the OpenRAG onboarding dialog, enter your Ollama server's base URL:
- Local Ollama server: Enter your Ollama server's base URL and port. The default Ollama server address is http://localhost:11434.
- Ollama Cloud: Because Ollama Cloud models run at the same address as a local Ollama server and automatically offload to Ollama's cloud service, you can use the same base URL and port as you would for a local Ollama server. The default address is http://localhost:11434.
- Remote server: Enter your remote Ollama server's base URL and port, such as http://your-remote-server:11434.
Select a language model that your Ollama server is running.

OpenRAG only lists language models that support tool calling. If your server isn't running any compatible language models, you must either deploy a compatible language model on your Ollama server, or use another provider for the language model.

Language model and embedding model selections are independent. You can use the same or different servers for each model.

To use different providers for each model, you must configure both providers, and select the relevant model for each provider.
Click Complete.
Select a provider for embeddings, provide the required information, and then select the embedding model you want to use. For information about another provider's credentials and settings, see the instructions for that provider.
Click Complete.

After you configure the embedding model, OpenRAG uses your credentials and models to ingest some initial documents. This tests the connection, and it allows you to ask OpenRAG about itself in the Chat. If there is a problem with the model configuration, an error occurs and you are redirected back to the application onboarding screen. Verify that the server address is valid, and that the selected model is running on the server. Then, click Complete to retry ingestion.
Continue through the overview slides for a brief introduction to OpenRAG, or click Skip overview. The overview demonstrates some basic functionality that is covered in the quickstart and in other parts of the OpenRAG documentation.

OpenAI (default)

Enter your OpenAI API key, or enable Use environment API key to pull the key from your OpenRAG .env file.
Under Advanced settings, select the language model that you want to use.

Language models must support tool calling to be compatible with OpenRAG. Incompatible models aren't listed.
Click Complete.
Select a provider for embeddings, provide the required information, and then select the embedding model you want to use. For information about another provider's credentials and settings, see the instructions for that provider.
Click Complete.

After you configure the embedding model, OpenRAG uses your credentials and models to ingest some initial documents. This tests the connection, and it allows you to ask OpenRAG about itself in the Chat. If there is a problem with the model configuration, an error occurs and you are redirected back to the application onboarding screen. Verify that the credential is valid and has access to the selected model, and then click Complete to retry ingestion.
Continue through the overview slides for a brief introduction to OpenRAG, or click Skip overview. The overview demonstrates some basic functionality that is covered in the quickstart and in other parts of the OpenRAG documentation.

Next steps

Try some of OpenRAG's core features in the quickstart.
Learn how to manage OpenRAG services.
Upload documents, and then use the Chat to explore your data.
