Deploy OpenRAG with self-managed services

To manage your own OpenRAG services, deploy OpenRAG with Docker or Podman.

Use this installation method if you don't want to use the Terminal User Interface (TUI), or if you need to run OpenRAG in an environment where the TUI isn't feasible.

Prerequisites

  • Install Python version 3.13 or later.
  • Install uv.
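
    If you already have uv, you can also use it to install a compatible Python version. For example:

    uv python install 3.13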

  • Install Podman (recommended) or Docker.

    The OpenRAG team recommends at least 8 GB of RAM for container VMs. If you plan to upload large files regularly, allocate more RAM. For more information, see Troubleshoot OpenRAG.

  • Install podman-compose or Docker Compose. To use Docker Compose with Podman, you must alias Docker Compose commands to Podman commands.
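
    For example, one way to alias Docker Compose commands to Podman is to alias docker to podman in your shell profile. This is a minimal sketch; adjust it for your shell:

    # In ~/.bashrc or ~/.zshrc
    alias docker=podman

    With this alias in place, docker compose commands run through podman compose.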

  • Gather the credentials and connection details for your preferred model providers. You must have access to at least one language model and one embedding model. If a provider offers both types, you can use the same provider for both models. If a provider offers only one type, you must select two providers.

    • OpenAI: Create an OpenAI API key.

    • Anthropic: Create an Anthropic API key. Anthropic provides language models only; you must select an additional provider for embeddings.

    • IBM watsonx.ai: Get your watsonx.ai API endpoint, IBM project ID, and IBM API key from your watsonx deployment.

    • Ollama: Deploy an Ollama instance and models locally, in the cloud, or on a remote server. Then, get your Ollama server's base URL and the names of the models that you want to use.

      info

      OpenRAG isn't guaranteed to be compatible with all models that are available through Ollama. For example, some models might produce unexpected results, such as JSON-formatted output instead of natural language responses, and some models, such as those that generate media, aren't appropriate for the types of tasks that OpenRAG performs.

      The OpenRAG team recommends the following models when using Ollama as your model provider:

      • Language models: gpt-oss:20b or mistral-nemo:12b.

        If you choose gpt-oss:20b, consider using Ollama Cloud or running Ollama on a remote machine because this model requires at least 16 GB of RAM.

      • Embedding models: nomic-embed-text:latest, mxbai-embed-large:latest, or embeddinggemma:latest.

      You can experiment with other models, but if you encounter issues that you are unable to resolve through other RAG best practices (like context filters and prompt engineering), try switching to one of the recommended models. You can submit an OpenRAG GitHub issue to request support for specific models.
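
      For example, assuming a local Ollama installation, you can pull one recommended language model and one recommended embedding model ahead of time:

      ollama pull mistral-nemo:12b
      ollama pull nomic-embed-text:latest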

  • Optional: Install GPU support with an NVIDIA GPU, CUDA support, and compatible NVIDIA drivers on the OpenRAG host machine. If you don't have GPU capabilities, OpenRAG provides an alternate CPU-only deployment.
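
    To confirm that the host machine can provide GPU acceleration, you can check that the NVIDIA drivers are loaded. For example:

    nvidia-smi

    If the command reports your GPU and driver version, you can use the GPU-accelerated deployment described in Start services.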

Prepare your deployment

  1. Clone the OpenRAG repository:

    git clone https://github.com/langflow-ai/openrag.git
  2. Change to the root of the cloned repository:

    cd openrag
  3. Install dependencies:

    uv sync
  4. Create a .env file at the root of the cloned repository.

    You can create an empty file or copy the repository's .env.example file. The example file contains some of the OpenRAG environment variables to get you started with configuring your deployment.

    cp .env.example .env
  5. Edit the .env file to configure your deployment using OpenRAG environment variables. The OpenRAG Docker Compose files pull values from your .env file to configure the OpenRAG containers. The following variables are required or recommended:

    • OPENSEARCH_PASSWORD (Required): Sets the OpenSearch administrator password. It must adhere to the OpenSearch password complexity requirements.

    • LANGFLOW_SUPERUSER: The username for the Langflow administrator user. If LANGFLOW_SUPERUSER isn't set, then the default value is admin.

    • LANGFLOW_SUPERUSER_PASSWORD (Strongly recommended): Sets the Langflow administrator password, and determines the Langflow server's default authentication mode. If LANGFLOW_SUPERUSER_PASSWORD isn't set, then the Langflow server starts without authentication enabled. For more information, see Langflow settings.

    • LANGFLOW_SECRET_KEY (Strongly recommended): A secret encryption key for internal Langflow operations. It is recommended to generate your own Langflow secret key. If LANGFLOW_SECRET_KEY isn't set, then Langflow generates a secret key automatically.

    • Model provider credentials: Provide credentials for your preferred model providers. If none of these are set in the .env file, you must configure at least one provider during the application onboarding process.

      • OPENAI_API_KEY
      • ANTHROPIC_API_KEY
      • OLLAMA_ENDPOINT
      • WATSONX_API_KEY
      • WATSONX_ENDPOINT
      • WATSONX_PROJECT_ID
    • OAuth provider credentials: To upload documents from external storage, such as Google Drive, set the required OAuth credentials for the connectors that you want to use. You can manage OAuth credentials later, but it is recommended to configure them during initial setup so you don't have to rebuild the containers.
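
    The following sketch shows a minimal .env for a deployment that uses OpenAI for both the language and embedding models. All values are placeholders; for example, you can generate a secret key with a command like openssl rand -base64 32:

    # Required: OpenSearch administrator password (must meet complexity requirements)
    OPENSEARCH_PASSWORD=YourStr0ng!Passw0rd

    # Strongly recommended: Langflow administrator credentials and secret key
    LANGFLOW_SUPERUSER=admin
    LANGFLOW_SUPERUSER_PASSWORD=another-strong-password
    LANGFLOW_SECRET_KEY=generated-secret-key

    # Model provider credentials: set at least one provider
    OPENAI_API_KEY=sk-...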

    For more information and variables, see OpenRAG environment variables.

Start services

  1. Start docling serve on port 5001 on the host machine:

    uv run python scripts/docling_ctl.py start --port 5001

    Docling cannot run inside a Docker container due to system-level dependencies, so you must manage it as a separate service on the host machine. For more information, see Stop, start, and inspect native services.

    This port is required to deploy OpenRAG successfully; don't use a different port. Additionally, running Docling natively on the host enables the MLX framework for accelerated performance on Apple Silicon Mac machines.

  2. Confirm that docling serve is running:

    uv run python scripts/docling_ctl.py status

    If docling serve is running, the output includes the status, address, and process ID (PID):

    Status: running
    Endpoint: http://127.0.0.1:5001
    Docs: http://127.0.0.1:5001/docs
    PID: 27746
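
    You can also verify that the endpoint responds from the host, for example by requesting the Docs URL from the status output. A healthy service returns an HTTP 200 response:

    curl -sI http://127.0.0.1:5001/docs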
  3. Deploy the OpenRAG containers locally using the appropriate Docker Compose configuration for your environment:

    • GPU-accelerated deployment: If your host machine has an NVIDIA GPU with CUDA support and compatible NVIDIA drivers, use the base docker-compose.yml file with the docker-compose.gpu.yml override.

      Docker
      docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
      Podman
      podman compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
    • CPU-only deployment (default): If your host machine doesn't have NVIDIA GPU support, use the base docker-compose.yml file.

      Docker
      docker compose up -d
      Podman
      podman compose up -d
  4. Wait for the OpenRAG containers to start, and then confirm that all containers are running:

    Docker
    docker compose ps
    Podman
    podman compose ps

    The OpenRAG Docker Compose files deploy the following containers:

    Container Name          Default address          Purpose
    OpenRAG Backend         http://localhost:8000    FastAPI server and core functionality.
    OpenRAG Frontend        http://localhost:3000    React web interface for user interaction.
    Langflow                http://localhost:7860    AI workflow engine.
    OpenSearch              http://localhost:9200    Datastore for knowledge.
    OpenSearch Dashboards   http://localhost:5601    OpenSearch database administration interface.

    When the containers are running, you can access your OpenRAG services at their addresses.
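
    As a quick smoke test, you can request the frontend and backend from the host. This sketch assumes the default ports and that the backend's FastAPI interactive docs are enabled:

    curl -sI http://localhost:3000        # OpenRAG frontend
    curl -sI http://localhost:8000/docs   # OpenRAG backend (FastAPI docs)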

  5. Access the OpenRAG frontend at http://localhost:3000, and then continue with the application onboarding process.

Complete the application onboarding process

The first time you start the OpenRAG application, you must complete the application onboarding process to select the language and embedding models that are essential for OpenRAG features like the Chat.

Some of these settings, such as the embedding models, can be changed seamlessly after onboarding. Others are immutable and require you to destroy and recreate the OpenRAG containers. For more information, see the OpenRAG environment variables reference.

You can use different providers for your language model and embedding model, such as Anthropic for the language model and OpenAI for the embedding model. Additionally, you can set multiple embedding models.

You only need to complete onboarding for your preferred providers. The following steps use Anthropic as the language model provider; if you use a different provider, enter that provider's credentials at the equivalent step.

info

Anthropic doesn't provide embedding models. If you select Anthropic for your language model, you must select a different provider for the embedding model.

  1. Enter your Anthropic API key, or enable Use environment API key to pull the key from your OpenRAG .env file.

  2. Under Advanced settings, select the language model that you want to use.

  3. Click Complete.

  4. Select a provider for embeddings, provide the required information, and then select the embedding model that you want to use. For information about another provider's credentials and settings, see the instructions for that provider.

  5. Click Complete.

    After you configure the embedding model, OpenRAG uses your credentials and models to ingest some initial documents. This tests the connection, and it allows you to ask OpenRAG about itself in the Chat. If there is a problem with the model configuration, an error occurs and you are redirected back to the application onboarding screen. Verify that the credential is valid and has access to the selected model, and then click Complete to retry ingestion.

  6. Continue through the overview slides for a brief introduction to OpenRAG, or click Skip overview. The overview demonstrates some basic functionality that is covered in the quickstart and in other parts of the OpenRAG documentation.

Next steps