Complete self-hosted AI stack: Ollama + Open WebUI + MinIO + Caddy

You can have a private ChatGPT alternative, with models that never leave your network, on a server you already own. It takes four open-source components and a single docker-compose.yml. Everything runs comfortably on a machine with 16 GB of RAM and a 50 GB SSD.

The four components

Ollama is the engine that downloads and runs LLM models locally. It exposes an OpenAI-compatible REST API on localhost:11434. It supports Qwen2.5, Llama 3.3, Mistral, Gemma 3, and dozens of others.

Open WebUI is the chat interface that connects to Ollama. It has the same UX as ChatGPT, handles multiple conversations, supports document uploads, and integrates with S3-compatible storage (so with MinIO).

MinIO is self-hosted S3-compatible object storage. You use it to upload company documents and run Q&A on them via RAG, without files ever leaving the server.

Caddy acts as a reverse proxy with automatic HTTPS. It exposes Open WebUI and the MinIO console on domains with Let’s Encrypt certificates, without touching certbot.

The complete docker-compose.yml

services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    networks:
      - ai-stack

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - STORAGE_PROVIDER=s3
      - S3_ENDPOINT_URL=http://minio:9000
      - S3_ACCESS_KEY=admin
      - S3_SECRET_KEY=password123
      - S3_BUCKET_NAME=open-webui
    depends_on:
      - ollama
      - minio
    networks:
      - ai-stack

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password123
    volumes:
      - minio_data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - ai-stack

  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
    networks:
      - ai-stack

networks:
  ai-stack:

volumes:
  ollama_data:
  minio_data:
  caddy_data:

A minimal Caddyfile to expose the two public services:

chat.example.com {
    reverse_proxy open-webui:8080
}

storage.example.com {
    reverse_proxy minio:9001
}

Prerequisites and first run

Minimum server: Ubuntu 22.04 or 24.04, 16 GB RAM, 50 GB free SSD. If you have an Nvidia GPU, add deploy.resources.reservations.devices to the Ollama service for CUDA — but even CPU-only, Qwen2.5-7B responds in 3–5 seconds per message.

After docker compose up -d, pull the first model:

docker exec -it <ollama-container-name> ollama pull qwen2.5:7b

Or do it directly from the Open WebUI interface under Settings → Models. The model weighs about 4.7 GB, downloads once, and stays in the ollama_data volume.

RAG with company documents

Open WebUI connects to MinIO via the S3 environment variables already in your compose file. Create an open-webui bucket from the MinIO console at storage.example.com, then from Open WebUI go to Workspace → Knowledge, upload your PDFs, and enable the Knowledge Base in a new conversation.

The model answers by citing the documents you uploaded. Files stay on your server — no external API involved.

What to do

Start the stack with docker compose up -d and check the logs with docker compose logs -f to verify all services come up cleanly
Pull qwen2.5:7b as your first model — it balances quality and speed well on CPU — then consider llama3.3:70b if you have a GPU with 24 GB VRAM
Create a MinIO bucket, upload 5–10 company documents, and test RAG from Open WebUI to see if it answers questions about the content correctly