Complete self-hosted AI stack: Ollama + Open WebUI + MinIO + Caddy
How to build a complete private AI stack at home or at work: local model chat, file storage, HTTPS. One docker-compose.yml that runs on any server with 16 GB RAM.
Published: June 3, 2025
You can have a private ChatGPT alternative, with models that never leave your network, on a server you already own. It takes four open-source components and a single docker-compose.yml. Everything runs comfortably on a machine with 16 GB of RAM and a 50 GB SSD.
The four components
Ollama is the engine that downloads and runs LLM models locally. It exposes an OpenAI-compatible REST API on localhost:11434. It supports Qwen2.5, Llama 3.3, Mistral, Gemma 3, and dozens of others.
Open WebUI is the chat interface that connects to Ollama. It has the same UX as ChatGPT, handles multiple conversations, supports document uploads, and integrates with S3-compatible storage (so with MinIO).
MinIO is self-hosted S3-compatible object storage. You use it to upload company documents and run Q&A on them via RAG, without files ever leaving the server.
Caddy acts as a reverse proxy with automatic HTTPS. It exposes Open WebUI and the MinIO console on domains with Let’s Encrypt certificates, without touching certbot.
The complete docker-compose.yml
services:
ollama:
image: ollama/ollama:latest
volumes:
- ollama_data:/root/.ollama
ports:
- "11434:11434"
networks:
- ai-stack
open-webui:
image: ghcr.io/open-webui/open-webui:main
environment:
- OLLAMA_BASE_URL=http://ollama:11434
- STORAGE_PROVIDER=s3
- S3_ENDPOINT_URL=http://minio:9000
- S3_ACCESS_KEY=admin
- S3_SECRET_KEY=password123
- S3_BUCKET_NAME=open-webui
depends_on:
- ollama
- minio
networks:
- ai-stack
minio:
image: minio/minio:latest
command: server /data --console-address ":9001"
environment:
- MINIO_ROOT_USER=admin
- MINIO_ROOT_PASSWORD=password123
volumes:
- minio_data:/data
ports:
- "9000:9000"
- "9001:9001"
networks:
- ai-stack
caddy:
image: caddy:2-alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
- caddy_data:/data
networks:
- ai-stack
networks:
ai-stack:
volumes:
ollama_data:
minio_data:
caddy_data:
A minimal Caddyfile to expose the two public services:
chat.example.com {
reverse_proxy open-webui:8080
}
storage.example.com {
reverse_proxy minio:9001
}
Prerequisites and first run
Minimum server: Ubuntu 22.04 or 24.04, 16 GB RAM, 50 GB free SSD. If you have an Nvidia GPU, add deploy.resources.reservations.devices to the Ollama service for CUDA — but even CPU-only, Qwen2.5-7B responds in 3–5 seconds per message.
After docker compose up -d, pull the first model:
docker exec -it <ollama-container-name> ollama pull qwen2.5:7b
Or do it directly from the Open WebUI interface under Settings → Models. The model weighs about 4.7 GB, downloads once, and stays in the ollama_data volume.
RAG with company documents
Open WebUI connects to MinIO via the S3 environment variables already in your compose file. Create an open-webui bucket from the MinIO console at storage.example.com, then from Open WebUI go to Workspace → Knowledge, upload your PDFs, and enable the Knowledge Base in a new conversation.
The model answers by citing the documents you uploaded. Files stay on your server — no external API involved.
What to do
- Start the stack with
docker compose up -dand check the logs withdocker compose logs -fto verify all services come up cleanly - Pull
qwen2.5:7bas your first model — it balances quality and speed well on CPU — then considerllama3.3:70bif you have a GPU with 24 GB VRAM - Create a MinIO bucket, upload 5–10 company documents, and test RAG from Open WebUI to see if it answers questions about the content correctly