Hypervisors and AI: Proxmox, VMware, Nutanix explained simply

Let’s start from the beginning, because this is one of those topics where if you don’t understand the logical thread, everything else looks like black magic.

Imagine you want to set up an AI agent at your company: something that answers employee questions about internal documents, or automatically processes support tickets. To do that you need to run an AI model (like Qwen or Mistral, served by a runtime such as Ollama) on a server that’s always on, isolated from the rest of your systems, with dedicated resources, and that you can update or restart without touching the production ERP server.

This thing — “a computer running inside another computer, isolated, with its own resources” — is called a virtual machine, or VM.

What a hypervisor is, without the jargon

A hypervisor is the software that lets you create and manage virtual machines on a physical server. It sits between the hardware (CPU, RAM, disks, network cards) and the operating systems you want to run on top of it.

Think of an apartment building. The building is the physical server — you have some CPUs, some RAM, some disk. The apartments are the VMs — each one gets its own slice of resources, its own operating system, its own processes. The hypervisor is the building manager who assigns apartments, decides who gets how much space, and makes sure the tenants don’t interfere with each other.

The practical result: on a single physical server you can simultaneously run a Windows server with the ERP, an Ubuntu VM with the database, another Ubuntu VM with the AI service, and a CentOS VM with the software firewall. Each is isolated from the others, has its own CPU and RAM quota, and can crash on its own without taking down the rest.
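To make the “building manager” job concrete, here is a minimal Python sketch of the sanity check you’d do before carving up a host. The host size (16 cores, 64 GB), the per-VM quotas, and the 4 GB reserved for the hypervisor are all hypothetical numbers chosen for illustration, not recommendations from any vendor.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    ram_gb: int


def check_fit(host_cores: int, host_ram_gb: int, vms: list[VM],
              reserved_ram_gb: int = 4) -> dict:
    """Compare the sum of VM allocations with what the host can offer.

    RAM is treated as a hard limit (minus a reserve for the hypervisor);
    vCPUs can be overcommitted, so only the ratio is reported.
    """
    ram_used = sum(vm.ram_gb for vm in vms)
    vcpus_used = sum(vm.vcpus for vm in vms)
    ram_available = host_ram_gb - reserved_ram_gb
    return {
        "ram_used_gb": ram_used,
        "ram_free_gb": ram_available - ram_used,
        "ram_fits": ram_used <= ram_available,
        "vcpu_ratio": vcpus_used / host_cores,
    }


# Hypothetical host and VM quotas, mirroring the four machines in the text.
vms = [
    VM("windows-erp", vcpus=4, ram_gb=16),
    VM("ubuntu-db", vcpus=4, ram_gb=16),
    VM("ubuntu-ai", vcpus=4, ram_gb=16),
    VM("centos-firewall", vcpus=2, ram_gb=4),
]
plan = check_fit(host_cores=16, host_ram_gb=64, vms=vms)
print(plan)
```

If ram_fits comes back False, the hypervisor would have to overcommit memory, which is exactly the situation you’re trying to avoid for the AI VM.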

This is critical for AI for one specific reason: language models are RAM-hungry. A decent model like Qwen2.5-7B, in the quantized form Ollama downloads by default, wants roughly 8-10 GB of RAM once you count the context cache and the operating system around it. If you put it on a shared production server alongside the ERP, you risk saturating memory and bringing everything down. With a dedicated VM, you assign it exactly what it needs and the rest of the server doesn’t even know it’s there.
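Where does a figure like that come from? A rough back-of-the-envelope sketch in Python: weights take roughly (parameters × bits per weight ÷ 8) bytes, plus some headroom. The 7.6 billion parameter count is approximate, and the 1.5 GB overhead for context cache and runtime is an assumption; real quantized files also use slightly more than their nominal bits per weight.

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float,
                    overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate: model weights plus a fixed allowance.

    overhead_gb is an assumed allowance for context cache and runtime,
    not a measured value.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)


# Approximate parameter count for a "7B" class model (assumption).
PARAMS_BILLION = 7.6

estimates = {
    label: estimate_ram_gb(PARAMS_BILLION, bits)
    for label, bits in [("4-bit", 4), ("8-bit", 8), ("16-bit", 16)]
}
for label, gb in estimates.items():
    print(f"{label}: about {gb} GB")
```

Add a couple of GB for the guest operating system and Docker, and a 4-bit model lands comfortably inside a 16 GB VM, while the unquantized 16-bit version does not.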

The four hypervisors you’ll find in enterprise

Proxmox VE is the choice of small-to-medium businesses and Linux-comfortable sysadmins. It’s free and open source (a paid support subscription exists), has a decent web interface, manages both classic VMs and LXC containers (lighter than VMs because they share the host kernel, which also means weaker isolation), and over the years has matured enough for production use without worry. For deploying AI it’s the simplest option: create an LXC Ubuntu container, install Ollama, and you’re running in twenty minutes.

VMware vSphere/ESXi is the enterprise and public sector standard. It’s been around for more than twenty years, it’s rock-solid, and every enterprise sysadmin knows it. Broadcom completed its acquisition of VMware in 2023 and drastically changed licensing and pricing, so many companies are now shopping around. For AI it works great: create an Ubuntu VM, install what you need, and behavior is identical to a physical server. Before buying GPU hardware, verify that your vSphere license supports GPU passthrough. Not all versions allow it, and finding out after spending thousands on a card is frustrating.

Nutanix is what you find in banks, insurance companies, and large enterprises with HCI (hyperconverged) infrastructure, where storage, networking and compute come in a single stack. It has an excellent management interface (Prism Central) and very granular permission governance. For AI deployment nothing changes: a Linux VM with Ollama is a Linux VM with Ollama, regardless of what’s underneath. One practical note: distributed volumes can have higher latency than purely local storage, especially on HDD-backed tiers. For acceptable model loading times, keep the AI VM’s disk on flash (ideally NVMe), not on a shared HDD volume.
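To see why the storage tier matters, here is a small Python sketch of the arithmetic: load time is roughly file size divided by sustained read throughput. The 4.7 GB model size and the three throughput figures are ballpark assumptions for illustration, not measurements from any Nutanix cluster.

```python
def load_seconds(model_gb: float, read_mb_per_s: float) -> float:
    """Time to read the model file once at a sustained throughput."""
    return round(model_gb * 1000 / read_mb_per_s, 1)


MODEL_GB = 4.7  # assumed on-disk size of a quantized 7B model

# Assumed sustained sequential read speeds in MB/s (illustrative only).
TIERS = {"NVMe SSD": 3000, "SATA SSD": 500, "HDD": 150}

load_times = {tier: load_seconds(MODEL_GB, speed) for tier, speed in TIERS.items()}
for tier, seconds in load_times.items():
    print(f"{tier}: about {seconds} s to load the model")
```

The gap only hurts when the model is loaded from disk (first request, or after it has been unloaded from memory), but that is exactly when a user is waiting.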

Sangfor is present in some Italian public sector organizations and manufacturing companies, often as a cheaper alternative to VMware or Nutanix. It works, has an integrated Container Service, and Ollama runs on it without issues. One serious note: it’s a Chinese vendor, and before running an AI model with access to sensitive documents on Sangfor infrastructure, do a supply chain risk assessment with your CISO. Not prejudice — standard practice for any vendor with this geopolitical exposure.

How it works in practice

Once you’ve chosen your hypervisor and created an Ubuntu 22.04 VM with at least 16 GB RAM and 4 cores, the AI deployment is identical across all four. Install Docker, create this docker-compose.yml:

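services:
  ollama:
    image: ollama/ollama
    restart: unless-stopped
    volumes:
      # Keeps downloaded models across container restarts
      - ollama_data:/root/.ollama
    ports:
      # Ollama API; it has no built-in authentication, so firewall this port
      - "11434:11434"
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    environment:
      # Reaches Ollama by service name over the compose network
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      # Keeps users and chat history across restarts
      - open_webui_data:/app/backend/data
    ports:
      # Web UI on host port 3000
      - "3000:8080"
    depends_on:
      - ollama
volumes:
  ollama_data:
  open_webui_data: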

Run docker compose up -d, wait for Ollama to start, then docker compose exec ollama ollama pull qwen2.5:7b to download the model. (Plain docker exec would need the container name that Compose generates, which is not simply “ollama”.) Open a browser at http://VM-IP:3000 and you have your own private company AI running. Put Caddy or nginx in front with HTTPS and authentication, and colleagues can use it without data ever leaving your network.
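The web interface isn’t the only way in. Other internal tools can talk to Ollama’s HTTP API directly, which is how you’d wire up a ticket processor or a document bot. Below is a minimal Python sketch using only the standard library. It assumes the script runs on the VM itself (hence localhost) and that qwen2.5:7b has already been pulled; the function names are made up for this example.

```python
import json
import urllib.request

# Assumption: the script runs on the same VM that publishes port 11434.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "qwen2.5:7b",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama)."""
    req = build_generate_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network; ask() does.
req = build_generate_request("Explain what a hypervisor is in one sentence.")
print(req.get_method(), req.full_url)
```

Once the stack is up, calling ask("Explain what a hypervisor is in one sentence.") returns the model’s answer as a string. With "stream": False the whole answer arrives in one JSON object, which is simpler for scripts than the default streaming mode.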

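GPU is optional. If you have one, passthrough is available on all four hypervisors, subject to the license and hardware checks mentioned above, and the Ollama container additionally needs the NVIDIA Container Toolkit and a GPU device reservation in the compose file (assuming an NVIDIA card). If you don’t, a quantized 7B model on CPU typically responds in a few seconds (3-5 seconds is a reasonable expectation on a modern multi-core server, though it varies with hardware and prompt length). That’s slow for intensive use, but fine for an internal tool that processes documents or answers questions about company procedures.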
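Rather than trusting a rule of thumb, you can measure your own hardware. A non-streaming response from Ollama’s /api/generate includes timing fields in nanoseconds, and the Python sketch below turns them into two readable numbers. The sample dictionary is invented for illustration; it is not a benchmark result.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed: tokens produced divided by generation time."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def seconds_before_generation(response: dict) -> float:
    """Model load time plus prompt processing time, in seconds."""
    return (response["load_duration"] + response["prompt_eval_duration"]) / 1e9


# Made-up values in the shape Ollama returns (durations in nanoseconds).
sample = {
    "eval_count": 120,
    "eval_duration": 15_000_000_000,
    "load_duration": 200_000_000,
    "prompt_eval_duration": 1_800_000_000,
}

speed = tokens_per_second(sample)
wait = seconds_before_generation(sample)
print(f"{speed:.1f} tokens/s, {wait:.1f} s before the first token")
```

Run a few representative prompts on CPU first. If the numbers are acceptable for your use case, you’ve saved yourself the GPU purchase and the passthrough setup.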

What to do

No hypervisor yet? Download Proxmox VE, install it on any server with 32 GB RAM and an SSD — you have a company AI lab ready to go.
Already on VMware or Nutanix? Create a dedicated Ubuntu 22.04 VM with 16 GB RAM, install Docker and use the docker-compose above — operational in under an hour.
Before connecting the AI service to sensitive documents: isolate the VM in a dedicated VLAN and require authentication — an LLM exposed without auth on the internal network is a risk not worth taking.
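One way to check the isolation is to run a quick reachability test from a machine that should not have access, for example a workstation on the office VLAN. The Python sketch below only tests whether a TCP connection succeeds; the IP address in the usage note is a placeholder, and a successful connection says nothing about whether authentication is enforced behind it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, unreachable, or timed out.
        return False


def exposed_ports(host: str, ports: list[int], timeout: float = 2.0) -> list[int]:
    """Return the subset of ports that accept connections."""
    return [p for p in ports if port_is_open(host, p, timeout)]
```

From the office network, exposed_ports("10.0.50.10", [11434, 3000]) (with your AI VM’s real address in place of the placeholder) should ideally return an empty list, with only the HTTPS reverse proxy reachable. If 11434 shows up, the raw Ollama API is open to anyone on that network.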