YOLOv8: AI object detection in Python for sysadmins and developers

YOLOv8 is the most widely used object detection model in the world. It runs on a normal CPU — on a modern i7 it does 25-30 FPS on 720p video with no GPU. Install one package, write three lines of Python, and you can detect people, vehicles, and objects in a video in real time. No cloud, no variable costs, no data leaving your network.

Install and first test in 3 commands

pip install ultralytics
yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'
yolo predict model=yolov8n.pt source=0  # webcam

The first command installs everything. The second downloads the yolov8n model (nano, 6MB) and runs it on a test image — you see bounding boxes around people and the bus. The third runs it on your webcam in real time.

Models range from yolov8n (nano, fast, less accurate) to yolov8x (extra-large, slower but ~55% mAP on COCO). For real CPU use, yolov8n or yolov8s are the right starting point.

Connecting to an ONVIF IP camera via RTSP

If you have IP cameras on your network (Hikvision, Dahua, any ONVIF camera), they connect via RTSP stream:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

for result in model.track(source="rtsp://admin:password@192.168.0.92/stream1",
                           stream=True, persist=True):
    # result.boxes contains bounding boxes, confidence, class
    boxes = result.boxes
    for box in boxes:
        cls = result.names[int(box.cls)]
        conf = float(box.conf)
        print(f"{cls}: {conf:.2f}")

stream=True processes frame by frame without loading everything into memory. persist=True enables cross-frame tracking — required for ByteTrack. The RTSP URL varies by brand: for Hikvision it’s usually rtsp://user:pass@IP/Streaming/Channels/101.

Tracking with ByteTrack: counting unique people

YOLOv8 alone tells you “there are 3 people in this frame”. ByteTrack tells you “that person is the same one from the previous frame, ID is 42”. With tracking you can count unique visits, calculate dwell time, detect intrusions.

from ultralytics import YOLO
from collections import defaultdict

model = YOLO("yolov8n.pt")
track_history = defaultdict(list)
unique_ids = set()

for result in model.track(source="rtsp://192.168.0.92/stream1",
                           stream=True, persist=True, tracker="bytetrack.yaml"):
    if result.boxes.id is not None:
        for track_id, cls_idx in zip(result.boxes.id.int(), result.boxes.cls.int()):
            if result.names[int(cls_idx)] == "person":
                unique_ids.add(int(track_id))

print(f"Unique people detected: {len(unique_ids)}")

ByteTrack is included in ultralytics — nothing extra to install. Practical use cases: counting customers entering a store, monitoring access to a restricted area, detecting vehicles in a parking lot.

Deploying on a server with FastAPI in front

To expose detections as a REST API:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from ultralytics import YOLO
import cv2, base64

app = FastAPI()
model = YOLO("yolov8n.pt")

@app.get("/detect")
async def detect_stream():
    def generate():
        cap = cv2.VideoCapture("rtsp://192.168.0.92/stream1")
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            results = model(frame)
            annotated = results[0].plot()
            _, buffer = cv2.imencode('.jpg', annotated)
            yield (b'--frame\r\nContent-Type: image/jpeg\r\n\r\n'
                   + buffer.tobytes() + b'\r\n')
    return StreamingResponse(generate(), media_type="multipart/x-mixed-replace; boundary=frame")

Minimal docker-compose.yml:

services:
  detector:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    devices:
      - /dev/video0:/dev/video0  # only if using USB webcam

What to do

Install ultralytics and test yolov8n on your webcam or a local video before touching IP cameras — you need to see the boxes working locally to calibrate expectations.
For ONVIF cameras, verify the RTSP URL with VLC before using it in code: VLC → Open Network → rtsp://user:pass@IP/stream1.
Add ByteTrack only if you need persistent identity across frames — if you just need “how many objects are there right now”, plain detection is enough and faster.