CogVideoX: the first open-source video model competitive with commercial ones

In one sentence: Zhipu AI releases CogVideoX in 5B and 10B sizes, an open-source text-to-video model with a 3D full-attention architecture that generates 720p, 10-second clips with high motion coherence. It is the first Chinese open-source video model competitive with commercial offerings, and the weights are on HuggingFace.



Until mid-2024, high-quality AI video generation was entirely closed: Sora was not publicly available, Kling came from China with limited API access, and Runway was cloud-only and expensive. If you wanted to run a video model on your own server, the open-source options were disappointing.

CogVideoX changes this. Zhipu AI has released the full weights, in 5-billion- and 10-billion-parameter versions, on HuggingFace, where they are freely downloadable and usable. The model generates 720p, 10-second videos with a level of motion coherence not previously seen in open source.

For developers, this means that self-hosted video pipelines can produce presentable results for the first time. It is not quite Sora, but open source has now entered territory where comparison with commercial products makes sense.
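For anyone sizing a server for self-hosting, a rough first check is how much memory the weights alone occupy. The sketch below is a back-of-envelope calculation, not a measurement: the function name `weight_memory_gib` and the precision table are illustrative, the parameter counts are the ones quoted in this article, and the estimate deliberately ignores activations, the text encoder, and the video decoder, all of which add to the real footprint.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumption: every parameter is stored at the same precision. Activations,
# the text encoder, and the video decoder are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB for a given storage precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Parameter counts as quoted in this article (hypothetical labels).
for label, params in [("5B", 5e9), ("10B", 10e9)]:
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_gib(params, dtype)
        print(f"{label} @ {dtype}: {gib:.1f} GiB")
```

Treat the result as a lower bound: actual inference needs headroom on top of the weights, so a figure that barely fits on a given GPU most likely will not run without offloading or quantization.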