CogVideoX: the first open-source video model competitive with commercial ones
In one sentence: Zhipu AI releases CogVideoX in 2B and 5B sizes, an open-source text-to-video model with a 3D full-attention architecture that generates 720×480 clips of about six seconds with high motion coherence. It is the first Chinese open-source video model competitive with commercial offerings. Weights are on HuggingFace.
Until mid-2024, quality AI video was all closed: Sora (not public), Kling (Chinese, limited API), Runway (cloud, expensive). If you wanted to run a video model on your own server, the open-source options were disappointing.
CogVideoX changes this. Zhipu AI releases full weights for both sizes, 2 billion and 5 billion parameters, on HuggingFace, freely downloadable and usable. The model generates 720×480 clips of about six seconds with motion coherence not previously seen in open source.
For developers, this means self-hosted video pipelines can produce presentable results for the first time. It is not quite Sora, but open source now enters territory where comparison with commercial products makes sense.
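As a rough illustration of what such a self-hosted pipeline might look like, here is a minimal sketch. It assumes the Hugging Face diffusers library and its CogVideoXPipeline class, the repository id THUDM/CogVideoX-5b, and a native rate of 8 frames per second; none of these come from the article, so check the model card before relying on them. The helper frames_for_duration and the function generate_clip are hypothetical names introduced here.

```python
from pathlib import Path

MODEL_ID = "THUDM/CogVideoX-5b"  # assumption: Hugging Face repo id, verify on the model card
FPS = 8                          # assumption: native frame rate of the released checkpoints


def frames_for_duration(seconds: int, fps: int = FPS) -> int:
    """Frame count for a clip: one initial frame plus seconds * fps (assumed rule)."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return seconds * fps + 1


def generate_clip(prompt: str, out_path: str = "clip.mp4",
                  seconds: int = 6, seed: int = 0) -> Path:
    """Generate one clip from a text prompt and write it to out_path."""
    # Heavy imports are deferred so this module loads without a GPU stack installed.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # move submodules to the GPU only while they run
    pipe.vae.enable_tiling()         # decode the video in tiles to lower peak memory

    result = pipe(
        prompt=prompt,
        num_frames=frames_for_duration(seconds),
        num_inference_steps=50,
        guidance_scale=6.0,
        generator=torch.Generator().manual_seed(seed),  # fixed seed for repeatability
    )
    export_to_video(result.frames[0], out_path, fps=FPS)
    return Path(out_path)
```

Calling generate_clip("a panda playing guitar in a bamboo forest") would download the weights on first use and write clip.mp4. The CPU offload and VAE tiling calls trade speed for lower memory use, which is usually the practical constraint on a single self-hosted GPU.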
Companies
Zhipu AI