AImpact
High Image & Video Gen · 1 min read

CogVideoX: the first open-source video model competitive with commercial ones

In one sentence: Zhipu AI releases CogVideoX 5B and 10B, an open-source text-to-video model with a 3D full-attention architecture that generates 720p, 10-second clips with high motion coherence. It is the first Chinese open-source video model competitive with commercial offerings, and its weights are on HuggingFace.

Needs review Official source

Until mid-2024, high-quality AI video generation was entirely closed: Sora (not publicly available), Kling (Chinese, limited API access), Runway (cloud-only, expensive). If you wanted to run a video model on your own server, the open-source options were disappointing.

CogVideoX changes this. Zhipu AI releases the full weights, in 5-billion and 10-billion-parameter versions, on HuggingFace, where they are freely downloadable and usable. The model generates 720p, 10-second videos with a level of motion coherence not previously seen in open-source models.

For developers, this means being able to build self-hosted video pipelines with presentable results for the first time. It is not quite Sora, but open source now enters territory where comparison with commercial products makes sense.
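As a rough idea of what a self-hosted pipeline could look like, here is a minimal sketch using the Hugging Face `diffusers` library and its `CogVideoXPipeline` class. Several things here are assumptions rather than details from this article: the checkpoint id `THUDM/CogVideoX-5b`, the default clip length and frame rate (taken from the commonly published diffusers example, not from the figures quoted above), the sampling settings, and the helper names `frame_count` and `generate`, which are purely illustrative.

```python
"""Sketch: text-to-video with CogVideoX through Hugging Face diffusers."""


def frame_count(seconds: int, fps: int) -> int:
    # Assumption: one leading frame plus `fps` frames per second of video.
    return seconds * fps + 1


def generate(prompt: str,
             out_path: str = "output.mp4",
             model_id: str = "THUDM/CogVideoX-5b",  # assumed checkpoint id
             seconds: int = 6,                       # assumed default length
             fps: int = 8) -> str:                   # assumed default frame rate
    # Heavy imports stay local so this module loads without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    # Download (or reuse cached) weights and load them in bfloat16.
    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use
    pipe.vae.enable_tiling()         # decode the video in tiles to save memory

    result = pipe(
        prompt=prompt,
        num_frames=frame_count(seconds, fps),
        num_inference_steps=50,
        guidance_scale=6.0,
        generator=torch.Generator().manual_seed(42),  # fixed seed for repeatability
    )
    # `frames[0]` holds the frames of the first (only) generated video.
    export_to_video(result.frames[0], out_path, fps=fps)
    return out_path


# Example call (needs torch, diffusers, and a capable GPU):
# generate("A panda playing guitar in a bamboo forest")
```

Generation time and memory use depend heavily on the GPU; the CPU offload and VAE tiling calls are there to make the sketch more likely to fit on a single consumer card, at the cost of speed.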

Companies

Zhipu AI

Tags

CogVideoX, open source, text-to-video, 720p, HuggingFace, Zhipu AI
