Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning
In one sentence: Moonshot AI releases Kimi VL Thinking, a visual model that combines vision encoding with long chain-of-thought reasoning trained via reinforcement learning. It solves multi-step geometry problems, analyzes scientific charts, and interprets figures, and it is the first open visual reasoning model to match GPT-4o on multi-step visual tasks.
Most image-understanding models work like this: look at the image, give an answer. Fast, but often wrong on complex problems.
Kimi VL Thinking from Moonshot AI (a Chinese AI company) works differently: when given a difficult visual problem — a geometry diagram, a scientific chart, a sequence of images requiring reasoning — it stops to "think." It generates a long chain of intermediate reflections (in the style of "first I observe X, then I deduce Y, therefore I can conclude Z") before giving the final answer.
This approach, called chain-of-thought reasoning, was already well known for text (ChatGPT does it when it says "let's reason step by step"). Kimi VL Thinking is the first open-source model to bring it systematically to images, through reinforcement learning that rewards correct reasoning chains.
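To make "rewarding correct reasoning chains" concrete, here is a minimal sketch of what an outcome-based reward could look like. It is an illustration only, not Moonshot AI's training code: the "Answer:" marker, the function names, and the exact-match scoring rule are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumption for this sketch: the chain ends with a line such as "Answer: 5".
ANSWER_PATTERN = re.compile(r"answer\s*:\s*(.+)", re.IGNORECASE)


def extract_final_answer(chain: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    matches = ANSWER_PATTERN.findall(chain)
    return matches[-1].strip() if matches else None


def outcome_reward(chain: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the chain's final answer matches the reference, else 0.0."""
    answer = extract_final_answer(chain)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == reference.strip().lower() else 0.0


# A toy chain in the "first I observe, then I deduce, therefore" style.
chain = (
    "First I observe that the triangle has legs 3 and 4.\n"
    "Then I deduce the hypotenuse is sqrt(3^2 + 4^2).\n"
    "Answer: 5"
)
reward = outcome_reward(chain, "5")
```

In a reinforcement learning loop, a score like this would be computed for each sampled chain and used to make high-scoring chains more likely. Real systems typically use more robust answer checking than exact string matching.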
The practical result: on geometry problems, physics problems with diagrams, or scientific chart analysis, Kimi VL Thinking reaches the same level as GPT-4o, one of the best models in the world and significantly more expensive to use. And it is freely available to anyone who wants to use or study it.
Companies
Moonshot AI