Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning

In one sentence: Moonshot AI has released Kimi VL Thinking, a vision-language model that combines image encoding with long chain-of-thought reasoning trained through reinforcement learning. It solves multi-step geometry problems, analyzes scientific charts, and interprets figures. It is presented as the first open visual reasoning model to match GPT-4o on multi-step visual tasks.



Most image-understanding models work like this: look at the image, give an answer. Fast, but often wrong on complex problems.

Kimi VL Thinking from Moonshot AI (a Chinese AI company) works differently. When given a difficult visual problem (a geometry diagram, a scientific chart, a sequence of images that requires reasoning), it stops to "think": it generates a long chain of intermediate reflections, in the style of "first I observe X, then I deduce Y, therefore I can conclude Z", before giving its final answer.

This approach, called chain-of-thought reasoning, was already well known for text (ChatGPT, for example, can work through a problem "step by step" before answering). Kimi VL is the first open-source model to bring it systematically to images, using reinforcement learning that rewards reasoning chains which lead to correct answers.
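To make the training idea more concrete, here is a minimal Python sketch of an outcome-based reward. It assumes a hypothetical output format in which each reasoning chain ends with an `Answer:` line, a reward that checks only that final answer, and a simple comparison of several chains sampled for the same problem against their group average. The names `extract_answer`, `outcome_reward` and `advantages` are illustrative and are not Moonshot AI's code; the article does not specify Kimi VL's exact reward or RL algorithm.

```python
import re
from statistics import mean
from typing import List, Optional

# Hypothetical output format (an assumption, not Kimi VL's real format):
# the model writes its reasoning, then a final line "Answer: <value>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(.+?)\s*$", re.MULTILINE)


def extract_answer(chain: str) -> Optional[str]:
    """Return the last 'Answer:' value in a reasoning chain, or None."""
    matches = ANSWER_PATTERN.findall(chain)
    return matches[-1].strip().lower() if matches else None


def outcome_reward(chain: str, reference: str) -> float:
    """1.0 if the chain's final answer matches the reference, else 0.0.

    Only the outcome is checked; the intermediate steps are not graded.
    """
    answer = extract_answer(chain)
    return 1.0 if answer == reference.strip().lower() else 0.0


def advantages(chains: List[str], reference: str) -> List[float]:
    """Score chains sampled for the same problem relative to their mean reward.

    Chains that beat the group average get a positive value (to be
    reinforced during training); the others get a negative one.
    """
    rewards = [outcome_reward(c, reference) for c in chains]
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    samples = [
        "The triangle is isosceles, so the two base angles are equal.\n"
        "180 - 40 = 140, and 140 / 2 = 70.\nAnswer: 70",
        "I assume the base angles are 40 degrees.\nAnswer: 40",
    ]
    print(advantages(samples, "70"))  # [0.5, -0.5]
```

Because only the final answer is scored, the model is free to discover whatever intermediate steps tend to produce correct answers; the group-mean baseline is one common way to turn such rewards into a training signal, shown here only as an illustration.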

The practical result: on geometry problems, physics problems with diagrams, and scientific chart analysis, Kimi VL reaches a level comparable to GPT-4o, one of the strongest models available and significantly more expensive to use. And it is freely available to anyone who wants to use or study it.