AImpact
High Multimodal AI · 1 min read

Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning

In one sentence: Moonshot AI has released Kimi VL Thinking, a visual model that combines vision encoding with long chain-of-thought reasoning trained through reinforcement learning. It solves multi-step geometry problems, analyzes scientific charts, and interprets figures. It is presented as the first open visual reasoning model to match GPT-4o on multi-step visual tasks.

Needs review Reputable source

Most image-understanding models work like this: look at the image, give an answer. Fast, but often wrong on complex problems.

Kimi VL Thinking from Moonshot AI (a Chinese AI company) works differently: when given a difficult visual problem (a geometry diagram, a scientific chart, a sequence of images requiring reasoning), it stops to "think." It generates a long chain of intermediate reflections (in the style of "first I observe X, then I deduce Y, therefore I can conclude Z") before giving the final answer.

This approach, called chain-of-thought reasoning, was already well known for text (ChatGPT does it when asked to "reason step by step"). Kimi VL Thinking is the first open-source model to bring it systematically to images, through reinforcement learning training that rewards correct reasoning chains.
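To make "rewards correct reasoning chains" concrete, here is a minimal sketch of what such a reward signal could look like. It is an illustration only, not Moonshot AI's training code: the `Answer:` marker, the function names, and the choice to score only the final answer (rather than each intermediate step) are all assumptions made for this example.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is a hypothetical output format chosen for
    this sketch; the real model may format its output differently.
    """
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def outcome_reward(response: str, reference: str) -> float:
    """Give 1.0 when the final answer matches the reference, else 0.0.

    Only the outcome is scored; the intermediate reasoning is not
    inspected. This is a simplifying assumption of the sketch.
    """
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == reference.strip().lower() else 0.0


# A toy reasoning chain in the "observe, deduce, conclude" style.
sample = (
    "First I observe that the triangle has legs 3 and 4.\n"
    "Then I deduce the hypotenuse is sqrt(9 + 16).\n"
    "Answer: 5"
)
print(outcome_reward(sample, "5"))
```

In a reinforcement learning loop, a score like this would be computed for many sampled responses, and the model would be updated to make high-scoring reasoning chains more likely.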

The practical result: on geometry problems, physics problems with diagrams, or scientific chart analysis, Kimi VL Thinking reaches the same level as GPT-4o, one of the best models in the world and significantly more expensive to use. It is also freely available to anyone who wants to use or study it.

Companies

Moonshot AI

Tags

Kimi VL, visual reasoning, chain-of-thought, RL, multimodal, Moonshot AI, geometry, scientific reasoning
