AImpact
Medium Agents · 1 min read

UFO: the first robust agent for automating Windows desktop applications

In one sentence: Microsoft Research has published UFO (UI-Focused Agent), an agent that observes the Windows screen (active app + screenshot + control tree), plans actions, and executes them via Windows UI Automation and the Win32 API. It is presented as the first Windows-native system with reliable support for workflows that span multiple applications.


Automating operations on Windows has long been the domain of specialized tools such as AutoIt, Power Automate, or expensive RPA suites. UFO proposes something different: an AI agent that "sees" the screen as a human would, understands what is happening in the applications, and knows how to interact with them.

The system observes three things together: the desktop screenshot, the list of open applications, and the internal control structure of the active application (its buttons, text fields, and menus). From this information it plans actions: clicks, typing, scrolling, opening menus.
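The observe-then-plan step described above can be sketched in a few lines of Python. This is a minimal illustration only, not UFO's actual code or API: the `Control`, `Observation`, and `Action` types and the `plan` function are hypothetical names invented here, and the "planner" is a toy lookup by control name rather than a language model.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    """One node of the active application's control tree (hypothetical shape)."""
    name: str
    kind: str  # e.g. "Button", "Edit", "MenuItem"
    children: list["Control"] = field(default_factory=list)


@dataclass
class Observation:
    """The three inputs the article lists, bundled together."""
    screenshot: bytes        # raw image of the desktop
    open_apps: list[str]     # names of the open applications
    control_tree: Control    # root control of the active application


@dataclass
class Action:
    verb: str        # "click", "type", or "open_menu" in this sketch
    target: str      # name of the control to act on
    text: str = ""   # only used when verb == "type"


def flatten(node):
    """Yield every control in the tree, depth first."""
    yield node
    for child in node.children:
        yield from flatten(child)


def plan(obs, goal_control, text=""):
    """Toy planner: find a control by name and pick an action from its kind."""
    for control in flatten(obs.control_tree):
        if control.name == goal_control:
            if control.kind == "Edit":
                return Action("type", control.name, text)
            if control.kind == "MenuItem":
                return Action("open_menu", control.name)
            return Action("click", control.name)
    return None  # the control is not present in the current observation
```

In a real agent the chosen `Action` would be handed to an executor that drives the application through UI Automation; here it is only returned so the decision can be inspected.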

Crucially, it is meant to work with any Windows application, not only those that expose a dedicated API: Word, Excel, Control Panel, legacy software from the 1990s. If a human can operate an application by clicking, UFO can drive it programmatically.

It is designed for automations that span multiple applications, for example: "take data from this Excel file, enter it into the SAP system, then send a summary email in Outlook".
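One way to picture such a request is as an ordered list of per-application steps, with the result of each step carried into the next. The sketch below assumes that framing; `Step`, `WORKFLOW`, and `run` are hypothetical names for illustration and do not describe how UFO itself decomposes a task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    app: str          # application that must be active for this step
    instruction: str  # what to do inside that application


# The example request from the text, split by application (assumed split).
WORKFLOW = [
    Step("Excel", "take data from the file"),
    Step("SAP", "enter the data into the system"),
    Step("Outlook", "send a summary email"),
]


def run(workflow, execute):
    """Run the steps in order, handing each step's result to the next one."""
    carried = None
    log = []
    for step in workflow:
        carried = execute(step, carried)
        log.append((step.app, carried))
    return log
```

The `execute` callback is where an agent would switch to `step.app` and act on it; passing it in as a parameter keeps the ordering logic testable without a Windows desktop.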

Companies

Microsoft

Tags

UFO, Windows agent, UI Automation, desktop agent, Microsoft Research, RPA
