UFO: the first robust agent for automating Windows desktop applications
In one sentence: Microsoft Research has published UFO (UI-Focused Agent), an agent that observes the Windows screen (active application, screenshot, and control tree), plans actions, and executes them through Windows UI Automation and the Win32 API. It is presented as the first Windows-native system with reliable multi-application workflow support.
Automating operations on Windows has traditionally been the job of specialized tools such as AutoIt and Power Automate, or of expensive RPA solutions. UFO proposes something different: an AI agent that "sees" the screen as a human would, understands what is happening in the applications, and knows how to interact with them.
The system observes three things together: the desktop screenshot, the list of open applications, and the control tree of the active application (its buttons, text fields, and menus). From this information it plans actions: clicks, typing, scrolling, opening menus.
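The observe-and-plan step described above can be sketched in a few lines of Python. This is a minimal illustration and not UFO's actual code: the `Control`, `Observation`, and `Action` types and the `plan_type_and_click` function are hypothetical names, and the "planner" is a hard-coded rule standing in for the model that UFO would consult.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Control:
    """One node of the active application's control tree (hypothetical shape)."""
    name: str
    control_type: str  # e.g. "Button", "Edit", "MenuItem"
    children: List["Control"] = field(default_factory=list)


@dataclass
class Observation:
    """The three inputs the agent looks at together."""
    screenshot_png: bytes      # raw desktop capture (empty in this sketch)
    open_apps: List[str]       # titles of the open applications
    control_tree: Control      # root control of the active application


@dataclass
class Action:
    kind: str                  # "click", "type", "scroll", or "open_menu"
    target: str                # name of the control to act on
    text: Optional[str] = None


def find_control(root: Control, name: str) -> Optional[Control]:
    """Depth-first search of the control tree by control name."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find_control(child, name)
        if hit is not None:
            return hit
    return None


def plan_type_and_click(obs: Observation, field_name: str,
                        text: str, button_name: str) -> List[Action]:
    """Toy planner: type into one control, then click another.

    Assumption: a real agent would ask a model to choose the actions;
    here the rule is fixed so the data flow stays visible.
    """
    for name in (field_name, button_name):
        if find_control(obs.control_tree, name) is None:
            raise LookupError(f"control not found: {name}")
    return [Action("type", field_name, text), Action("click", button_name)]
```

The planner only emits actions for controls that actually appear in the observed tree, which is the point of observing the control structure alongside the screenshot: the plan is grounded in what the application exposes.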
The important part is that it is meant to work with any Windows application, not just those that expose a dedicated API: Word, Excel, the Control Panel, legacy software from the 1990s. If a human can operate it by clicking, UFO can operate it programmatically.
It is designed for multi-application automations: "take data from this Excel file, enter it into the SAP system, then send a summary email in Outlook".
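One way to picture such a request is as an ordered list of per-application subtasks that pass a shared context along. The sketch below is an assumption about how this could be modelled, not a description of UFO's internals: `SubTask`, `run_workflow`, and the three stub handlers are hypothetical, and the handlers return placeholder data instead of driving real applications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubTask:
    app: str          # application that should handle this step
    instruction: str  # natural-language instruction for that step


Handler = Callable[[str, dict], dict]


def read_excel(instruction: str, ctx: dict) -> dict:
    # Placeholder rows; a real handler would read them from the UI.
    return {**ctx, "rows": [("ACME", 120), ("Globex", 80)]}


def enter_sap(instruction: str, ctx: dict) -> dict:
    # Pretend every row carried over from the previous step was entered.
    return {**ctx, "entered": len(ctx["rows"])}


def send_outlook(instruction: str, ctx: dict) -> dict:
    # Build the summary text from what the earlier steps recorded.
    return {**ctx, "summary": f"{ctx['entered']} rows entered"}


HANDLERS: Dict[str, Handler] = {
    "Excel": read_excel,
    "SAP": enter_sap,
    "Outlook": send_outlook,
}

WORKFLOW: List[SubTask] = [
    SubTask("Excel", "take data from this Excel file"),
    SubTask("SAP", "enter it into the SAP system"),
    SubTask("Outlook", "send a summary email"),
]


def run_workflow(subtasks: List[SubTask],
                 handlers: Dict[str, Handler]) -> Tuple[dict, List[str]]:
    """Run subtasks in order, handing the shared context to each application."""
    context: dict = {}
    visited: List[str] = []
    for task in subtasks:
        context = handlers[task.app](task.instruction, context)
        visited.append(task.app)
    return context, visited
```

The shared context is what makes the workflow multi-application: data read in one program has to survive the switch to the next one.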
Companies
Microsoft