Google DeepMind Unveils Gemini 2.5: AI Model for Web & Mobile Interaction
Google DeepMind has announced a new AI model, Gemini 2.5 Computer Use, designed to interact with web and mobile user interfaces. The model is already used in Google's own projects and is available in preview via the Gemini API.
Gemini 2.5 Computer Use operates in a loop: it receives a screenshot, the user's request, and the history of previous actions, then generates the next UI action. Google already employs it for UI testing in projects such as Project Mariner and the Firebase Testing Agent, and it also features in AI Mode in Search. Developers can access the model through Google AI Studio and Vertex AI.
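The perceive-and-act loop described above can be sketched as follows. This is a minimal illustration, not the real SDK: the function names (`request_action`, `take_screenshot`, `execute`) and the action format are placeholders standing in for actual Gemini API calls and a real browser driver.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                               # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)

def take_screenshot() -> bytes:
    # Placeholder: a real agent would capture the current browser viewport.
    return b"<png bytes>"

def request_action(screenshot: bytes, goal: str, history: list) -> Action:
    # Placeholder for a Gemini 2.5 Computer Use call: the real model receives
    # the screenshot, the user's request, and prior actions, and returns the
    # next UI action. This stub simply finishes after two simulated clicks.
    if len(history) >= 2:
        return Action("done")
    return Action("click", {"x": 100, "y": 200})

def execute(action: Action) -> None:
    # Placeholder: a real agent would dispatch the action to the browser/UI.
    pass

def agent_loop(goal: str, max_steps: int = 10) -> list:
    # The loop: screenshot -> model proposes an action -> execute -> repeat,
    # until the model signals completion or the step budget runs out.
    history: list = []
    for _ in range(max_steps):
        action = request_action(take_screenshot(), goal, history)
        if action.name == "done":
            break
        execute(action)
        history.append(action)
    return history
```

In a real integration, `execute` would drive a browser automation layer, and the `max_steps` budget guards against runaway loops.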
In benchmarks such as Online-Mind2Web and WebVoyager, the model outperforms alternatives, achieving over 70% accuracy at around 225 seconds of latency. It is primarily optimized for web browsers and also works for mobile UI control, but it is not yet suitable for desktop OS-level control.
Safety features are included to mitigate risks such as misuse and unexpected behavior, among them a Per-Step Safety Service and developer-set system instructions.
The Gemini 2.5 Computer Use model is now in public preview through the Gemini API in Google AI Studio and Vertex AI, so developers can try its browser-optimized UI control capabilities immediately.