
SEOUL — Kakao is officially entering the race to develop "Computer Use Agent" (CUA) technology—an advanced AI capability that allows a system to perceive a computer screen and control a mouse and keyboard just as a human would. This move signals the South Korean tech giant's ambition to secure a foothold in the rapidly evolving autonomous AI agent market, currently led by global titans like OpenAI and Google.
According to industry sources on March 29, Kakao has launched the development of CUA by expanding the functionality of its proprietary visual language model, Kanana-v.
The Engine of Autonomy: What is CUA?
CUA is considered the "last mile" of AI agency. While traditional AI can process text or images, a CUA-equipped AI can navigate web browsers, click buttons, and type information to complete complex tasks autonomously. This technology powers services like OpenAI's "Operator" and Google's "Project Mariner" (initially reported under the codename "Project Jarvis"), which automate activities such as travel bookings and online shopping.
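Kakao has not published any implementation details, but the general pattern behind such agents is a perceive-decide-act loop: capture the screen, let a model choose the next UI action, execute it, and repeat until the goal is met. The sketch below illustrates that loop with stubbed-out functions; the names (`perceive`, `decide`, `run_agent`) and the toy text-based "screen" are purely illustrative, not Kakao's or OpenAI's API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0       # screen coordinates, used by "click"
    y: int = 0
    text: str = ""   # payload, used by "type"

def perceive(screen: str) -> str:
    # Stub: a real agent would run a visual language model
    # (e.g. Kanana-v) over a screenshot; here text stands in for pixels.
    return screen

def decide(observation: str, goal: str) -> Action:
    # Stub policy: if a search box is visible and the goal hasn't been
    # entered yet, type it; otherwise declare the task finished.
    if "search box" in observation and goal not in observation:
        return Action(kind="type", text=goal)
    return Action(kind="done")

def run_agent(goal: str, screen: str, max_steps: int = 5) -> list[Action]:
    """Run the perceive-decide-act loop until done or out of steps."""
    history: list[Action] = []
    for _ in range(max_steps):
        obs = perceive(screen)
        action = decide(obs, goal)
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "type":
            screen = obs + " " + action.text  # pretend the UI updated
    return history
```

In a production agent the stubs would be replaced by a screenshot pipeline and a model call, but the control flow—observe, act, re-observe—stays the same.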
Kakao’s technical roadmap for CUA focuses on two critical pillars:
GUI Grounding: This technology enables the AI to identify the exact pixel coordinates of targets on a screen to execute clicks or movements accurately.
Planning: Kakao is currently developing the ability for AI to break down high-level user goals (e.g., "Plan a weekend trip to Jeju") into granular, executable steps within a computer interface.
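To make the GUI-grounding pillar concrete: grounding models typically emit labeled bounding boxes for on-screen elements, and the agent converts the box for the requested target into a click point. The helper below is an assumption-laden illustration of that step—the `ground` function, the element schema, and the sample detections are all hypothetical, not Kakao's interface.

```python
def ground(target: str, elements: list[dict]) -> tuple[int, int]:
    """Return pixel coordinates to click for a named UI target.

    `elements` mimics what a GUI-grounding model might emit: each entry
    carries a text label and a bounding box (left, top, right, bottom).
    """
    for el in elements:
        if target.lower() in el["label"].lower():
            left, top, right, bottom = el["box"]
            # Click the center of the matched element's bounding box.
            return ((left + right) // 2, (top + bottom) // 2)
    raise LookupError(f"no on-screen element matches {target!r}")

# Hypothetical detections for a travel-booking page.
elements = [
    {"label": "Search flights", "box": (40, 100, 240, 140)},
    {"label": "Book now",       "box": (300, 400, 420, 440)},
]
```

The planning pillar would sit one level above this: decompose "Plan a weekend trip to Jeju" into steps like "open the search box", "type the destination", "click Book now", each of which is then grounded to coordinates as shown.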
Bridging the Gap in the Kakao Ecosystem
The strategic intent behind this development is to unify Kakao’s vast array of services—including KakaoTalk, Search, Commerce, Local Services, and Content—under a seamless AI agent.
By utilizing CUA, Kakao aims to overcome the technical hurdles of operating across fragmented environments, such as mobile apps, web interfaces, and legacy internal tools. Instead of requiring a separate API for every service, the AI can "see" and "use" the existing interfaces to fulfill user requests.
"We intend to evolve CUA beyond simple UI manipulation into a service-oriented agent technology," a Kakao representative stated. "Beyond just understanding the screen, the 'end-to-end' capability—composing task procedures based on user intent and completing the final objective—is our priority."
On-Device AI and the Future of ‘Kanana’
Kakao is also exploring ways to compress its multimodal models to enable on-device AI. This would allow AI features to run locally on smartphones, improving privacy and response speed.
Following the official launch of 'Kanana in KakaoTalk' earlier this month, the company plans to integrate multimodal capabilities into its mobile ecosystem. The long-term vision involves the 'Kanana-o' model, an omni-modal AI designed to integrate voice, vision, and text into a single, cohesive user experience.
[Copyright (c) Global Economic Times. All Rights Reserved.]