Kakao Unveils ‘Clicking AI’: Integrating CUA into ‘Kanana-v’ to Redefine AI Agents
KO YONG-CHUL Reporter
korocamia@naver.com | 2026-03-30 08:04:26
SEOUL — Kakao is officially entering the race to develop "Computer Use Agent" (CUA) technology—an advanced AI capability that allows systems to perceive computer screens and operate a mouse and keyboard just like a human. This move signals the South Korean tech giant's ambition to secure a foothold in the rapidly evolving autonomous AI agent market, currently led by global titans like OpenAI and Google.
According to industry sources on March 29, Kakao has launched the development of CUA by expanding the functionality of its proprietary visual language model, Kanana-v.
The Engine of Autonomy: What is CUA?
CUA is considered the "last mile" of AI agency. While traditional AI can process text or images, a CUA-equipped AI can navigate web browsers, click buttons, and type information to complete complex tasks autonomously. This technology powers services like OpenAI’s "Operator" and Google’s "Project Mariner" (formerly known as Project Jarvis), which automate activities such as travel bookings and online shopping.
Kakao’s technical roadmap for CUA focuses on two critical pillars:
GUI Grounding: This technology enables the AI to identify the exact pixel coordinates of targets on a screen to execute clicks or movements accurately.
Planning: Kakao is currently developing the ability for AI to break down high-level user goals (e.g., "Plan a weekend trip to Jeju") into granular, executable steps within a computer interface.
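The two pillars above can be sketched in code. The following is a minimal illustration, not Kakao's actual implementation: the model interface, step format, and all function names are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Click:
    """A grounded action: exact pixel coordinates on the screen."""
    x: int
    y: int

def ground(model: Callable[[str, bytes], Tuple[int, int]],
           screenshot: bytes, target: str) -> Click:
    """GUI grounding: map a target description (e.g. 'Search button')
    to pixel coordinates on the current screenshot.
    `model` stands in for a visual language model such as Kanana-v."""
    x, y = model(target, screenshot)
    return Click(x, y)

def plan(goal: str) -> List[str]:
    """Planning: break a high-level goal into granular UI steps.
    A real planner would query an LLM; this stub only shows the
    expected output shape."""
    if "Jeju" in goal:
        return ["open travel site",
                "type 'Jeju' into the destination field",
                "select weekend dates",
                "click 'Search flights'"]
    return [goal]
```

In a full agent, each planned step would be fed back through `ground` against a fresh screenshot, so the click coordinates always reflect the current state of the interface.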
Bridging the Gap in the Kakao Ecosystem
The strategic intent behind this development is to unify Kakao’s vast array of services—including KakaoTalk, Search, Commerce, Local Services, and Content—under a seamless AI agent.
By utilizing CUA, Kakao aims to overcome the technical hurdles of operating across fragmented environments, such as mobile apps, web interfaces, and legacy internal tools. Instead of requiring a separate API for every service, the AI can "see" and "use" the existing interfaces to fulfill user requests.
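The "see and use" approach described above typically takes the form of a perceive-act loop. The sketch below is illustrative only; the callback names and the `"done"` convention are assumptions, not details from Kakao.

```python
# Minimal perceive-act loop: the agent drives an existing interface
# directly, with no per-service API required.

def run_agent(take_screenshot, choose_action, execute, max_steps=10):
    """Repeatedly look at the screen, decide on an action, and perform
    it, until the policy signals the task is complete or the step
    budget runs out. Returns True on completion, False on timeout."""
    for _ in range(max_steps):
        screen = take_screenshot()       # "see" the current interface
        action = choose_action(screen)   # model picks the next action
        if action == "done":
            return True
        execute(action)                  # "use" it: click, type, scroll
    return False
```

The same loop works whether the underlying surface is a mobile app, a web page, or a legacy internal tool, which is exactly the fragmentation problem CUA is meant to sidestep.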
"We intend to evolve CUA beyond simple UI manipulation into a service-oriented agent technology," a Kakao representative stated. "Beyond just understanding the screen, the 'end-to-end' capability—composing task procedures based on user intent and completing the final objective—is our priority."
On-Device AI and the Future of ‘Kanana’
Kakao is also exploring model compression to shrink its multimodal models for on-device AI. This would allow AI features to run locally on smartphones, improving privacy and response speed.
Following the official launch of 'Kanana in KakaoTalk' earlier this month, the company plans to integrate multimodal capabilities into its mobile ecosystem. The long-term vision involves the 'Kanana-o' model, an omni-modal AI designed to integrate voice, vision, and text into a single, cohesive user experience.