Kakao Unveils ‘Clicking AI’: Integrating CUA into ‘Kanana-v’ to Redefine AI Agents
KO YONG-CHUL Reporter
korocamia@naver.com | 2026-03-30 08:04:26
SEOUL — Kakao is officially entering the race to develop "Computer Use Agent" (CUA) technology—an advanced AI capability that allows systems to perceive computer screens and manipulate mice and keyboards just like a human. This move signals the South Korean tech giant's ambition to secure a foothold in the rapidly evolving autonomous AI agent market, currently led by global titans like OpenAI and Google.
According to industry sources on March 29, Kakao has begun developing CUA by extending its proprietary vision-language model, Kanana-v.
The Engine of Autonomy: What is CUA?
CUA is often described as the "last mile" of AI agency. While traditional AI can process text or images, a CUA-equipped AI can navigate web browsers, click buttons, and type information to complete complex tasks autonomously. This technology powers services like OpenAI’s "Operator" and Google’s "Project Mariner" (previously reported under the codename "Project Jarvis"), which automate activities such as travel bookings and online shopping.
Kakao’s technical roadmap for CUA focuses on two critical pillars:
GUI Grounding: This technology enables the AI to identify the exact pixel coordinates of targets on a screen to execute clicks or movements accurately.
Planning: Kakao is currently developing the ability for AI to break down high-level user goals (e.g., "Plan a weekend trip to Jeju") into granular, executable steps within a computer interface.
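To make the two pillars concrete, the following Python sketch shows one way grounding and planning could fit together. It is an illustration only, not Kakao's implementation: the names BoundingBox, click_point, Step, and plan_search are hypothetical, the normalized box format is an assumption about what a grounding model might return, and the hand-written plan stands in for what a planning model would generate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical grounding output: a UI element's box, normalized to 0.0-1.0."""
    left: float
    top: float
    right: float
    bottom: float


def click_point(box: BoundingBox, screen_width: int, screen_height: int) -> Tuple[int, int]:
    """Grounding: map a normalized box to the pixel at its center, clamped to the screen."""
    x = round((box.left + box.right) / 2 * screen_width)
    y = round((box.top + box.bottom) / 2 * screen_height)
    return min(max(x, 0), screen_width - 1), min(max(y, 0), screen_height - 1)


@dataclass(frozen=True)
class Step:
    """One executable UI step produced by planning."""
    action: str     # "click" or "type"
    target: str     # human-readable description of the UI element
    text: str = ""  # text to enter, used only by "type" steps


def plan_search(query: str) -> List[Step]:
    """Planning stub: a fixed decomposition standing in for a model-generated plan."""
    return [
        Step("click", "search box"),
        Step("type", "search box", query),
        Step("click", "search button"),
    ]


if __name__ == "__main__":
    # Assume the grounding model located the search button on a 1920x1080 screen.
    button = BoundingBox(left=0.25, top=0.5, right=0.75, bottom=0.75)
    print(click_point(button, 1920, 1080))
    for step in plan_search("Jeju weekend trip"):
        print(step)
```

In this sketch the planner decides what to do next, and the grounding step turns each target description into a coordinate that a mouse or touch event can use.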
Bridging the Gap in the Kakao Ecosystem
The strategic intent behind this development is to unify Kakao’s vast array of services—including KakaoTalk, Search, Commerce, Local Services, and Content—under a seamless AI agent.
By utilizing CUA, Kakao aims to overcome the technical hurdles of operating across fragmented environments, such as mobile apps, web interfaces, and legacy internal tools. Instead of requiring a separate API for every service, the AI can "see" and "use" the existing interfaces to fulfill user requests.
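The idea of "seeing" and "using" an interface instead of calling a service API can be sketched as an observe-act loop. The Python example below is a toy under stated assumptions: FakeSearchPage, choose_action, and run_agent are hypothetical names, the page is simulated in memory, and the rule-based policy stands in for a vision-language model that would read a real screenshot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeSearchPage:
    """Toy stand-in for a real interface: one text box and one search button."""
    query: str = ""
    submitted: bool = False

    def observe(self) -> Dict[str, object]:
        # A real agent would capture a screenshot here instead of reading state.
        return {"query": self.query, "submitted": self.submitted}

    def act(self, action: Dict[str, str]) -> None:
        # Apply a UI action the way a mouse or keyboard event would.
        if action["type"] == "type":
            self.query += action["text"]
        elif action["type"] == "click" and action["target"] == "search_button":
            self.submitted = True


def choose_action(goal: str, observation: Dict[str, object]) -> Optional[Dict[str, str]]:
    """Hypothetical policy: pick the next UI action, or None when the goal is met."""
    if observation["submitted"]:
        return None
    if observation["query"] != goal:
        return {"type": "type", "text": goal}
    return {"type": "click", "target": "search_button"}


def run_agent(goal: str, page: FakeSearchPage, max_steps: int = 10) -> List[Dict[str, str]]:
    """Observe, decide, act, and repeat until done or the step budget runs out."""
    history: List[Dict[str, str]] = []
    for _ in range(max_steps):
        action = choose_action(goal, page.observe())
        if action is None:
            break
        page.act(action)
        history.append(action)
    return history


if __name__ == "__main__":
    page = FakeSearchPage()
    for action in run_agent("Jeju hotels", page):
        print(action)
    print("submitted:", page.submitted)
```

The step budget (max_steps) is a design choice in this sketch: it keeps the loop from running indefinitely if the policy never reaches its goal.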
"We intend to evolve CUA beyond simple UI manipulation into a service-oriented agent technology," a Kakao representative stated. "Beyond just understanding the screen, the 'end-to-end' capability—composing task procedures based on user intent and completing the final objective—is our priority."
On-Device AI and the Future of ‘Kanana’
Kakao is also exploring ways to slim down its multimodal models so they can run as on-device AI. Running AI features locally on smartphones would enhance privacy and shorten response times.
Following the official launch of 'Kanana in KakaoTalk' earlier this month, the company plans to integrate multimodal capabilities into its mobile ecosystem. The long-term vision involves the 'Kanana-o' model, an omni-modal AI designed to integrate voice, vision, and text into a single, cohesive user experience.