ByteDance, the parent company of TikTok, has unveiled an AI solution called 'VideoWorld' that creates videos by understanding visuals rather than language. Chinese media have described it as the first AI solution able to create videos from visual information alone, which would make it a significant development in AI video generation.
According to Chinese media outlet China First Finance, the Doubao AI large model team under ByteDance, in collaboration with Beijing Jiaotong University and the University of Science and Technology of China, announced the development of VideoWorld.
OpenAI's Sora, an AI video generation model, generates videos from text input. In contrast, VideoWorld creates videos using only visual information, not text or voice. The outlet described VideoWorld as the first AI solution to create videos from visual information alone.
Complex or detailed actions such as origami or tying a tie are difficult to express clearly in language. VideoWorld uses AI to visually recognize the actions of people or objects and creates videos on that basis.
ByteDance explained, "VideoWorld is an academic research project and is currently in the process of exploring new technical methods, and it will take some time before it is commercialized." The company added, "VideoWorld showed excellent performance in Go and robot control environment simulations, but still has shortcomings in real-world environments."
According to ByteDance, VideoWorld has reached the level of a professional 5-dan in the game of Go and has performed robot tasks in various environments. The company also said, "We aim to solve numerous problems and develop VideoWorld into a general-purpose knowledge learner in the real world."
Doubao is an AI chatbot announced by ByteDance in August 2023. It is currently the second most used large AI model in China, after DeepSeek.
The Doubao team was created within ByteDance in 2023 and is dedicated to developing cutting-edge large AI model technology. Its research areas include deep learning, reinforcement learning, large language models (LLMs), AI speech recognition, AI visual recognition, AI infrastructure, and AI security.
[Copyright (c) Global Economic Times. All Rights Reserved.]