Competition among China's AI-powered text-to-video models is heating up as domestic tech companies scramble to roll out self-developed video generation models after Sora, from US-based AI research company OpenAI, took the world by storm.
Chinese video-sharing platform Kuaishou Technology recently updated its Kling AI model, which comes with new features including image-to-video and video extension capabilities, enabling the creation of videos up to approximately three minutes in length.
The previous version can process text into video clips up to two minutes long with 1080p resolution, while supporting a variety of aspect ratios. The model can interpret prompts to generate high-quality videos that mimic the physical world and create imaginative scenes from text instructions, Kuaishou said.
Zhang Di, vice-president of Kuaishou and head of the company's AI model team, said the company is dedicated to investing in and innovating large language model (LLM) technology, and aims to develop AI-generated content (AIGC) creation tools. LLMs are AI models trained on huge amounts of text data for use in a variety of tasks, ranging from natural language processing to machine translation.
To date, over 500,000 users have applied for access to Kling's beta testing, with the number of generated videos reaching 7 million. Kuaishou said it will continue to focus on enhancing video clarity and introducing more innovative features to meet diverse user needs.
Kuaishou is one of a string of Chinese tech companies racing to launch challengers to Sora by enhancing AI models that create images and videos from text prompts.
In April, Chinese AI firm Shengshu Technology and Tsinghua University launched what they called the first Sora-level text-to-video large model Vidu, which can create a 16-second, high-definition video at 1080p resolution with a single click. The model is able to understand and generate Chinese content such as pandas and dragons.
Moreover, ByteDance, the parent company of Chinese short video platform Douyin, has unveiled MagicVideo-V2, its AI model for text-to-video generation. It can produce an aesthetically pleasing and high-resolution video with remarkable fidelity and smoothness.
Ma Shicong, an analyst with Beijing-based internet consultancy Analysys, said Kuaishou has accumulated ample experience and technical strengths in AI, video, livestreaming and algorithms over the past few years, adding that the company hopes to seek new sources of revenue and speed up its monetization efforts by expanding its footprint in the fast-developing AIGC segment.
"Talent, data and computing power are key to text-to-video generation models," said Pan Helin, a member of the Expert Committee for Information and Communication Economy, which is under the Ministry of Industry and Information Technology.
Pan said developing such models places higher demands on computing capacity, algorithms and high-quality data. More efforts are required to bolster the efficient circulation of data elements and to expand the application scenarios of video generation models across a wider range of segments, he said.
Chinese tech companies should strengthen their self-developed, proprietary capabilities in underlying computing chips and programming software, as well as increase investment in basic scientific research, to catch up with foreign counterparts in the AI chatbot race, he added.
Experts also said multimodal LLMs that integrate different types of content like text, images, audio and video into AI models are key to the development of AI technology overall.
Chen Duan, director of the Digital Economy Integration Innovation Development Center at the Central University of Finance and Economics, said AIGC technology will lead to a new revolution in the field of digital content production, and bolster innovation in the digital culture industry.
Chinese tech enterprises have unique advantages over their foreign peers in expanding AI application scenarios, thanks to China's enormous domestic social networks and the world's largest number of active internet users, she said, adding that text-to-video generators have the potential to revolutionize the short video industry, advertising and movie trailers.