Zhipu AI launches video model in a sign more Chinese tech firms are taking on OpenAI's Sora

Chinese artificial intelligence (AI) start-up Zhipu on Friday debuted its video generation model, in the latest sign that local tech firms are gaining ground in the AI video arena.

The Ying text-to-video model accepts both text and image prompts and generates six-second video clips in around 30 seconds. Users can fine-tune the results with style options that include 3D animation, cinematic and oil-painting looks, as well as emotional themes such as tense, lively and lonely.

The service, accessible through the official website and mobile apps of Zhipu AI's ChatGLM chatbot, was made immediately available to all users for unlimited use, the company said at a launch event in Beijing on Friday, although free users will face longer wait times during peak hours.

The launch of Ying comes two days after a similar move by Kuaishou, the short-video rival to ByteDance's Douyin (the Chinese sibling of TikTok), and signals that Chinese tech firms are taking on industry leader OpenAI in the field of video generation.

Screenshot of a short video generated by Ying, the video model launched by Zhipu AI. Photo: Handout

On Wednesday, Kuaishou made its highly sought-after Kling video model available for wider test use, with each customer able to generate six videos per day.

Kling offers annual paid plans that allow up to 60 and 800 video generations per month, at a cost of 396 yuan (US$54.63) and 3,996 yuan, respectively.

Meanwhile, San Francisco-based OpenAI, which pioneered AI video generation with the announcement of Sora in February, has yet to make the model available for public use.

When asked about the launch date of Sora, Aditya Ramesh, a key member of the development team, said OpenAI wants to make sure the model cannot be used to generate and spread false information.

The technology behind Ying is CogVideoX, a self-developed text-to-video model that uses an architecture similar to the diffusion transformer (DiT) behind OpenAI's Sora, but with improved inference speed that enables faster video generation, Zhipu chief executive Zhang Peng said on Friday. He added that the firm drew some inspiration from Sora's algorithm design.

While OpenAI has yet to make Sora available for wider public use, the company has published technical details on how it works.

Zhang also said Zhipu is working to launch a new iteration of the video model that can generate longer, higher-definition videos.

This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. For more SCMP stories, please explore the SCMP app or visit the SCMP's Facebook and Twitter pages. Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.