Sora finds a new rival in China as start-up Shengshu AI rolls out text-to-video tool

Chinese start-up Shengshu AI rolled out its text-to-video tool Vidu for global users on Tuesday, with support for both Chinese and English text prompts, in a fresh sign of efforts by China's tech firms to match OpenAI's Sora.

The video generation model is accessible through the company's official website, making Shengshu the latest Chinese start-up to offer text-to-video services to the public, following players like Zhipu AI and Kuaishou Technology. Registered users will be able to generate clips of four or eight seconds in length.

The Beijing-based company first unveiled Vidu in April with a few selected preview clips, just two months after OpenAI announced its Sora video model, making it the first firm in China to take on Sora.

Shengshu said in a statement that Vidu can generate a four-second clip in 30 seconds. That makes it one of the fastest on the market, as similar tools usually take longer to generate a video of comparable length.

In this photo illustration, a video created by OpenAI's Sora tool plays on a monitor in Washington, DC, February 16, 2024. Photo: AFP

Shengshu exemplifies how China's prestigious Tsinghua University has emerged as a main force backing the country's AI ambitions. Behind Vidu is the firm's self-developed architecture called U-ViT, first detailed in a September 2022 research paper authored by a team led by Zhu Jun, Shengshu AI's chief scientist, who is also a computer science professor at Tsinghua University.

Another Tsinghua author of the paper, Bao Fan, currently serves as Shengshu's chief technology officer. Shengshu's chief executive Tang Jiayu was a graduate of Tsinghua's department of computer science and technology.

In an interview in April, Tang told local media that it would be easier for Chinese firms to catch up with Sora than with GPT-4, OpenAI's advanced large language model that is the technology behind ChatGPT. He did not elaborate.

In addition to text-to-video and image-to-video generation, Vidu has added a function that lays the foundation for commercialising the technology, given its potential use in the animation and content industries, Zhang Xudong, product director at Shengshu AI, said in an interview with the Post.

The new character-to-video function lets users upload an image of a real person or an animated character, and use simple text prompts to make it come alive.

"In the future we hope [users] could upload multiple characters and [describe] scenes, and have them act in those scenes, similar to how a film is being produced," Zhang said. "Our goal is to integrate AI tools with traditional sectors."

Shengshu, which has raised tens of millions of US dollars, counts Qiming Venture Partners, search giant Baidu, Alibaba Group Holding's fintech affiliate Ant Group, and the Beijing AI Industry Investment Fund as its backers. Alibaba owns the Post.

This article originally appeared in the South China Morning Post (SCMP). Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.
