Top Stories

Google releases Veo, a text-to-video AI model to compete with Sora from OpenAI


Veo, which debuted at Google I/O 2024, can produce videos longer than one minute in 1080p HD.


A screen grab from a Google Veo video


At its annual developer conference, Google I/O 2024, held on May 14, Google unveiled Veo, its most capable video generation model yet, able to produce high-definition video in a wide range of visual and cinematic styles.


The market for artificial intelligence (AI) video generation models is growing increasingly competitive, with rivals such as OpenAI's Sora, Facebook parent Meta's Emu Video, and startups like Runway and Stability AI making similar announcements.


Sora, in particular, has stunned viewers with its lifelike visuals since it was unveiled in February.


Veo can produce videos at 1080p resolution that run longer than a minute. According to the company, the model accurately renders details from long prompts and captures their tone, drawing on a strong understanding of natural language and visual semantics.


It can also generate consistent, coherent footage in which people, animals, and objects move realistically across scenes, and it understands cinematic terms such as "timelapse" and "aerial shots of a landscape". Generated videos can be refined further with additional prompts.


Google DeepMind CEO Demis Hassabis said, "We are also exploring features like storyboarding and generating longer scenes."


Building on Google's earlier work in AI video generation


Drawing on improved architectures, scaling techniques, and other methods, Veo builds on Google's years of work in generative video models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere.


According to the company, a range of filmmakers and creators is currently being invited to experiment with the model. The goal, Google says, is to ensure creators "have a voice in how they are developed," and these collaborations will inform how it builds, releases, and deploys these technologies.


At the conference, Google also previewed its collaboration with director Donald Glover and his creative studio, Gilga, which used Veo in a test project.


"With Veo, we’ve upgraded methods for how the model learns to understand what's in a video, generates high-definition images, simulates the physics of the actual world and more." said in a blog post by Doug Eck, senior research director at Google, and Eli Collins, vice president of product management at Google.


"These learnings will fuel advances across our AI investigation as well as enable us to build even more useful products that help people interact with one another in new ways" they said.


In the coming weeks, a private preview of Veo will be available to select creators inside VideoFX, Google's AI video generator, as part of the company's Labs program; those interested can sign up for the waitlist. Google also says it plans to bring some of Veo's capabilities to YouTube Shorts and other products in the future.


Imagen 3, the latest version of Imagen


Google also unveiled a new iteration of Imagen, which the company says is its highest-quality text-to-image model to date.


In the blog post, the executives said the model, named Imagen 3, can create photorealistic, lifelike images with an impressive level of detail and far fewer distracting visual artifacts than earlier models.


According to them, Imagen 3 understands natural language and the intent behind a user's request, and it incorporates small details from longer prompts.


It is also Google's best model yet for rendering text, which has been a persistent challenge for image generation models. "This capability opens up possibilities for producing personalized birthday messages, title slides in speaking engagements, and more," Collins and Eck said.


Through Google's Labs program, a private preview of Imagen 3 will be available to a small group of creators inside ImageFX, the company's AI image generator; users can join the waitlist to register to test the model. Developers and enterprise customers will soon be able to access Imagen 3 via Vertex AI, Google's managed AI app development platform.


Alongside these updates, Google said that Grammy Award-winning artist Wyclef Jean, Grammy-nominated songwriter Justin Tranter, and electronic musician Marc Rebillet had uploaded demo tracks made with the company's music AI tools to their YouTube channels.


The tech giant is also extending SynthID, its tool for embedding digital watermarks in AI-generated images and audio, to cover text and video. The company said SynthID will watermark every video produced by Veo on VideoFX.


