Generative AI: Vlogger creates video based on an image

Researchers at Google have unveiled a framework that can create a video from a single image and an audio recording. Vlogger builds on the recent successes of generative diffusion models. OpenAI, for example, recently introduced Sora, an impressive AI that generates almost photorealistic videos from a text prompt. In autumn 2023, HeyGen presented an AI that translates video recordings into other languages, so that suddenly anyone can be multilingual if they want to. Vlogger aims to combine these capabilities.

The research team led by doctoral student Enric Corona from the Universitat Politècnica de Catalunya has developed a method that, according to the team, goes beyond previous work. Realistic talking videos are created with a two-stage pipeline. According to the researchers, the first stage generates body movements from the audio input and a still image of a person. In the second stage, an image-to-image model translates these movements into video frames.

Vlogger framework

(Image: Corona et al.)
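To make the data flow of such a two-stage design more concrete, here is a minimal Python sketch. It is not Vlogger's code: both learned models are replaced by trivial stand-ins, and all names (`ReferenceImage`, `Motion`, `stage1_audio_to_motion`, `stage2_render`) as well as the use of per-frame audio loudness as input are hypothetical assumptions. It only illustrates that stage 1 turns audio plus a still image into one motion signal per frame, and stage 2 turns each motion signal into an output frame, so the video is as long as the audio.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReferenceImage:
    """Hypothetical stand-in for the single input photo."""
    name: str
    base_yaw: float = 0.0  # head rotation (degrees) visible in the photo


@dataclass
class Motion:
    """Hypothetical per-frame control signal produced by stage 1."""
    mouth_open: float  # 0.0 = closed, 1.0 = fully open
    head_yaw: float    # head rotation in degrees


@dataclass
class Frame:
    """Hypothetical output frame: which photo was animated, and how."""
    source: str
    motion: Motion


def stage1_audio_to_motion(reference: ReferenceImage,
                           audio_energy: Sequence[float],
                           max_sway: float = 10.0) -> List[Motion]:
    """Stage 1 stand-in: audio + still image -> one Motion per frame.

    The real system generates movements with a learned model; this toy
    version only maps loudness to mouth opening and adds a slow head
    sway around the pose seen in the reference image.
    """
    peak = max(audio_energy, default=0.0) or 1.0  # avoid division by zero
    motions = []
    for i, energy in enumerate(audio_energy):
        mouth = min(max(energy / peak, 0.0), 1.0)  # clamp to [0, 1]
        yaw = reference.base_yaw + max_sway * math.sin(i / 4)
        motions.append(Motion(mouth_open=mouth, head_yaw=yaw))
    return motions


def stage2_render(reference: ReferenceImage,
                  motions: Sequence[Motion]) -> List[Frame]:
    """Stage 2 stand-in for the image-to-image model: one frame per Motion."""
    return [Frame(source=reference.name, motion=m) for m in motions]


def generate_video(reference: ReferenceImage,
                   audio_energy: Sequence[float]) -> List[Frame]:
    """Chain both stages; the video is as long as the audio input."""
    motions = stage1_audio_to_motion(reference, audio_energy)
    return stage2_render(reference, motions)


ref = ReferenceImage(name="portrait.jpg")
video = generate_video(ref, [0.0, 0.5, 1.0, 0.25])
print(len(video))                  # 4
print(video[2].motion.mouth_open)  # 1.0
```

Separating the stages this way means the intermediate motion signal can be inspected or changed before any frames are rendered.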

This approach is intended to create videos of variable length whose content can also be controlled. For example, a single image can be used to create different videos in which the person moves differently. Unlike some previous work, Vlogger should not need to be trained on each individual person. In addition, the output should be photorealistic, and the video should be controllable both via the audio recording and via the body's movements.

Vlogger also makes it possible to edit details such as facial expressions in videos that have already been generated. One example shows the same sequence in two variants: in one the person closes their eyes, in the other their mouth.

As with HeyGen, videos can be translated into other languages. In one example video, however, the lip movements noticeably do not quite match the audio; they appear only partially synchronized. Overall, videos created with Vlogger still look somewhat artificial in places.

(mack)