
Forget avatars: Google VLOGGER creates digital copies of any person


An avatar, as we know, is a graphic representation, animated or static, of a user in a digital environment. It can be a stylized image, a drawing or photograph of a person, a character created by the user or a default image provided by the platform. Avatars can also represent an alternative identity or a character that stands in for a real person in virtual space. That, by now, is ancient history.

Google’s research team has announced VLOGGER, an artificial intelligence framework capable of generating realistic videos of people speaking and gesticulating, in sync with their voices. What does Google’s generative model need to create all this? Just a single photo of the person to be digitally “animated” in a video and a short speech sample.

VLOGGER: revolution in generative artificial intelligence

Technologies based on artificial intelligence continue to make great strides. The latest milestone comes from the Google Research team led by Enric Corona, an expert in AI and 3D human modeling.


VLOGGER is based on a type of machine learning model called a “diffusion model”. This model, along with a rich dataset called MENTOR, forms the backbone of the technology.

The MENTOR dataset is a vast archive containing over 800,000 different identities and more than 2,200 hours of video. This large and diverse dataset allowed VLOGGER to learn a wide range of human characteristics, including ethnicities, ages, clothing, poses, and surrounding environments. The developers also paid close attention to avoiding so-called bias, i.e. unwanted behaviors that can amplify prejudices or stereotypes present in the training data.

How VLOGGER works

To generate a video, simply provide the system with a base image and a short voice recording. VLOGGER first uses a neural network to derive body motion controls from the audio, including gaze direction, facial expression and pose. A second neural network then extends a large-scale image diffusion model to generate the frames that correspond to those movements, conditioned on the input image.
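The two-stage flow described above can be sketched as a toy pipeline. This is only an illustration of the data flow, not Google's implementation: the names `MotionControls`, `audio_to_motion` and `render_frames` are hypothetical, the "audio network" is replaced by a simple loudness measure, and the "diffusion model" is replaced by a loop that starts from random noise and moves step by step toward a target built from the reference image and the controls.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MotionControls:
    """Per-frame controls derived from audio (hypothetical, illustrative fields)."""
    gaze: float        # horizontal gaze direction, arbitrary units
    expression: float  # mouth openness, clamped to [0, 1]
    pose: float        # head/body tilt, arbitrary units


def audio_to_motion(audio: List[float], frames: int) -> List[MotionControls]:
    """Stage 1 stand-in: turn audio samples into one set of controls per frame."""
    chunk = max(1, len(audio) // frames)
    controls = []
    for i in range(frames):
        window = audio[i * chunk:(i + 1) * chunk] or [0.0]
        energy = sum(abs(s) for s in window) / len(window)  # mean loudness
        controls.append(MotionControls(
            gaze=math.sin(i / 5.0) * 0.1,
            expression=min(1.0, energy),
            pose=math.cos(i / 7.0) * 0.05,
        ))
    return controls


def render_frames(image: List[float], controls: List[MotionControls],
                  steps: int = 4, seed: int = 0) -> List[List[float]]:
    """Stage 2 stand-in: refine noise toward a frame conditioned on image and controls."""
    rng = random.Random(seed)
    frames = []
    for c in controls:
        # Toy conditioning: the reference image, nudged by expression and pose.
        target = [p * (1.0 - 0.2 * c.expression) + c.pose for p in image]
        frame = [rng.gauss(0.0, 1.0) for _ in image]  # start from pure noise
        for _ in range(steps):
            # Each step halves the distance to the target (not a real denoiser).
            frame = [f + 0.5 * (t - f) for f, t in zip(frame, target)]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    speech = [0.5, -0.5] * 50       # fake audio samples
    photo = [0.2, 0.4, 0.6]         # fake "image" of three pixels
    motion = audio_to_motion(speech, frames=10)
    video = render_frames(photo, motion)
    print(len(video), "frames of", len(video[0]), "pixels each")
```

Keeping the stages separate mirrors the description in the text: the first stage only needs the audio, while the second only needs the reference image and the predicted controls.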

According to the Google research team, VLOGGER outperforms other cutting-edge methods in terms of image quality, identity preservation and temporal coherence. Another distinctive feature of VLOGGER is its ability to generate complete images that include not only the face and lips, but also facial expressions and other body parts such as the hands.

Possible fields of application

A powerful tool like VLOGGER could be used to create detailed 3D models, photorealistic avatars for virtual reality and gaming, virtual assistants and much more.

However, like any cutting-edge technology, VLOGGER also carries risks. The ease with which such videos can be created makes fake news and the manipulation of digital content harder to counter. These issues need to be taken into account, and steps taken to mitigate them.

After all, if a European privacy regulator has opened an investigation into OpenAI Sora, a tool that creates professional-looking videos from simple text descriptions, imagine what the reaction could be to VLOGGER, which works directly on images and voice recordings that may belong to other people.

For these and other reasons, the technology behind VLOGGER is not yet publicly accessible. For now, the project’s official website only shows a series of practical demonstrations.
