Once again it is OpenAI surprising the tech community with a novelty that, according to the first reactions on the web, has left almost everyone speechless. The company announced Sora, a new AI model that can generate realistic or imaginative videos from a text prompt. For the first time, the videos can last up to 60 seconds.
The model can create complex scenes with multiple people, different types of movement, and fine detail in both the subject and the background. But that’s not all: according to OpenAI, Sora can also generate video from a still image, and it can enrich and/or extend an existing video by “filling in” the missing frames.
Some examples are available on X and in the company’s blog post, and they are truly stunning. Of course, perfection has not been achieved: in some cases artefacts and anomalies are visible (such as a floor moving in a suspicious way, or a cat that has one paw too many for a few seconds), but overall the result is impressive.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
Unlike ChatGPT, Sora is not currently available to everyone. Access is limited to “red teamers” who are evaluating the model for potential risks (misinformation, hateful content, and bias) and to a small number of visual artists, designers, and filmmakers who will then share their feedback with the company.
As already mentioned, Sora can make mistakes, and the company is aware of this. In fact, on its website it writes that the model “may have difficulty accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect. For example, a person may take a bite of a cookie, but the cookie may not later show a bite mark.”