OpenAI has unveiled its latest AI model, Sora, which generates videos from text. This new model from the creators of ChatGPT can produce videos in multiple resolutions and aspect ratios, as well as edit existing videos based on a text prompt. Sora can also create videos from a still image and fill in missing frames to extend existing footage.

Sora is currently capable of producing up to a minute of Full HD video content, and the examples OpenAI has shared look promising. More examples can be found on Sora’s landing page.

Sora’s ability to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background is impressive. The model understands not only the user’s prompt, but also how those elements exist in the real world.

Sora uses a transformer architecture similar to ChatGPT’s, treating videos and images as collections of smaller data units called patches. Generation works as a diffusion process: each video starts out as static noise, and the model gradually removes that noise over many steps until the final clip emerges.


Noisy input patches transformed to high quality video

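For intuition, here is a toy sketch in Python of the idea described above: a video represented as a grid of spacetime patches that starts as pure noise and is iteratively denoised. The dimensions, step count, and the denoiser itself are placeholder assumptions for illustration only; Sora’s actual transformer, training, and decoding pipeline are not public.

```python
import numpy as np

# Hypothetical sizes: 16 frames, an 8x8 grid of patches, 64 values per patch.
FRAMES, GRID_H, GRID_W, PATCH_DIM = 16, 8, 8, 64
STEPS = 50  # number of denoising steps

rng = np.random.default_rng(0)

# Stand-in for the structure a real model would predict from the text prompt.
target = rng.normal(size=(FRAMES, GRID_H, GRID_W, PATCH_DIM))

# Start from pure static noise, as the article describes.
patches = rng.normal(size=target.shape)

for step in range(STEPS):
    # Placeholder denoiser: blend a fraction of the way toward the target.
    # A real diffusion model uses a learned network to predict and remove
    # the noise at each step, conditioned on the prompt.
    alpha = 1.0 / (STEPS - step)
    patches = (1.0 - alpha) * patches + alpha * target

# In a real system, the denoised patches would be decoded back into video frames.
print("mean residual after denoising:", float(np.abs(patches - target).mean()))
```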

OpenAI has emphasized that it is applying the existing safety protocols used for DALL·E 3 to test Sora before its official launch. The model is currently being evaluated by “red teamers”, domain experts who assess potential risks.

OpenAI is also planning to engage in discussions with policymakers, artists, and educators to address potential concerns and explore use cases for Sora. No official launch date has been announced yet.

Source