
Meta has announced its AI image editing and video creation models: Emu Edit and Emu Video

Building on its earlier image generation work with the Emu model, Meta has announced Emu Edit and Emu Video, AI models for image editing and video creation, respectively.

Emu Edit is an innovative image editing technology designed to simplify a wide range of image manipulation tasks, offering greater convenience and precision. It can perform various editing operations, including local and global edits, background removal or addition, color adjustments, and geometric transformations, all based on user instructions.

Emu Edit integrates computer vision tasks into an image generation model, offering more precise control over both generation and editing. The researchers note that existing image editing models often over- or under-modify images, whereas Emu Edit is reported to carry out the requested edit accurately.

To train Emu Edit, Meta used a dataset of 10 million synthetic samples, which it describes as the largest of its kind. According to Meta, this dataset significantly improved the model's ability to follow instructions accurately, with results that surpass prior work in image editing.

Emu Video introduces a simple and efficient method for text-to-video generation, based on diffusion models with Emu as the foundation. Video creation is split into two steps: the model first generates an image from the text prompt, then generates a video conditioned on both that image and the text. Unlike previous approaches that required long cascades of models, Emu Video uses just two diffusion models to produce high-resolution, 4-second videos at 16 fps.
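The two-step process described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Meta's implementation: the functions `text_to_image` and `image_and_text_to_video`, the `Image` and `Video` types, and the placeholder frame strings are all hypothetical stand-ins for the two diffusion models. The only figures taken from the article are the 16 fps frame rate and the 4-second duration, which together imply 64 frames per clip.

```python
from dataclasses import dataclass
from typing import List

FPS = 16               # frame rate stated in the article
DURATION_SECONDS = 4   # clip length stated in the article


@dataclass
class Image:
    """Hypothetical stand-in for a generated image."""
    prompt: str


@dataclass
class Video:
    """Hypothetical stand-in for a generated video clip."""
    prompt: str
    first_frame: Image
    frames: List[str]
    fps: int


def frame_count(fps: int, seconds: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return fps * seconds


def text_to_image(prompt: str) -> Image:
    # Step 1 (hypothetical stub): a diffusion model would turn text into an image.
    return Image(prompt=prompt)


def image_and_text_to_video(image: Image, prompt: str,
                            fps: int = FPS,
                            seconds: int = DURATION_SECONDS) -> Video:
    # Step 2 (hypothetical stub): a second diffusion model would generate
    # frames conditioned on BOTH the image and the text prompt.
    n = frame_count(fps, seconds)
    frames = [f"frame {i} for '{prompt}'" for i in range(n)]
    return Video(prompt=prompt, first_frame=image, frames=frames, fps=fps)


def generate_video(prompt: str) -> Video:
    """Chain the two steps: text -> image, then (image, text) -> video."""
    image = text_to_image(prompt)
    return image_and_text_to_video(image, prompt)


if __name__ == "__main__":
    clip = generate_video("a dog surfing a wave")
    print(len(clip.frames), "frames at", clip.fps, "fps")
```

Because the second step takes an image as an explicit input, the same function could in principle be called with an image that did not come from the first step, which is one way to picture the split into two stages.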

In human evaluations, respondents preferred Emu Video's output for both video quality and fidelity to the text prompt. Emu Video was favored by 96% of respondents over the Make-A-Video method for quality, and by 85% for text prompt fidelity.
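Preference figures like these are typically win rates over side-by-side comparisons. The sketch below shows that arithmetic, assuming each respondent simply picks one of two outputs. The function name `preference_rate` and the vote list are hypothetical toy values chosen for illustration; they are not data from Meta's evaluation, and the article does not describe the study's actual protocol.

```python
from typing import Sequence


def preference_rate(votes: Sequence[str], model: str) -> float:
    """Percentage of pairwise votes that went to `model`."""
    if not votes:
        raise ValueError("at least one vote is required")
    wins = sum(1 for vote in votes if vote == model)
    return 100.0 * wins / len(votes)


if __name__ == "__main__":
    # Toy votes, NOT real study data: 24 of 25 raters pick model "A".
    toy_votes = ["A"] * 24 + ["B"] * 1
    print(preference_rate(toy_votes, "A"))
```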

Emu Video also goes beyond previous models by animating user-provided images based on text prompts. This gives users a way to bring static images to life and opens new possibilities for animation production. Together, Emu Edit and Emu Video mark a notable advance in image and video generation.


