Generate videos by adding speech to images or videos
Audio Conditioned LipSync with Latent Diffusion Models
Fixed fork of the original audio sr!
Generate realistic audio from text input
Speech Enhancement Gradio Demo
Generate spatial audio from images (and optionally text)
Select the more realistic video from pairs
Create photorealistic 3D portraits from your videos
Generate and sync sound effects for an uploaded video
Audio Gen, Audio Style Transfer and Audio InPainting
Generate lip-synced video from audio and image/video
Generate audio from text using a custom voice
Demo for Generative Photography
sutra-avatar-v2 is an AI-powered tool that adds realistic sound to visual content. It generates videos by adding speech to images or existing videos.
• Realistic Sound Generation: Adds lifelike audio to videos, enhancing the visual content.
• Speech-to-Video Synthesis: Converts text into natural-sounding speech and integrates it seamlessly into videos.
• Customization Options: Supports various voice styles, tones, and languages.
• Compatibility: Works with diverse video and image formats for flexible use.
What file formats does sutra-avatar-v2 support?
sutra-avatar-v2 supports major video and image formats, including MP4, AVI, JPG, and PNG.
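Before uploading, it can help to check a file's extension against the formats named above. The following is a minimal sketch, not part of sutra-avatar-v2: the helper name `is_listed_format` and the constant `SUPPORTED_EXTENSIONS` are hypothetical, and the set contains only the four formats mentioned in the answer. Since that answer says "including", the tool may accept other formats as well.

```python
from pathlib import Path

# Assumption: only the formats named in the FAQ answer (MP4, AVI, JPG, PNG).
# The tool itself may accept more; this set is illustrative, not authoritative.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".jpg", ".png"}


def is_listed_format(filename: str) -> bool:
    """Return True if the file extension is one of the formats listed in the FAQ."""
    # Compare case-insensitively so "CLIP.MP4" and "clip.mp4" are treated alike.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["portrait.png", "talk.MP4", "voice.wav"]:
        status = "listed" if is_listed_format(name) else "not listed"
        print(f"{name}: {status}")
```

A file reported as "not listed" is not necessarily rejected by the tool; the check only reflects the formats mentioned on this page.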
Can I customize the voice or tone of the generated speech?
Yes, sutra-avatar-v2 offers options to choose from multiple voices, tones, and languages for a personalized experience.
Why doesn't the generated audio sync with my video?
Ensure your video and text inputs are aligned correctly. Adjust timing settings or re-sync the audio if necessary.