MaskGCT TTS Demo
Listen and respond to voice commands in Spanish
Better AI powered platform to purify your speech signal
Belarusian TTS
Talk to Qwen2Audio with Gradio and WebRTC ⚡️
Realtime implementation of Whisper large turbo
Generate high-quality speech from text with specified emotion and voice
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
V1.0: Convert any Ebook to AudioBook with Xtts + VoiceCloning!
Generate audio and SRT subtitles from text
Moonshine ASR models running on-device, in your web browser.
Generate audio from text
The MaskGCT TTS Demo is a cutting-edge speech synthesis tool that enables users to generate high-quality speech from text. Built on the MaskGCT model, it offers a unique approach to text-to-speech synthesis by utilizing audio prompts to guide the generation process. This demo provides an interactive platform to explore advanced voice synthesis capabilities.
• Text-to-Speech Generation: Convert written text into natural, human-like speech.
• Audio-Prompted Synthesis: Use audio prompts to steer the output, giving control over speech style and tone.
• High-Quality Output: Generate speech with high fidelity and accuracy.
• User-Friendly Interface: Easy-to-use interface designed for seamless interaction.
• Customizable Options: Adjust settings to achieve desired results for various applications.
What formats are supported for text input?
The MaskGCT TTS Demo supports plain text input in multiple languages. Please ensure that the text is in UTF-8 encoding for proper processing.
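The UTF-8 requirement above can be checked before submitting text. The following is a minimal Python sketch, assuming the input is available as raw bytes; the helper name `is_valid_utf8` is hypothetical and is not part of the demo.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Text that fails this check was probably saved in another encoding and would need to be re-saved as UTF-8 first.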
Can I use my own audio files as prompts?
Yes, you can upload your own audio files to serve as prompts for the synthesis process. Supported formats include WAV, MP3, and OGG.
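A prompt file can be screened by extension before upload. This is a rough sketch that assumes the three formats named above are the complete accepted set, which may not match the demo's actual behavior; `is_supported_prompt` and `SUPPORTED_PROMPT_EXTENSIONS` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Formats named in the answer above; the demo's real accepted set may differ.
SUPPORTED_PROMPT_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def is_supported_prompt(path: str) -> bool:
    """Check the file extension only; the audio data itself is not inspected."""
    return Path(path).suffix.lower() in SUPPORTED_PROMPT_EXTENSIONS
```

Because only the file name is examined, a mislabeled file would still pass this check.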
How long does it take to generate the speech?
Generation time varies depending on the length of the input text and the complexity of the audio prompt. Typically, it takes a few seconds for short texts and up to a minute for longer inputs.