Introduction
This document provides a detailed guide on how to use your own audio file as a voice source to generate a digital avatar video via the Jogg.ai API. This feature allows the digital avatar model to perform lip-syncing to any voice you provide (e.g., your own recording).

Core Concept: Audio-Driven Lip-Sync
Unlike using a preset voice (voice_id), the core of this feature is "audio-first": you provide an audio file containing speech, and our system analyzes that audio to drive the digital avatar's mouth movements so they precisely match the pronunciation in the recording.
- Asynchronous Processing: This process is also asynchronous. After submitting the task, you immediately receive a project_id, and the video is processed in the background.
- Audio is Key: The voice of the character in the video comes entirely from the audio file you provide.
- Automatic Lip-Sync: The system automatically analyzes the audio waveform and phonemes to generate highly synchronized lip animations.
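The asynchronous flow above (submit audio, receive a project_id, poll until the video is ready) can be sketched as follows. This is a minimal illustration only: the endpoint paths, request fields, and status values shown here are assumptions, not the actual Jogg.ai API contract, so check them against the official API reference. The HTTP transport is passed in as a callable, which also makes the logic easy to test.

```python
import time


def submit_avatar_video(post, avatar_id, audio_url):
    """Submit a generation task driven by your own audio file.

    Returns immediately with a project_id, because processing is
    asynchronous. The endpoint path and field names are hypothetical.
    """
    resp = post("/create_video_from_audio", {   # hypothetical endpoint
        "avatar_id": avatar_id,                 # which digital avatar to use
        "audio_url": audio_url,                 # your own recording
    })
    return resp["project_id"]


def wait_for_video(get, project_id, poll_seconds=5, timeout=600):
    """Poll the project status in a loop until the video is ready.

    Status values ("completed", "failed") are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = get("/project", {"project_id": project_id})  # hypothetical endpoint
        if resp["status"] == "completed":
            return resp["video_url"]
        if resp["status"] == "failed":
            raise RuntimeError("avatar video generation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("video was not ready before the timeout")
```

In a real integration, `post` and `get` would be thin wrappers around your HTTP client that attach the API key header and decode the JSON response; a webhook callback, if the API offers one, would avoid polling entirely.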