AI Video Glossary

Clear definitions of the terms you'll encounter when creating AI-powered talking head videos. From lip sync to voice cloning, we explain it all.

Lip Sync / Lip Syncing

The process of matching mouth movements to spoken audio. In AI video, lip sync algorithms analyze the audio waveform and generate a realistic mouth shape for each video frame. Puppetry uses LivePortrait + Wav2Lip for production-quality lip sync across 29 languages.
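To make the frame-by-frame idea concrete, here is a minimal Python sketch of the alignment step: it splits an audio track into one window of samples per video frame, then derives a toy "mouth openness" value from each window's loudness. The function names `frame_windows` and `mouth_openness` are hypothetical, and the loudness heuristic is only a stand-in for illustration. It is not how Wav2Lip, LivePortrait, or Puppetry generate mouth shapes.

```python
import math


def frame_windows(num_samples: int, sample_rate: int, fps: float):
    """Return one (start, end) audio sample range per video frame."""
    samples_per_frame = sample_rate / fps
    num_frames = math.ceil(num_samples / samples_per_frame)
    windows = []
    for i in range(num_frames):
        start = round(i * samples_per_frame)
        end = min(round((i + 1) * samples_per_frame), num_samples)
        windows.append((start, end))
    return windows


def mouth_openness(samples, windows):
    """Toy stand-in for a lip sync model: per-frame RMS loudness scaled to 0..1."""
    rms = []
    for start, end in windows:
        chunk = samples[start:end]
        if chunk:
            rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
        else:
            rms.append(0.0)
    peak = max(rms, default=0.0)
    return [r / peak if peak > 0 else 0.0 for r in rms]


# One second of audio at 16 kHz: silence, then a loud half second.
samples = [0.0] * 8000 + [1.0] * 8000
windows = frame_windows(len(samples), sample_rate=16000, fps=25)
openness = mouth_openness(samples, windows)
```

At 25 frames per second and a 16 kHz sample rate, each video frame corresponds to 640 audio samples. A real lip sync model replaces the loudness heuristic with a learned mapping from audio features to mouth shapes.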

Voice Cloning

Creating a synthetic replica of a specific person's voice using AI. Users record a short sample (30 seconds to 5 minutes), and the AI learns to reproduce their speech patterns, tone, and accent. Voice cloning is used for personalized video content.
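As an illustration of the sample-length range above, the following Python sketch checks whether a recorded WAV file falls between 30 seconds and 5 minutes before it is submitted for cloning. The helper names and the decision to reject out-of-range samples are assumptions made for this example. They do not describe any specific product's validation logic.

```python
import wave

MIN_SECONDS = 30.0       # lower bound quoted in the definition above
MAX_SECONDS = 5 * 60.0   # upper bound quoted in the definition above


def wav_duration_seconds(path: str) -> float:
    """Read a WAV file's duration from its header (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_sample(duration_seconds: float) -> bool:
    """Hypothetical check: is the recording within the 30 s to 5 min range?"""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS
```

A caller would combine the two, for example `is_usable_sample(wav_duration_seconds("my_voice.wav"))`, where the file name is a placeholder.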

AI Puppet

A still image (photo, illustration, or 3D render) that can be animated to speak using AI. Unlike traditional puppets, AI puppets require no physical manipulation — you upload a photo and the AI handles lip sync, head movement, and expressions.

LivePortrait

An open-source AI model for portrait animation. It generates natural head movements, facial expressions, and eye blinks from a single photo. Combined with Wav2Lip for lip sync, it forms the core of Puppetry's animation pipeline.

Wav2Lip

A neural network that generates accurate lip movements from audio input. It takes a face image and an audio waveform, then produces video frames with closely synced mouth movements. Known for high accuracy across languages and accents.

Text-to-Video

The process of generating video content from text input. In Puppetry, this means typing a script, selecting a voice, and getting a fully animated talking head video — no camera, studio, or editing skills needed.

Photo-to-Video

Converting a static photograph into an animated video. AI analyzes facial features in the photo and generates realistic motion including lip sync, head turns, and expressions. Works with real photos, illustrations, and 3D renders.

Neural Voice

A synthetic voice generated by deep neural networks (as opposed to older concatenative TTS, which stitches together prerecorded speech fragments). Neural voices sound significantly more natural, with proper intonation, breathing, and emotional range. ElevenLabs and OpenAI are among the leading providers.

Deepfake vs AI Avatar

Deepfakes impersonate real people without their consent, often for deception. AI avatars are created from photos with the permission of the person depicted, for legitimate purposes such as education, marketing, and accessibility. Puppetry is designed for ethical AI avatar creation: users animate their own photos or stock characters.

Ready to create your first AI video?

Upload a photo, type a script, pick a voice. Your talking head video is ready in under 90 seconds.
