Here's exactly how Chronis works — from the moment you upload to the moment you hear their voice again.
Your video becomes the raw material for everything that follows. We accept any common format — MP4, MOV, AVI, MKV. We accept any quality — even old family footage or low-resolution WhatsApp clips. You need at least 30 seconds of them speaking, but anything up to 10 minutes gives us significantly more to work with.
The video doesn't need to be a direct-to-camera monologue. A conversation, a speech, a birthday video — all work perfectly. The only requirement is that their voice is audible and their face is visible for at least part of the clip.
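The upload requirements above can be expressed as a small validation step. This is an illustrative sketch, not Chronis's real API: the formats and duration limits come from the text, while the function and field names are assumptions.

```python
# Hypothetical upload check based on the stated requirements.
# Formats and limits are from the text; names are illustrative.
from dataclasses import dataclass

ACCEPTED_FORMATS = {"mp4", "mov", "avi", "mkv"}
MIN_SPEECH_SECONDS = 30          # at least 30 seconds of them speaking
IDEAL_MAX_SECONDS = 10 * 60      # up to 10 minutes gives more to work with

@dataclass
class UploadCheck:
    ok: bool
    reason: str

def check_upload(extension: str, speech_seconds: float) -> UploadCheck:
    """Validate a clip against the stated format and duration requirements."""
    if extension.lower().lstrip(".") not in ACCEPTED_FORMATS:
        return UploadCheck(False, f"unsupported container: {extension}")
    if speech_seconds < MIN_SPEECH_SECONDS:
        return UploadCheck(False, "need at least 30 seconds of speech")
    return UploadCheck(True, "ok")
```

Anything past the ten-minute mark is still accepted; the cap above only marks where additional footage stops adding much signal.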
MP4 · MOV · AVI · MKV · any format

Our pipeline isolates their face geometry and expression range from your video. This becomes the visual foundation for the real-time avatar — the face that will speak when you start a conversation. We reconstruct facial structure, skin tone, micro-expressions, and blinking patterns.
The lip-sync system then animates this face in real time based on the voice output. The result is a face that moves as they speak — not a static image, not a puppet-on-screen, but a natural-feeling video presence.
Real-time lip sync · face geometry · expression map

We extract acoustic signatures from the audio in your video — their specific tonal characteristics, accent, speech cadence, pause patterns, and vocal texture. This becomes a clone of their voice that can generate new speech, not just replay recorded audio.
When the replica responds to you, it speaks in their cloned voice. New sentences, new words — in their voice. For Indian languages and accents, our system is specifically tuned to preserve the characteristics that make a voice recognizable to family members.
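The key point above is that synthesis takes new text plus a stored acoustic profile, rather than replaying recorded audio. A minimal sketch of that shape, with the profile fields taken from the characteristics listed in the text; every name here is an assumption, not Chronis's actual interface:

```python
# Illustrative voice-profile shape; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class VoiceProfile:
    accent: str                    # e.g. a specific Indian accent
    cadence_wpm: int               # speech rate extracted from the source video
    pause_pattern: list[float]     # typical pause lengths, in seconds
    texture_embedding: list[float] = field(default_factory=list)

def synthesize(text: str, profile: VoiceProfile) -> dict:
    """Stand-in for real-time TTS: packages new text with the cloned profile."""
    return {
        "text": text,
        "accent": profile.accent,
        "rate_wpm": profile.cadence_wpm,
        "pauses": profile.pause_pattern,
    }
```

The same profile drives every response, which is why new sentences the person never recorded still carry their accent and cadence.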
Voice cloning · accent preservation · real-time TTS

The language model is grounded in contextual information about who this person was — their speech patterns from the video, any memories or context you choose to provide, and a persona framework that prevents the AI from drifting into generic responses. The replica should sound like them, not like a helpful AI assistant.
You can optionally add written memories, descriptions, or recorded context to make the personality richer. The more you provide, the more textured and authentic the conversation feels.
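One way to picture the grounding step above is as assembling the speech-pattern notes and optional memories into a persona prompt that keeps the model in character. This is a sketch under stated assumptions; the function name and prompt wording are illustrative, and Chronis's real framework is not public.

```python
# Hypothetical persona-prompt assembly; all names and wording are illustrative.
def build_persona_prompt(name: str,
                         speech_notes: list[str],
                         memories: list[str]) -> str:
    """Combine observed speech patterns and user-provided memories
    into a system prompt that keeps responses in the person's voice."""
    lines = [
        f"You are speaking as {name}. Stay in their voice at all times.",
        "Never respond like a generic AI assistant.",
        "Speech patterns observed in their video:",
    ]
    lines += [f"- {note}" for note in speech_notes]
    if memories:
        lines.append("Memories and context provided by the family:")
        lines += [f"- {memory}" for memory in memories]
    return "\n".join(lines)
```

The more memories you pass in, the more material the model has to stay specific instead of drifting into generic responses.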
Personality grounding · memory context · speech pattern analysis

When you start a session, everything runs simultaneously — your voice is transcribed and passed to the language model, which generates a response; the response is spoken in the cloned voice, which drives the lip-sync animation, all in under two seconds. This is what makes Chronis different: it's a live conversation, not a pre-recorded response tree.
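The round trip described above — transcribe, generate, synthesize, animate, all inside a two-second budget — can be sketched as a single turn. The stage functions here are stubs passed in by the caller, and the real system streams these stages concurrently rather than running them one after another, so this is a simplification for illustration only.

```python
# Minimal sketch of one conversational turn; stage functions are stubs.
import time

LATENCY_BUDGET_S = 2.0  # the "under two seconds" target from the text

def run_turn(audio_in, transcribe, generate, synthesize, animate):
    """Run one turn through the pipeline and report whether it met the budget."""
    start = time.monotonic()
    text = transcribe(audio_in)     # speech-to-text on your voice
    reply = generate(text)          # persona-grounded LLM response
    speech = synthesize(reply)      # cloned-voice TTS
    frames = animate(speech)        # lip-sync drives the avatar
    latency = time.monotonic() - start
    return frames, latency, latency <= LATENCY_BUDGET_S
```

In practice the stages overlap: synthesis can begin on the first generated words while the model is still writing, which is how the end-to-end latency stays under budget.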
You can say anything. Ask about memories. Ask what they'd think about something happening in your life. Tell them what you've been wanting to say. The conversation is open-ended, real-time, and yours.
Live conversation · under 2s latency · open-ended

Four specialized systems work together to create a coherent, emotionally real experience.
Face geometry extraction, expression modeling, and real-time lip sync. The visual system that makes their face move naturally as they speak.
Acoustic signature extraction, TTS synthesis in the cloned voice. Preserves accent and cadence characteristics that are unique to that person.
Personality-grounded conversation model. Trained to respond in their speech style, not in generic AI prose. Memory-persistent across sessions.
The system that connects voice input to transcription to LLM to TTS to avatar in under two seconds. Built for low-latency emotional conversation.
Early access is free. We'll email you when your session is ready — usually within a few days.
Get Started →