Moshi AI is Kyutai's real-time voice AI model. It holds spoken conversations at very low latency and is full-duplex: it listens and speaks at the same time, as people do. Unlike text-based assistants with a voice interface layered on top, Moshi processes and generates speech natively, which allows natural interruptions, emotional expression, and conversational pacing. Released as open research, Moshi demonstrates that real-time, full-duplex spoken dialogue is feasible, in contrast to the turn-based voice interfaces of current AI assistants.