Production-ready voice pipeline with backend orchestration plus optional STT/LLM/TTS microservices. Supports Whisper-based STT with VAD, multiple TTS engines (Kokoro, Orpheus 3B, Coqui XTTS), and flexible LLM backends via vLLM (OpenAI-compatible) or Ollama. WebSocket streaming handles token/audio flow, turn detection/barge-in, and conversation management. A Next.js 16 frontend provides a voice-first dashboard with collapsible chat/metrics panels, live connection state, and color-coded latency thresholds. Deployable on GPU, CPU, or Apple Silicon with health and metrics endpoints, and each microservice can be consumed independently by other applications.
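The WebSocket protocol itself is not documented here; as a minimal sketch, this is the kind of structured JSON payload such a token/audio stream might exchange. All event and field names (`stt.partial`, `llm.token`, `audio_b64`, and so on) are assumptions for illustration, not the project's actual schema.

```python
import json

def make_event(event_type: str, **fields) -> str:
    """Serialize a hypothetical pipeline event as a JSON text frame."""
    return json.dumps({"type": event_type, **fields})

# Partial transcript emitted by STT while the user is still speaking
stt_partial = make_event("stt.partial", text="turn on the")
# A single LLM token streamed back toward TTS and the chat panel
llm_token = make_event("llm.token", text="Sure,")
# A chunk of synthesized speech, base64-encoded, with a sequence number
tts_chunk = make_event("tts.audio", audio_b64="UklGRg==", seq=0)
# Barge-in: the client interrupts playback so the server can cancel TTS
barge_in = make_event("control.barge_in")

print(json.loads(stt_partial)["type"])  # stt.partial
```

A real implementation would likely mix text frames like these with binary frames for raw audio, but the event-typed envelope is the common pattern for multiplexing STT, LLM, and TTS traffic over one socket.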
Whisper-based speech-to-text with VAD, exposed over WebSocket/HTTP, with CUDA, CPU, and Apple Silicon profiles.
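The VAD implementation is not detailed here; as a rough sketch of the idea, a naive energy-threshold detector over 16-bit PCM frames is shown below. The frame size and threshold are illustrative assumptions, and production pipelines typically use a trained model (e.g. Silero VAD) rather than raw energy.

```python
import struct

FRAME_SAMPLES = 480        # 30 ms at 16 kHz, a common VAD frame size (assumed)
ENERGY_THRESHOLD = 500.0   # illustrative value; tuned per microphone in practice

def frame_energy(pcm_bytes: bytes) -> float:
    """Mean absolute amplitude of a frame of 16-bit little-endian PCM."""
    samples = struct.unpack(f"<{len(pcm_bytes) // 2}h", pcm_bytes)
    return sum(abs(s) for s in samples) / max(len(samples), 1)

def is_speech(pcm_bytes: bytes) -> bool:
    """Classify a frame as speech when its energy clears the threshold."""
    return frame_energy(pcm_bytes) > ENERGY_THRESHOLD

# Silence vs. a loud frame
silence = struct.pack(f"<{FRAME_SAMPLES}h", *([0] * FRAME_SAMPLES))
loud = struct.pack(f"<{FRAME_SAMPLES}h", *([4000] * FRAME_SAMPLES))
print(is_speech(silence), is_speech(loud))  # False True
```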
Async WebSocket handlers coordinate STT/LLM/TTS streams with structured payloads, VAD-based turn control, and configurable worker pools per hardware target. Docker Compose profiles cover monolith and remote services; health/metrics endpoints and model selection toggles keep deployments observable and hardware-aware (CUDA, CPU, MPS).
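How the worker pools are sized per hardware target is not specified; a hedged sketch of what such a profile toggle might look like follows. The profile names mirror the CUDA/CPU/MPS targets mentioned above, but the pool sizes and the `PoolConfig` shape are assumptions, not the project's configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PoolConfig:
    """Hypothetical per-hardware worker-pool settings."""
    stt_workers: int
    tts_workers: int
    device: str

# Illustrative profiles; real values would come from benchmarking each target.
PROFILES = {
    "cuda": PoolConfig(stt_workers=4, tts_workers=4, device="cuda"),
    "cpu":  PoolConfig(stt_workers=2, tts_workers=1, device="cpu"),
    "mps":  PoolConfig(stt_workers=2, tts_workers=2, device="mps"),
}

def select_profile(name: str) -> PoolConfig:
    """Resolve a hardware profile by name, failing loudly on typos."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown hardware profile: {name!r}") from None

print(select_profile("mps").device)  # mps
```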
Delivers a low-latency voice pipeline that runs locally or as microservices, supports multiple STT/TTS stacks with vLLM (OpenAI-compatible) or Ollama LLM backends, meets sub-2s first-token and sub-1s TTS targets across GPU, CPU, or Apple Silicon, and now ships with a WebSocket-driven frontend showing live chat, connection state, and latency benchmarks.
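The dashboard's color-coded latency thresholds are not enumerated here; as a sketch keyed to the stated sub-2s first-token and sub-1s TTS targets, a classifier might look like the following. The 1.5x "yellow" warning band is an assumption for illustration.

```python
def latency_color(metric: str, seconds: float) -> str:
    """Map a measured latency to a traffic-light color.

    Targets taken from the project goals: first token under 2 s,
    first TTS audio under 1 s. The warning band is illustrative.
    """
    targets = {"first_token": 2.0, "tts_first_audio": 1.0}
    target = targets[metric]
    if seconds <= target:
        return "green"
    if seconds <= target * 1.5:  # assumed warning margin
        return "yellow"
    return "red"

print(latency_color("first_token", 1.2))      # green
print(latency_color("tts_first_audio", 1.3))  # yellow
```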