Voice Pipeline
KULVEX includes a full voice interface with speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), and intent recognition.
Architecture
Microphone → VAD → STT → Intent Detection → Domain Agent / LLM → TTS → Speaker
Speech-to-Text (STT)
KULVEX tries STT providers in priority order:
| Priority | Provider | Where | Latency |
|---|---|---|---|
| 1 | mnemo:voice | GPU node (Whisper large-v3 CUDA) | ~1-2s |
| 2 | Deepgram | Cloud API | ~0.5-1s |
| 3 | Whisper CPU | Local (slow) | ~5-10s |
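The priority-ordered fallback can be sketched as a simple chain: try each provider in turn and move on when one fails. The provider list shape and the `transcribe` signature below are illustrative assumptions, not KULVEX's real API.

```python
from typing import Callable, Optional

def transcribe_with_fallback(
    audio: bytes,
    providers: list[tuple[str, Callable[[bytes], str]]],
) -> Optional[str]:
    """Try STT providers in priority order; return the first successful transcript."""
    for name, transcribe in providers:
        try:
            return transcribe(audio)
        except Exception:
            # Provider unavailable (GPU node down, missing API key, timeout):
            # fall through to the next one in the priority list.
            continue
    # All providers failed.
    return None
```

In this sketch, a dead GPU node simply raises, and the request falls through to Deepgram and then the local CPU model without the caller having to know which backend answered.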
mnemo:voice
GPU-accelerated Whisper running on a dedicated node. Configure the node in the KULVEX dashboard under Settings > Nodes.
Deepgram
Cloud STT with excellent accuracy. Set DEEPGRAM_API_KEY in Settings.
Whisper CPU
Local fallback using OpenAI’s Whisper model on CPU. Slow but works offline.
Text-to-Speech (TTS)
Default: EdgeTTS (Microsoft Edge voices, free, no API key).
Alternative: Piper (fully local, open-source voices).
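A minimal sketch of how the default/alternative choice might be resolved, assuming engines are registered as callables keyed by name (the registry and names here are hypothetical, not KULVEX's actual configuration API):

```python
from typing import Callable, Dict

def pick_tts_engine(
    engines: Dict[str, Callable[[str], bytes]],
    preferred: str = "edge-tts",
    fallback: str = "piper",
) -> Callable[[str], bytes]:
    """Return the preferred engine if registered, else the fully local fallback."""
    if preferred in engines:
        return engines[preferred]
    return engines[fallback]
```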
Intent Detection
Before sending a transcript to the LLM, KULVEX matches it against 12 regex patterns for common voice intents:
- Time queries (“what time is it”)
- Presence (“who’s home”)
- Solar status (“how much energy”)
- Security (“arm/disarm alarm”)
- Home control (“turn on lights”)
- Weather, email, news, search, system info
Matched intents are answered with template responses (no LLM call, instant) or routed to domain agents with minimal LLM synthesis.
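The intent check can be sketched as a first-match scan over compiled patterns. The three patterns and the template response below are illustrative stand-ins, not KULVEX's actual 12 patterns:

```python
import re
from datetime import datetime

# Illustrative subset of voice-intent patterns (first match wins).
INTENTS = [
    ("time",     re.compile(r"\bwhat time is it\b", re.I)),
    ("presence", re.compile(r"\bwho'?s home\b", re.I)),
    ("lights",   re.compile(r"\bturn (on|off) (the )?lights?\b", re.I)),
]

def detect_intent(transcript: str):
    """Return the first matching intent name, or None to fall through to the LLM."""
    for name, pattern in INTENTS:
        if pattern.search(transcript):
            return name
    return None

def template_response(intent: str) -> str:
    # Template answers need no LLM round-trip, so they return instantly.
    if intent == "time":
        return datetime.now().strftime("It's %H:%M.")
    return ""
```

Anything that matches no pattern returns `None` and is forwarded to the LLM as usual.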
Voice Memory
KULVEX remembers voice conversations:
- Extracts facts from conversations
- Retrieves relevant memories for context
- Builds a personal knowledge base over time
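A toy sketch of the fact-extraction step. Production memory systems typically use an LLM for extraction; the two patterns and key names below are purely illustrative:

```python
import re

# Hypothetical patterns mapping utterances to knowledge-base keys.
FACT_PATTERNS = [
    (re.compile(r"\bmy name is (\w+)", re.I), "user_name"),
    (re.compile(r"\bi live in ([\w\s]+)", re.I), "user_location"),
]

def extract_facts(transcript: str) -> dict:
    """Pull simple key/value facts out of a transcript for later retrieval."""
    facts = {}
    for pattern, key in FACT_PATTERNS:
        match = pattern.search(transcript)
        if match:
            facts[key] = match.group(1).strip()
    return facts
```

Extracted facts would then be stored and retrieved as context for later voice turns.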
Socket.IO Events
| Event | Direction | Description |
|---|---|---|
| voice:start | client → server | Start voice session |
| voice:audio_chunk | client → server | Audio data chunk |
| voice:stop | client → server | End voice session |
| voice:playback_done | client → server | TTS playback finished |
| voice:transcript | server → client | STT result |
| voice:response | server → client | AI response text |
| voice:audio | server → client | TTS audio data |