At Lokutor, we believe that the future of human-computer interaction is voice-first. However, building a voice agent that feels natural and responsive is a significant engineering challenge. It requires orchestrating complex streams of audio data, managing state across multiple AI providers, and handling human nuances like interruptions and silence.
To help the developer community build faster, we are excited to announce the open-source release of the Lokutor Orchestrator for Go.
Why Go for Voice AI?
When we started building Lokutor, we chose Go for its exceptional handling of concurrency and its robust standard library. Voice orchestration is inherently parallel: you need to capture audio, stream it to a Speech-to-Text (STT) provider, pass the transcript through a Large Language Model (LLM), and stream synthesized audio back from a Text-to-Speech (TTS) engine, all at the same time.
Go’s goroutines and channels are a natural fit for this “plumbing,” which lets us keep internal latency (the overhead added by the orchestrator itself) under 100 ms even under heavy load.
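To make the “plumbing” idea concrete, here is a minimal sketch of a three-stage pipeline built from goroutines and channels. It is not the orchestrator's internal code: `stage`, `transcribe`, `respond`, `synthesize`, and `runPipeline` are hypothetical names, and the stages are plain string functions standing in for real STT, LLM, and TTS calls.

```go
package main

import (
	"fmt"
	"strings"
)

// stage starts a goroutine that applies fn to every value read from in
// and forwards the result on the returned channel. The output channel is
// closed once in is drained, so downstream stages terminate cleanly.
func stage(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for v := range in {
			out <- fn(v)
		}
	}()
	return out
}

// Stand-ins for real providers (hypothetical; no network calls are made).
func transcribe(audio string) string { return strings.TrimPrefix(audio, "audio:") }
func respond(text string) string     { return "reply to " + text }
func synthesize(text string) string  { return "speech:" + text }

// runPipeline feeds frames into the first stage and collects what the
// last stage emits. All three stages run concurrently.
func runPipeline(frames []string) []string {
	mic := make(chan string)
	go func() {
		defer close(mic)
		for _, f := range frames {
			mic <- f
		}
	}()

	var out []string
	for s := range stage(stage(stage(mic, transcribe), respond), synthesize) {
		out = append(out, s)
	}
	return out
}

func main() {
	for _, s := range runPipeline([]string{"audio:hello", "audio:bye"}) {
		fmt.Println(s)
	}
}
```

Because each stage owns its output channel and closes it when its input ends, shutdown propagates through the pipeline without any extra coordination.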
Key Features of Lokutor Orchestrator
The orchestrator isn’t just a wrapper; it’s a production-ready framework for voice agents.
1. Full-Duplex Voice Orchestration (v1.3)
The library supports simultaneous capture and playback. This means your agent can “listen” while it’s “speaking,” mimicking the natural flow of human conversation.
2. Built-in Voice Activity Detection (VAD)
We’ve implemented a thread-safe VAD based on root-mean-square (RMS) signal energy that automatically detects when a user starts and stops speaking. These detections drive the transition between the “Listening” and “Thinking” states without requiring complex client-side logic.
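For readers new to RMS-based detection, the sketch below shows the general technique: compute the RMS level of each frame, compare it with a threshold, and declare the end of speech only after a run of quiet frames. It is an illustration, not the library's implementation. `simpleVAD`, `process`, and `demo` are hypothetical names, the 16-bit PCM format and 200 ms frame size are assumptions, and the locking a thread-safe version would need is left out for brevity.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// rms returns the root-mean-square level of 16-bit PCM samples,
// normalized to the range [0, 1].
func rms(samples []int16) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		v := float64(s) / 32768.0
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(samples)))
}

// simpleVAD is a hypothetical detector, not the library's type.
type simpleVAD struct {
	threshold float64       // RMS level at or above which a frame counts as speech
	hangover  time.Duration // silence required before speech is considered over
	speaking  bool
	silence   time.Duration
}

// process consumes one frame of the given duration and returns "start",
// "end", or "" when the state did not change.
func (v *simpleVAD) process(frame []int16, dur time.Duration) string {
	if rms(frame) >= v.threshold {
		v.silence = 0
		if !v.speaking {
			v.speaking = true
			return "start"
		}
		return ""
	}
	if v.speaking {
		v.silence += dur
		if v.silence >= v.hangover {
			v.speaking = false
			v.silence = 0
			return "end"
		}
	}
	return ""
}

// demo runs the detector over a short synthetic signal.
func demo() []string {
	vad := &simpleVAD{threshold: 0.04, hangover: 600 * time.Millisecond}
	loud := []int16{8000, -8000, 8000, -8000}
	quiet := []int16{10, -10, 10, -10}

	var events []string
	for i, f := range [][]int16{quiet, loud, loud, quiet, quiet, quiet} {
		if ev := vad.process(f, 200*time.Millisecond); ev != "" {
			events = append(events, fmt.Sprintf("frame %d: speech %s", i, ev))
		}
	}
	return events
}

func main() {
	for _, e := range demo() {
		fmt.Println(e)
	}
}
```

The hangover period is what keeps a short pause between words from being mistaken for the end of an utterance.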
3. Native Barge-in Support
One of the most difficult features to implement is “barge-in”: the ability for a user to interrupt the bot while it is speaking. The Lokutor Orchestrator manages this state automatically. When the user starts speaking during playback, the orchestrator emits an interruption event, so you can immediately clear audio buffers and reset the LLM context.
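On the application side, reacting to an interruption usually means two things: dropping audio that has been buffered but not yet played, and cancelling whatever is still producing more of it. The sketch below shows one way to do that with a `context.Context` per bot turn. The `player` type and its methods are hypothetical and are not part of the orchestrator's API.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// player is a hypothetical playback queue used only for illustration.
type player struct {
	mu     sync.Mutex
	queue  [][]byte
	cancel context.CancelFunc
}

// startTurn begins a new bot turn and returns a context that is
// cancelled if the user barges in.
func (p *player) startTurn(parent context.Context) context.Context {
	ctx, cancel := context.WithCancel(parent)
	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()
	return ctx
}

// enqueue buffers a chunk for playback. It reports false when the turn
// has already been cancelled, so late chunks are never played.
func (p *player) enqueue(ctx context.Context, chunk []byte) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if ctx.Err() != nil {
		return false
	}
	p.queue = append(p.queue, chunk)
	return true
}

// interrupt drops all buffered audio, cancels the in-flight turn, and
// returns the number of chunks that were discarded.
func (p *player) interrupt() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	dropped := len(p.queue)
	p.queue = nil
	if p.cancel != nil {
		p.cancel()
		p.cancel = nil
	}
	return dropped
}

// pending returns the number of chunks still waiting to be played.
func (p *player) pending() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.queue)
}

// demo simulates a bot turn that is interrupted after two chunks.
func demo() (dropped int, acceptedLate bool, pending int) {
	p := &player{}
	ctx := p.startTurn(context.Background())
	p.enqueue(ctx, []byte("chunk-1"))
	p.enqueue(ctx, []byte("chunk-2"))

	dropped = p.interrupt()                       // user starts speaking
	acceptedLate = p.enqueue(ctx, []byte("late")) // chunk from the cancelled turn
	return dropped, acceptedLate, p.pending()
}

func main() {
	dropped, acceptedLate, pending := demo()
	fmt.Println("dropped:", dropped)
	fmt.Println("late chunk accepted:", acceptedLate)
	fmt.Println("pending:", pending)
}
```

Passing the same context to the LLM and TTS requests for that turn would let a single `interrupt` call stop generation as well as playback.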
4. Provider-Agnostic Architecture
We built the library with flexibility in mind. You can swap providers for every stage of the pipeline:
- LLM: Groq (Llama), OpenAI (GPT), Anthropic (Claude), Google (Gemini)
- STT: Groq (Whisper), OpenAI (Whisper), Deepgram, AssemblyAI
- TTS: Lokutor (Versa)
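Swappable providers are typically achieved by having the pipeline depend on small interfaces rather than concrete clients. The sketch below illustrates that pattern. The interface names and method signatures (`STT.Transcribe`, `LLM.Complete`, `TTS.Synthesize`) and the `Pipeline` type are assumptions made for this example and may differ from the library's actual definitions.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Hypothetical provider interfaces; the library's real ones may differ.
type STT interface {
	Transcribe(ctx context.Context, audio []byte) (string, error)
}

type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type TTS interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// Pipeline depends only on the interfaces, so any implementation fits.
type Pipeline struct {
	STT STT
	LLM LLM
	TTS TTS
}

// Turn runs one request through all three stages in order.
func (p Pipeline) Turn(ctx context.Context, audio []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(ctx, audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := p.LLM.Complete(ctx, text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	speech, err := p.TTS.Synthesize(ctx, reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return speech, nil
}

// In-memory fakes standing in for real providers.
type echoSTT struct{}

func (echoSTT) Transcribe(_ context.Context, audio []byte) (string, error) {
	return string(audio), nil
}

type upperLLM struct{}

func (upperLLM) Complete(_ context.Context, prompt string) (string, error) {
	return strings.ToUpper(prompt), nil
}

type byteTTS struct{}

func (byteTTS) Synthesize(_ context.Context, text string) ([]byte, error) {
	return []byte(text), nil
}

// demo wires the fakes together and runs a single turn.
func demo() string {
	p := Pipeline{STT: echoSTT{}, LLM: upperLLM{}, TTS: byteTTS{}}
	out, err := p.Turn(context.Background(), []byte("hello"))
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(demo())
}
```

The same structure makes the pipeline easy to test, since fakes like the ones above can replace network-backed providers.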
Quick Start: Building a Voice Agent
Building a full-duplex agent with barge-in support now takes only a few lines of Go. In this example, playAudio and microphoneBytes are placeholders for your own audio output and input code:
package main

import (
	"context"
	"os"
	"os/signal"
	"time"

	"github.com/lokutor-ai/lokutor-orchestrator/pkg/orchestrator"
	llmProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/llm"
	sttProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/stt"
	ttsProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/tts"
)

// Placeholders (not part of the library): replace these with your own
// audio capture and playback code.
var microphoneBytes []byte

func playAudio(pcm []byte) {}

func main() {
	// Cancel the context on Ctrl+C so the agent can shut down cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	// 1. Initialize your preferred providers
	stt := sttProvider.NewGroqSTT("YOUR_GROQ_KEY", "whisper-large-v3")
	llm := llmProvider.NewGroqLLM("YOUR_GROQ_KEY", "llama-3.3-70b-versatile")
	tts := ttsProvider.NewLokutorTTS("YOUR_LOKUTOR_KEY")

	// 2. Set up VAD and Orchestrator
	vad := orchestrator.NewRMSVAD(0.04, 600*time.Millisecond)
	orch := orchestrator.NewWithVAD(stt, llm, tts, vad, orchestrator.DefaultConfig())

	// 3. Start a managed stream
	stream := orch.NewManagedStream(ctx, orch.NewSession("user_1"))

	// 4. Handle voice events
	go func() {
		for event := range stream.Events() {
			if event.Type == orchestrator.AudioChunk {
				// Checked assertion: skip unexpected payloads instead of panicking.
				if chunk, ok := event.Data.([]byte); ok {
					playAudio(chunk)
				}
			}
		}
	}()

	// 5. Pipe mic audio to the stream
	stream.Write(microphoneBytes)

	// Block until interrupted; otherwise main returns before any event is handled.
	<-ctx.Done()
}
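In a real agent, step 5 runs continuously: microphone audio arrives as a stream and is forwarded in small frames rather than as one buffer. The sketch below shows one way to do that with the standard library alone. It assumes the destination satisfies `io.Writer`, which the Quick Start's `stream.Write` call suggests but does not confirm, and it assumes 16 kHz, mono, 16-bit PCM in 20 ms frames. `frameBytes`, `pipeFrames`, and `demo` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// frameBytes returns the size in bytes of one PCM frame.
func frameBytes(sampleRate, channels, bytesPerSample, frameMs int) int {
	return sampleRate * channels * bytesPerSample * frameMs / 1000
}

// pipeFrames copies src to dst in frames of at most size bytes and
// returns the number of frames written. The last frame may be shorter.
func pipeFrames(dst io.Writer, src io.Reader, size int) (int, error) {
	buf := make([]byte, size)
	frames := 0
	for {
		n, err := io.ReadFull(src, buf)
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return frames, werr
			}
			frames++
		}
		// io.ReadFull reports io.EOF when nothing was read and
		// io.ErrUnexpectedEOF when the source ended mid-frame.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return frames, nil
		}
		if err != nil {
			return frames, err
		}
	}
}

// demo pipes 50 ms of silence into an in-memory sink that stands in
// for the voice stream.
func demo() (int, int) {
	src := bytes.NewReader(make([]byte, 1600)) // 800 samples at 16 kHz, 16-bit mono
	var sink bytes.Buffer
	n, err := pipeFrames(&sink, src, frameBytes(16000, 1, 2, 20))
	if err != nil {
		return n, -1
	}
	return n, sink.Len()
}

func main() {
	frames, total := demo()
	fmt.Printf("%d frames, %d bytes\n", frames, total)
}
```

Smaller frames reduce the delay before the VAD and STT stages see new audio, at the cost of more write calls.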
Join the Revolution
The Lokutor Orchestrator is licensed under the MIT License and is available today on GitHub. Whether you’re building a customer support bot, a language learning assistant, or a new kind of interactive game character, we can’t wait to see what you build.
Check out the repository and start building: github.com/lokutor-ai/lokutor-orchestrator