
LLMRTC is an open-source TypeScript SDK for building real-time voice + vision AI agents in the browser using WebRTC. It’s meant to replace the usual “WebRTC + STT + LLM + TTS + custom glue” stack with a single, provider-agnostic orchestration layer.

What you get:

  • Sub-second, bidirectional audio/video over WebRTC
  • Server-side VAD + barge-in (users can interrupt mid-speech, like a real convo)
  • Provider-agnostic pipeline (mix & match OpenAI / Anthropic / Gemini / Bedrock / local models; e.g. Claude for LLM + Whisper for STT + ElevenLabs for TTS)
  • Tool calling via JSON Schema (model calls tools → you execute → convo continues)
  • Playbooks for multi-step flows (prompts/tools per stage + controlled transitions)
  • Streaming STT → LLM → TTS so audio starts playing before generation is fully done
  • Hooks/metrics + auto-reconnect for production resilience
  • Local-only mode (Ollama + Faster-Whisper + Piper) if you want fully self-hosted privacy
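To make the tool-calling bullet concrete, here's a plain-TypeScript sketch of the loop it describes (model calls a tool → you execute → the result goes back into the conversation). This is *not* the actual LLMRTC API — the `tools` registry shape, `handleToolCall`, and the `get_weather` tool are all made up for illustration; only the JSON Schema convention matches the post.

```typescript
// Hypothetical sketch — not the real LLMRTC API. Shows the tool-calling
// pattern from the feature list: declare tools with JSON Schema, execute
// the one the model asks for, return the result to continue the convo.

type ToolCall = { name: string; arguments: Record<string, unknown> };

// Each tool pairs a JSON Schema (so the model knows the argument shape)
// with a handler the app runs when the model calls it.
const tools = {
  get_weather: {
    description: "Look up current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
    handler: async ({ city }: { city: string }) =>
      `Sunny and 22°C in ${city}`, // stand-in for a real weather API call
  },
};

// Dispatch a model-emitted tool call to its handler.
async function handleToolCall(call: ToolCall): Promise<string> {
  const tool = tools[call.name as keyof typeof tools];
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.handler(call.arguments as { city: string });
}

// Example: the model decided to call get_weather with {"city": "Lisbon"};
// the string returned here is what gets fed back into the conversation.
handleToolCall({ name: "get_weather", arguments: { city: "Lisbon" } })
  .then((result) => console.log(result));
```

The SDK handles the wire format; the point is that your side of the contract is just a schema plus an async handler.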

Docs (quickstart + examples): https://www.llmrtc.org/getting-started/overview
GitHub: https://github.com/llmrtc/llmrtc

Would love feedback (latency, reconnect edge-cases, and what providers you’d like to see next).
