
LLMRTC is an open-source TypeScript SDK for building real-time voice + vision AI agents in the browser using WebRTC. It’s meant to replace the usual “WebRTC + STT + LLM + TTS + custom glue” stack with a single, provider-agnostic orchestration layer.

What you get:

  • Sub-second, bidirectional audio/video over WebRTC
  • Server-side VAD + barge-in (users can interrupt mid-speech, like a real convo)
  • Provider-agnostic pipeline (mix & match OpenAI / Anthropic / Gemini / Bedrock / local models; e.g. Claude for LLM + Whisper for STT + ElevenLabs for TTS)
  • Tool calling via JSON Schema (model calls tools → you execute → convo continues)
  • Playbooks for multi-step flows (prompts/tools per stage + controlled transitions)
  • Streaming STT → LLM → TTS so audio starts playing before generation is fully done
  • Hooks/metrics + auto-reconnect for production resilience
  • Local-only mode (Ollama + Faster-Whisper + Piper) if you want fully self-hosted privacy
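To make the tool-calling bullet concrete, here's a plain-TypeScript sketch of the loop it describes (model calls a tool → you execute → the result goes back into the conversation). This is *not* the actual LLMRTC API — the `tools` registry shape, `handleToolCall`, and the `get_weather` tool are all made up for illustration; only the JSON Schema convention matches the post.

```typescript
// Hypothetical sketch — not the real LLMRTC API. Shows the tool-calling
// pattern from the feature list: declare tools with JSON Schema, execute
// the one the model asks for, return the result to continue the convo.

type ToolCall = { name: string; arguments: Record<string, unknown> };

// Each tool pairs a JSON Schema (so the model knows the argument shape)
// with a handler the app runs when the model calls it.
const tools = {
  get_weather: {
    description: "Look up current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
    handler: async ({ city }: { city: string }) =>
      `Sunny and 22°C in ${city}`, // stand-in for a real weather API call
  },
};

// Dispatch a model-emitted tool call to its handler.
async function handleToolCall(call: ToolCall): Promise<string> {
  const tool = tools[call.name as keyof typeof tools];
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.handler(call.arguments as { city: string });
}

// Example: the model decided to call get_weather with {"city": "Lisbon"};
// the string returned here is what gets fed back into the conversation.
handleToolCall({ name: "get_weather", arguments: { city: "Lisbon" } })
  .then((result) => console.log(result));
```

The SDK handles the wire format; the point is that your side of the contract is just a schema plus an async handler.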

Docs (quickstart + examples): https://www.llmrtc.org/getting-started/overview
GitHub: https://github.com/llmrtc/llmrtc

Would love feedback (latency, reconnect edge-cases, and what providers you’d like to see next).
