Lynpoint/CyberVerse

Self-hosted voice agents that can see, remember, and lip-sync

CyberVerse wires WebRTC, RAG, and optional real-time avatar video into a modular stack for building persistent AI companions.

Velocity (7d): +22 ★ / day · Trend: steady

What it does

CyberVerse is a self-hosted framework for real-time, voice-first AI agents. It handles low-latency conversation over WebRTC (P2P or LiveKit SFU), persists character memory to disk, supports RAG over imported documents, and can optionally generate real-time digital-human video with lip-sync from a single reference photo. The stack runs as three services: a Python inference server, a Go API server, and a frontend.

The interesting bit

The architecture splits “foreground” conversation flow from “background” work. A PersonaAgent keeps voice turns responsive and interruptible, while SubAgents handle slow tasks like research or report generation asynchronously. This prevents the awkward pause-while-thinking problem that kills immersion in voice interfaces.
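
The foreground/background split can be illustrated with a minimal asyncio sketch. This is not CyberVerse's code: the functions `persona_agent` and `sub_agent`, the `research:` prefix, and the canned replies are hypothetical names chosen for illustration, and interruption handling is left out. It only shows the general pattern of answering each turn immediately while slow work runs as a background task.

```python
import asyncio


async def sub_agent(topic: str, delay: float = 0.05) -> str:
    """Hypothetical background worker; the sleep stands in for slow research."""
    await asyncio.sleep(delay)
    return f"report ready: {topic}"


async def persona_agent(turns: list[str]) -> list[str]:
    """Hypothetical foreground loop: reply to every turn right away and
    hand slow requests to background tasks instead of awaiting them inline."""
    log: list[str] = []
    background: list[asyncio.Task] = []
    for turn in turns:
        if turn.startswith("research:"):
            topic = turn.removeprefix("research:").strip()
            # Schedule the slow job without blocking the conversation.
            background.append(asyncio.create_task(sub_agent(topic)))
            log.append(f"on it, looking into {topic}")
        else:
            log.append(f"reply: {turn}")
    # Surface background results once the pending jobs have finished.
    for result in await asyncio.gather(*background):
        log.append(result)
    return log


def run_demo() -> list[str]:
    return asyncio.run(persona_agent(["hello", "research: WebRTC", "how are you?"]))
```

In this sketch the reply to "how are you?" is logged before the research report arrives, which is the behavior the paragraph above describes: the slow task never stalls a voice turn.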

Key highlights

  • Voice mode works without any local GPU; flip inference.avatar.enabled to false and it streams audio only
  • Supports visual input from user camera or screen share in standard/omni sessions
  • Modular “brain, voice, hearing, tools, memory, face” stack, with each part swappable via YAML config or the web UI at /settings
  • Currently wires in Alibaba Qwen or Volcengine Doubao models and voice APIs
  • Avatar backends: FlashHead (1.3B weights) or LiveAct, with vLLM support for the latter
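
The dotted key `inference.avatar.enabled` suggests a nested config layout. The sketch below assumes that nesting and models it as a plain Python dict; the project's actual YAML file, its other keys, and its defaults are not shown here, so `config`, `set_dotted`, `avatar_enabled`, and the fallback to `False` are all hypothetical.

```python
from typing import Any

# Hypothetical config, shaped only after the dotted key mentioned above.
config: dict[str, Any] = {"inference": {"avatar": {"enabled": True}}}


def set_dotted(cfg: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested value addressed by a dotted path, creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def avatar_enabled(cfg: dict[str, Any]) -> bool:
    """Read inference.avatar.enabled; treating a missing key as False is an assumption."""
    return bool(cfg.get("inference", {}).get("avatar", {}).get("enabled", False))


# Turn the avatar off, which per the list above leaves an audio-only stream.
set_dotted(config, "inference.avatar.enabled", False)
mode = "audio + video" if avatar_enabled(config) else "audio only"
```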

Caveats

  • Setup is involved: Node 18+, Go 1.25, Conda, Python 3.10, FFmpeg, plus API keys for Chinese cloud providers (DashScope or Doubao)
  • Avatar mode needs CUDA 12.8, PyTorch 2.8, and manual model weight downloads from Hugging Face or ModelScope
  • The characters shown in the README demos are examples and are not bundled with the project

Verdict

Worth a look if you’re building persistent voice companions or digital-human interfaces and want full control over the pipeline. Skip it if you need a one-click SaaS, or if you mainly want English-centric TTS/ASR and have no interest in Chinese model ecosystems.
