This repository presents a lightweight, multilingual avatar system for real-time Human-AI interaction in Kazakh, Russian, and English. We compare two deployment architectures developed at ISSAI:
- Local: uses the quantized Qolda model (4.3B parameters), Whisper Turbo ASR, and Matcha-TTS
- Cloud-based: uses the Oylan LLM and MangiSoz APIs
For Local Deployment: Set up the three backend services and the Avatar UI:

1. Deploy the Qolda LLM with llama.cpp: download the quantized GGUF model from HuggingFace (issai/Qolda or issai/Qolda_GGUF) and run it with llama.cpp's server mode on port 8080, which provides an OpenAI-compatible API endpoint.
2. Deploy the ASR service using faster-whisper (https://github.com/SYSTRAN/faster-whisper), wrapping it in a FastAPI backend that mimics OpenAI's Whisper API format.
3. Deploy the TTS service using ISSAI's TTS (issai/tts on HuggingFace; accessible only with ISSAI's permission) wrapped in a custom FastAPI backend. No further optimization is required, as the model is already fast.
4. Clone the Avatar UI repository (https://github.com/IS2AI/lightweight_avatar), run `npm install` to install dependencies, and start the development server with `npm run dev`. The Vite proxy automatically routes /api, /tts-api, and /asr-api requests to your local services.
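Once the llama.cpp server is running on port 8080, the UI can talk to it through the standard OpenAI chat-completions format. A minimal sketch of the request body (the helper name `buildChatRequest` and the parameter values are illustrative, not the repository's actual code):

```javascript
// Build an OpenAI-compatible chat-completions request body for the
// local llama.cpp server at http://localhost:8080/v1/chat/completions.
function buildChatRequest(userText, history = []) {
  return {
    model: "qolda", // llama.cpp serves the single loaded model regardless of name
    messages: [...history, { role: "user", content: userText }],
    temperature: 0.7,
    stream: false,
  };
}

// Example usage against a running local server:
// const res = await fetch("http://localhost:8080/v1/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildChatRequest("Sälem!")),
// });
```

Because the endpoint is OpenAI-compatible, the same request shape works unchanged against a cloud LLM endpoint, which is what makes the dual-deployment switch possible.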
For Cloud Deployment: Instead of running local services, click the "API Settings" button in the avatar UI, switch to "Cloud Deployment" mode, and configure the three API endpoints: for the LLM, use either a cloud-hosted Qolda instance or an alternative such as the Oylan API with your API key; for ASR, use the MangiSoz API, which provides faster-whisper as a service; for TTS, use the cloud-hosted MangiSoz TTS service or any alternative that exposes an OpenAI-compatible API. The configuration is saved to localStorage and persists across sessions, enabling seamless switching between local and cloud deployments without code changes.
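The localStorage persistence described above can be sketched as follows. The storage key `avatarApiConfig` and the field names are assumptions for illustration, not necessarily the identifiers used in ApiConfigContext.jsx:

```javascript
// Default endpoints match the Vite proxy paths used in local mode.
const DEFAULT_CONFIG = {
  mode: "local", // "local" or "cloud"
  llmUrl: "/api",
  ttsUrl: "/tts-api",
  asrUrl: "/asr-api",
};

// Merge a stored (possibly partial or stale) config over the defaults,
// keeping only known keys so an old entry cannot break the UI.
function mergeConfig(stored) {
  const merged = { ...DEFAULT_CONFIG };
  for (const key of Object.keys(DEFAULT_CONFIG)) {
    if (stored && typeof stored[key] === "string") merged[key] = stored[key];
  }
  return merged;
}

// Browser-side load/save around localStorage:
function loadConfig() {
  try {
    return mergeConfig(JSON.parse(localStorage.getItem("avatarApiConfig")));
  } catch {
    return { ...DEFAULT_CONFIG };
  }
}

function saveConfig(config) {
  localStorage.setItem("avatarApiConfig", JSON.stringify(mergeConfig(config)));
}
```

Merging over defaults on load is what lets the UI fall back cleanly to local mode when no cloud endpoints have been configured yet.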
Key Results:
- Local deployment is 62% faster (2.20s vs 5.74s end-to-end latency)
- LLM inference: 76% faster locally (0.99s vs 4.11s)
- ASR: 38% faster locally
- Avatar rendering uses only 15-20% GPU at 60 FPS
- On-device models enable responsive, offline multilingual interaction
Features:
- Multilingual Support: Kazakh, Russian, and English language processing
- Dual Deployment Architectures: Cloud-based and local deployment options
- Real-time Human-AI Interaction: Low-latency conversational interface
- 3D Avatar Interface: Ready Player Me-based avatar rendering at 60 FPS
- Speech Processing Pipeline: End-to-end ASR, LLM inference, and TTS synthesis
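The end-to-end pipeline above (ASR → LLM → TTS) can be sketched as a sequential composition with per-stage timing, which is how latency figures like those in Key Results are typically measured. The stage functions here are illustrative stubs, not the repository's API:

```javascript
// Run the speech pipeline sequentially and record per-stage latency.
// asr/llm/tts are async stage functions supplied by the caller.
async function runPipeline(audio, { asr, llm, tts }) {
  const timings = {};
  const timed = async (name, fn, input) => {
    const start = Date.now();
    const output = await fn(input);
    timings[name] = Date.now() - start; // wall-clock latency per stage
    return output;
  };
  const text = await timed("asr", asr, audio);   // speech -> text
  const reply = await timed("llm", llm, text);   // text -> response
  const speech = await timed("tts", tts, reply); // response -> audio
  const total = timings.asr + timings.llm + timings.tts;
  return { text, reply, speech, timings, total };
}
```

Because the stages run strictly in sequence, the end-to-end latency is the sum of the three stage latencies, which is why the LLM stage dominates the 2.20s vs 5.74s gap between local and cloud deployment.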
The following diagram illustrates the complete system architecture comparing cloud-based and local deployment approaches:
- Node.js (v16 or higher)
- npm or yarn package manager
- Modern browser with microphone access support
- Clone the repository:

```bash
git clone https://github.com/IS2AI/lightweight_avatar
cd lightweight_avatar
```

- Install dependencies:

```bash
npm install
```

- Configure API endpoints (see Deployment sections below)

- Start the development server:

```bash
npm run dev
```

- Open in browser: navigate to http://localhost:5173/
Note: For full functionality, you need to set up backend services (LLM, TTS, ASR) as described in the "Reproducing the Work" section above, or configure cloud API endpoints using the "API Settings" button in the UI.
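The proxy routing for local mode would look roughly like this in vite.config.js. This is a sketch under assumed backend ports (8080 for the LLM, 8000/8001 for ASR/TTS); check the repository's actual config for the real targets:

```javascript
// vite.config.js — sketch; backend ports are assumptions, not confirmed values.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      "/api":     { target: "http://localhost:8080", changeOrigin: true },
      "/asr-api": { target: "http://localhost:8000", changeOrigin: true, rewrite: (p) => p.replace(/^\/asr-api/, "") },
      "/tts-api": { target: "http://localhost:8001", changeOrigin: true, rewrite: (p) => p.replace(/^\/tts-api/, "") },
    },
  },
});
```

Proxying through the dev server keeps all requests same-origin, which avoids CORS configuration on the three local backends.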
- Click the microphone button to start voice recognition
- Speak your question to the AI educator
- Stop speaking - automatic 2-second countdown begins
- Message sends automatically - no button clicking needed!
- AI responds - microphone auto-pauses during response
- Auto-resumes after AI finishes for seamless conversation
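The auto-send flow above boils down to a small silence timer: each detected speech event resets a countdown, and when the countdown expires the message is sent. A minimal sketch (the 2-second default matches the value mentioned under customization; the helper name is ours):

```javascript
// Minimal silence-countdown helper: call onSpeech() whenever speech is
// detected; if no speech arrives for `timeoutMs`, onSilence() fires once.
function createSilenceTimer(onSilence, timeoutMs = 2000) {
  let timer = null;
  return {
    onSpeech() {
      clearTimeout(timer);                      // speech resets the countdown
      timer = setTimeout(onSilence, timeoutMs); // restart the window
    },
    cancel() {
      clearTimeout(timer); // e.g. while the AI is responding
    },
  };
}

// Usage sketch: wire it to a recognizer's result events.
// const timer = createSilenceTimer(sendMessage);
// recognizer.onresult = () => timer.onSpeech();
```

Calling `cancel()` while the AI responds is what implements the auto-pause step, and re-arming the timer afterwards gives the auto-resume behaviour.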
src/
├── components/
│ ├── LandingPage.jsx # Beautiful landing page
│ ├── ApiConfigModal.jsx # API configuration modal
│ ├── ClassroomPage.jsx # Main classroom wrapper
│ ├── ClassroomUI.jsx # Zoom-like interface
│ ├── ClassroomExperience.jsx # 3D classroom environment
│ ├── VoiceRecognition.jsx # Voice control component
│ ├── Avatar.jsx # AI educator 3D model
│ ├── Experience.jsx # Original 3D scene
│ └── UI.jsx # Original UI (legacy)
├── contexts/
│ └── ApiConfigContext.jsx # API configuration state management
├── hooks/
│ ├── useChat.jsx # AI conversation management (LLM + TTS)
│ ├── useMangiSozSTT.jsx # ASR integration
│ └── useVoiceRecognition.jsx # Legacy voice recognition (deprecated)
├── utils/
│ └── audioConverter.js # Audio format conversion utilities
├── assets/ # 3D models, textures, environments
├── App.jsx # Main app with routing
├── main.jsx # App entry point
└── index.css # Global styles + animations
Modify src/components/ClassroomExperience.jsx to add new environments:
- Change lighting presets
- Add new 3D models
- Customize classroom layout
Update src/hooks/useChat.jsx to modify:
- Educational context
- Subject specialization
- Response style
- Learning level
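One way such customization might be parameterized inside useChat.jsx is a system-prompt builder; the function and field names below are ours for illustration, not the hook's actual API:

```javascript
// Compose a system prompt from the customization knobs listed above.
function buildSystemPrompt({
  subject = "general studies", // subject specialization
  level = "school",            // learning level
  style = "friendly and concise", // response style
} = {}) {
  return [
    `You are an AI educator specializing in ${subject}.`,
    `Explain concepts at a ${level} level.`,
    `Keep your responses ${style}.`,
    "Answer in the language the student uses (Kazakh, Russian, or English).",
  ].join(" ");
}
```

Keeping these knobs in one place means a single edit changes the educator's persona without touching the LLM request plumbing.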
Adjust src/hooks/useVoiceRecognition.jsx for:
- Silence detection timing (default: 2 seconds)
- Language settings
- Audio sensitivity
To build for production:

```bash
npm run build
npm run preview
```

The production build will be generated in the dist/ directory and can be deployed to any static hosting service (Vercel, Netlify, GitHub Pages, etc.).
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit to ISSAI
- NonCommercial — You may not use the material for commercial purposes
For more details, see the CC BY-NC 4.0 License.