View Source Code
Browse the complete example on GitHub
What's Inside?
The demo provides three primary capabilities powered by LFM2.5-Audio-1.5B:
- ASR (Automatic Speech Recognition): Convert spoken audio into accurate text transcriptions
- TTS (Text-to-Speech): Transform written text into natural-sounding audio output
- Interleaved Mode: Enable mixed conversations combining both audio and text inputs
Quick Start
1. Clone the repository
2. Verify you have npm installed on your system
3. Install dependencies
4. Start the development server
5. Access the application at http://localhost:5173 in your browser
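The steps above typically map to commands like the following. Note that the repository URL below is a placeholder and the script names assume a standard Vite setup; check the project's README and package.json for the actual values:

```shell
# Clone the demo (repository URL is a placeholder, not the real one)
git clone https://github.com/your-org/lfm2.5-audio-demo.git
cd lfm2.5-audio-demo

# Confirm npm is available
npm --version

# Install dependencies and start the dev server
npm install
npm run dev   # serves the app at http://localhost:5173
```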
Understanding the Architecture
This demo uses the LFM2.5-Audio-1.5B model, a 1.5-billion-parameter audio model that handles both speech recognition and speech synthesis. The model has been quantized and converted to ONNX format for efficient browser-based inference.
Model Architecture
The implementation uses quantized ONNX models sourced from the LiquidAI/LFM2.5-Audio-1.5B-ONNX repository on Hugging Face. These models are optimized to run with WebGPU acceleration, providing fast inference directly in the browser.
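As a sketch, loading one of these ONNX files with onnxruntime-web would typically prefer the WebGPU execution provider and fall back to WASM. The helper function and the model file name below are illustrative assumptions, not the demo's actual code:

```typescript
// Hypothetical helper: prefer WebGPU, fall back to WASM when unavailable.
function pickProviders(webgpuAvailable: boolean): string[] {
  return webgpuAvailable ? ["webgpu", "wasm"] : ["wasm"];
}

// Browser-only usage with onnxruntime-web (illustrative; file name is an assumption):
// import * as ort from "onnxruntime-web";
// const session = await ort.InferenceSession.create(
//   "https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX/resolve/main/model.onnx",
//   { executionProviders: pickProviders("gpu" in navigator) }
// );
```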
Three Operation Modes
1. Automatic Speech Recognition (ASR)
- Input: Audio file or microphone recording
- Output: Text transcription
- Use case: Transcribe meetings, lectures, or voice notes
2. Text-to-Speech (TTS)
- Input: Written text
- Output: Natural-sounding audio
- Use case: Create voice assistants, audiobooks, or accessibility features
3. Interleaved Mode
- Input: Mixed audio and text
- Output: Conversational responses in text or audio
- Use case: Interactive voice assistants and chatbots
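One hypothetical way to model these three modes in the app's TypeScript is a small router that picks a mode from whichever inputs are present. The type and function names here are illustrative, not taken from the demo source:

```typescript
type Mode = "asr" | "tts" | "interleaved";

interface TurnInput {
  audio?: ArrayBuffer; // recorded or uploaded audio, if any
  text?: string;       // typed text, if any
}

// Route a conversation turn to the matching operation mode.
function inferMode(input: TurnInput): Mode {
  if (input.audio && input.text) return "interleaved"; // mixed audio + text
  if (input.audio) return "asr";                       // audio in, text out
  return "tts";                                        // text in, audio out
}
```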
System Requirements
WebGPU Support Required
This demo requires a modern web browser with WebGPU support:
- Chrome 113 or later (recommended)
- Edge 113 or later
If WebGPU is not enabled by default in your browser, you can turn it on via a flag:
- Chrome: chrome://flags/#enable-unsafe-webgpu
- Edge: edge://flags/#enable-unsafe-webgpu
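Before loading the model, the app can check for WebGPU and show a clear error on unsupported browsers. A minimal sketch (the function name is illustrative; the navigator-like object is injectable so the check is testable outside a browser):

```typescript
// Returns true when the environment exposes the WebGPU entry point (navigator.gpu).
function hasWebGPU(
  nav: { gpu?: unknown } | undefined = (globalThis as any).navigator
): boolean {
  return nav != null && typeof nav.gpu !== "undefined";
}

// In the browser, gate model loading on the check, e.g.:
// if (!hasWebGPU()) { /* show an "unsupported browser" message */ }
```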
Model Licensing
LFM 1.0 License
The model weights are distributed under the LFM 1.0 License. For complete licensing details, refer to the official Hugging Face repository.
Build for Production
To create an optimized production build, run the project's build script. The output is written to the dist/ directory, ready for deployment to any web server.
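For a Vite project, the build and local preview commands are usually the following; the script names assume the standard Vite setup, so confirm them against the project's package.json:

```shell
# Produce the optimized bundle in dist/
npm run build

# Optionally preview the production build locally before deploying
npm run preview
```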
Further Improvements
Potential enhancements for this demo:
- Streaming Inference: Real-time processing for longer audio inputs
- Voice Customization: Add controls for pitch, speed, and voice characteristics in TTS mode
- Noise Reduction: Integrate preprocessing to improve ASR accuracy in noisy environments
- Batch Processing: Support for processing multiple audio files simultaneously
- Model Caching: Optimize initial load time with better caching strategies