AI and VoIP Blog

VOIP | AI | Cloud | Kamailio | Open Source


Sesame CSM Gradio UI – Free, Local, High-Quality Text-to-Speech with Voice Cloning! (CUDA, Apple MLX and CPU)


What is Sesame CSM?

Conversational Speech Model (CSM) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
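To make "RVQ audio codes" a little more concrete, here is a minimal sketch of residual vector quantization in plain Python. The two tiny codebooks and the 2-D input vector are made up for illustration; they are not Mimi's real codebooks, and this is not CSM's actual code. The idea is only that each stage quantizes what the previous stage left over, so one audio frame becomes a short list of integer codes.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = Sequence[Sequence[float]]

# Toy codebooks (assumption, for illustration only): a coarse stage and a fine stage.
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]


def nearest(codebook: Codebook, vector: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(vector: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize `vector` stage by stage, each stage coding the remaining residual."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        # Subtract the chosen entry; the next stage only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Rebuild an approximation by summing the chosen entry from every stage."""
    output = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(codes, codebooks):
        output = [o + c for o, c in zip(output, codebook[index])]
    return output
```

With these toy codebooks, `rvq_encode([1.2, 0.3], CODEBOOKS)` picks one entry per stage, and `rvq_decode` sums those entries back into an approximation of the input. Adding more stages shrinks the leftover error.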

I just released Sesame CSM Gradio UI, a 100% local, free text-to-speech tool with superior voice cloning! No cloud processing, no API keys – just pure, high-quality AI-generated speech on your own machine. It works on CUDA, Apple MLX, and CPU, so anyone can try it.

Listen to a sample conversation generated by CSM.

🔥 Features:

  • Runs 100% Locally – No internet connection required!
  • Free & Open Source – No subscriptions, no paywalls.
  • Superior Voice Cloning – Built directly into the UI.
  • Gradio UI – Simple, interactive, and user-friendly.
  • Supports CUDA, Apple MLX, and CPU – Works on NVIDIA GPUs, Apple Silicon, and regular CPUs.

    Below is a video showing how to use the voice cloning feature.
    Note: The video has no audio; it only demonstrates the UI.
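If you are not sure which of the three backends your machine can use, a small check like the one below can help. This is only a sketch under my own assumptions, not how the project itself picks a device: the function name `pick_backend` is hypothetical, and it simply looks for an installed `torch` with a visible CUDA GPU, then for the `mlx` package on Apple Silicon, and otherwise falls back to CPU.

```python
import importlib.util
import platform


def pick_backend() -> str:
    """Guess a usable backend: "cuda", "mlx", or "cpu" (illustrative helper only)."""
    # CUDA: needs PyTorch installed and at least one visible NVIDIA GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"

    # MLX: only meaningful on Apple Silicon Macs with the mlx package installed.
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    if on_apple_silicon and importlib.util.find_spec("mlx") is not None:
        return "mlx"

    # Everything else falls back to the CPU path.
    return "cpu"
```

Calling `print(pick_backend())` before launching the UI gives a rough idea of which path to expect. CPU works everywhere but will be the slowest of the three.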

Getting Started

1. Clone the Repository

git clone https://github.com/akashjss/sesame-csm.git
cd sesame-csm

2. Install Dependencies (use a virtual environment to keep them isolated, as shown below)

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
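If you want to double-check this step, the sketch below confirms that a virtual environment is active and that `requirements.txt` is where pip expects it. The helper names `in_virtualenv` and `preflight` are my own and are not part of the repository. The check relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys
from pathlib import Path
from typing import List


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def preflight(project_dir: str = ".") -> List[str]:
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if not in_virtualenv():
        problems.append("no virtual environment is active")
    if not (Path(project_dir) / "requirements.txt").is_file():
        problems.append("requirements.txt not found in " + project_dir)
    return problems
```

Run `preflight()` from inside the cloned `sesame-csm` directory. If it returns an empty list, packages will be installed into the isolated environment rather than your system Python.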

3. Run Sesame CSM

python run_csm_gradio.py

Once the server is running, open the local URL that Gradio prints in the terminal in your browser to start generating speech!
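Model loading can take a while, so the page may not respond right away. Here is a small sketch that polls a URL until the server answers. The function name `wait_for_server` is hypothetical, and the address in the usage comment assumes Gradio's default of `http://127.0.0.1:7860`; the script may be configured differently, so use whatever URL appears in your terminal.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers or `timeout` seconds pass. Returns True if reachable."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied, just with an error status; it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Not listening yet (or still starting); wait and try again.
            time.sleep(interval)
    return False


# Example usage (assumed default Gradio address; adjust to what your terminal shows):
#   if wait_for_server("http://127.0.0.1:7860"):
#       import webbrowser
#       webbrowser.open("http://127.0.0.1:7860")
```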

🎙️ How to Use Voice Cloning

One of the most exciting features of Sesame CSM is its built-in voice cloning. You can record your own voice and use it to generate AI speech.

Steps to Clone Your Voice:

  1. Click the microphone icon in the UI.
  2. Press the record button and read the Speaker Prompt.
  3. Stop recording when finished.
  4. Click ‘Generate Conversation’ to create AI-generated speech using your recorded voice.

Here’s a visual guide to help you out:

[Screenshot: Sesame CSM Gradio UI]

💡 Why Use Sesame CSM?

If you’re looking for a fast, free, and high-quality text-to-speech tool with voice cloning, Sesame CSM is the perfect choice. Whether you’re a developer, content creator, or just experimenting with AI-generated speech, this tool gives you full control without any restrictions.

🔗 Try it Now!

👉 GitHub Repository: https://github.com/akashjss/sesame-csm

I’d love to hear your thoughts! Try it out and feel free to share your feedback, report issues, or contribute to the project!


Akash Gupta
Senior VoIP Engineer and AI Enthusiast


