AI and VoIP Blog

VOIP | AI | Cloud | Kamailio | Open Source


Sesame CSM Gradio UI – Free, Local, High-Quality Text-to-Speech with Voice Cloning! (CUDA, Apple MLX and CPU)


What is Sesame CSM?

Conversational Speech Model (CSM) is a speech generation model from Sesame that generates residual vector quantization (RVQ) audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

I just released Sesame CSM Gradio UI, a 100% local, free text-to-speech tool with superior voice cloning! No cloud processing, no API keys – just high-quality AI-generated speech on your own machine. It runs on CUDA, Apple MLX and CPU, so anyone can try it.

Listen to a sample conversation generated by CSM.

🔥 Features:

  • Runs 100% Locally – No internet connection required once the model weights have been downloaded!
  • Free & Open Source – No subscriptions, no paywalls.
  • Superior Voice Cloning – Built directly into the UI.
  • Gradio UI – Simple, interactive, and user-friendly.
  • Supports CUDA, Apple MLX, and CPU – Works on NVIDIA GPUs, Apple Silicon, and regular CPUs.
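Not sure which of the three backends your machine will use? The bash sketch below is a rough pre-flight guess, not part of the project. The function name `detect_backend` is hypothetical, and its heuristics are assumptions: an `nvidia-smi` that can list a GPU means CUDA, macOS on arm64 means Apple Silicon (MLX), and anything else falls back to CPU. The app's own device selection may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which backend this machine can use.
# This is a heuristic only; run_csm_gradio.py may choose differently.
detect_backend() {
  # NVIDIA driver tools installed and able to list at least one GPU -> CUDA
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "cuda"
  # macOS on Apple Silicon (arm64) -> Apple MLX
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo "mlx"
  # Everything else -> plain CPU
  else
    echo "cpu"
  fi
}

detect_backend
```

Once the dependencies are installed, and assuming PyTorch is among them, `python -c 'import torch; print(torch.cuda.is_available())'` inside the activated environment confirms whether PyTorch itself can see the GPU.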

    Below is a video showing how to use the voice cloning feature.
    Note: the video has no audio; it only shows how to use the UI.

Getting Started

1. Clone the Repository

git clone https://github.com/akashjss/sesame-csm.git
cd sesame-csm

2. Install Dependencies (use a virtual environment to isolate them, as shown below)

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

3. Run Sesame CSM

python run_csm_gradio.py

Once the server is running, open the URL printed in the terminal (Gradio's default is http://127.0.0.1:7860) in your browser to start generating speech!
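Loading the model can take a while, so the page may not be ready the moment you launch the script. Here is a minimal bash sketch for waiting until the UI accepts connections. The helper name `wait_for_ui` is hypothetical, and it assumes Gradio's default address of 127.0.0.1:7860 plus a bash build with `/dev/tcp` support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port until something accepts connections.
# Defaults assume Gradio's standard address; pass other values if
# run_csm_gradio.py prints a different URL.
wait_for_ui() {
  local host="${1:-127.0.0.1}" port="${2:-7860}" limit="${3:-60}"
  local waited=0
  # bash's /dev/tcp pseudo-device opens a TCP connection without curl;
  # the subshell closes the connection again as soon as it exits
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      echo "UI not reachable at http://${host}:${port} after ${limit}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "UI is up: http://${host}:${port}"
}
```

For example, start the server in the background with `python run_csm_gradio.py &`, then run `wait_for_ui 127.0.0.1 7860 120` and open the printed URL. Probing the port this way avoids depending on `curl`, but it only proves that something is listening, not that the model has finished loading.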

🎙️ How to Use Voice Cloning

One of the most exciting features of Sesame CSM is its built-in voice cloning. You can record your own voice and use it to generate AI speech.

Steps to Clone Your Voice:

  1. Click the microphone icon in the UI.
  2. Press the record button and read the Speaker Prompt.
  3. Stop recording when finished.
  4. Click ‘Generate Conversation’ to create AI-generated speech using your recorded voice.

Here’s a visual guide to help you out:

[Screenshot: Sesame CSM Gradio UI]

💡 Why Use Sesame CSM?

If you’re looking for a free, high-quality text-to-speech tool with voice cloning that runs entirely on your own hardware, Sesame CSM is a great choice. Whether you’re a developer, content creator, or just experimenting with AI-generated speech, this tool keeps you in full control, with no cloud service in the loop.

🔗 Try it Now!

👉 GitHub Repository: https://github.com/akashjss/sesame-csm

I’d love to hear your thoughts! Try it out and feel free to share your feedback, report issues, or contribute to the project!


Akash Gupta
Senior VoIP Engineer and AI Enthusiast


