The CSM (Conversational Speech Model) by Sesame is a speech generation model that produces RVQ audio codes from text and audio inputs. It uses a Llama backbone together with a compact audio decoder that outputs Mimi audio codes. Follow this guide to deploy the Sesame CSM 1B model on your cloud account using Tensorfuse. We will use one A10G GPU for this model and serve it with NVIDIA Triton Inference Server.
Prerequisites
Before you begin, ensure you have configured Tensorfuse on your AWS account. If you haven't done that yet, follow the Getting Started guide. You will also need access to the gated Sesame CSM 1B and Llama-3.2-1B models on Hugging Face.
Deploying Sesame CSM 1B with Tensorfuse
Each Tensorkube deployment requires three things:
- Your environment (as a Dockerfile).
- Your code (in this example, the model_repository directory).
- A deployment configuration (deployment.yaml).
Step 1: Prepare the Dockerfile
We will use the official NVIDIA Triton Inference Server image as our base image. It comes with all the dependencies needed to run the model; the image tag can be found in the NVIDIA container catalog. We clone the CSM GitHub repository to make deploying the model easier. We will also need the hf-transfer and numpy Python packages.
Dockerfile
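A sketch of what this Dockerfile can look like. The Triton image tag, the clone path, and the port are example choices, not requirements; pick a tag from the NVIDIA container catalog that matches your CUDA driver.

```dockerfile
# Base image: official Triton server (example tag; choose one from the NVIDIA container catalog)
FROM nvcr.io/nvidia/tritonserver:24.08-py3

# Clone the CSM repository so the Python backend can import its generator code
RUN git clone https://github.com/SesameAILabs/csm.git /opt/csm

# hf-transfer speeds up Hugging Face downloads; numpy is used by the model code
RUN pip install --no-cache-dir hf-transfer numpy && \
    pip install --no-cache-dir -r /opt/csm/requirements.txt

ENV HF_HUB_ENABLE_HF_TRANSFER=1
ENV PYTHONPATH=/opt/csm

# Copy the Triton model repository prepared in Step 2
COPY model_repository /models

EXPOSE 8000
CMD ["tritonserver", "--model-repository=/models", "--http-port=8000"]
```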
Step 2: Prepare the models directory
We will use the Triton Python backend to serve the model. Create a model_repository directory and add the model.py and config.pbtxt files to it. For more details about the Triton Python backend, refer to the Triton docs.
model_repository/csm_1b/1/model.py
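A minimal sketch of the Python backend model. It assumes the `load_csm_1b` / `generate` API from the cloned CSM repository; the tensor names (`TEXT`, `AUDIO`, `SAMPLE_RATE`) and the fixed `speaker`/`context` arguments are example choices that must match your config.pbtxt.

```python
import numpy as np
import triton_python_backend_utils as pb_utils  # available inside the Triton container
from generator import load_csm_1b  # from the cloned CSM repo (on PYTHONPATH)


class TritonPythonModel:
    def initialize(self, args):
        # Load the CSM generator once, when Triton loads the model
        self.generator = load_csm_1b(device="cuda")

    def execute(self, requests):
        responses = []
        for request in requests:
            # TEXT is a BYTES input declared in config.pbtxt
            text_tensor = pb_utils.get_input_tensor_by_name(request, "TEXT")
            text = text_tensor.as_numpy()[0].decode("utf-8")

            # Generate a float waveform; speaker/context are example values
            audio = self.generator.generate(
                text=text, speaker=0, context=[], max_audio_length_ms=10_000
            )

            out = pb_utils.Tensor("AUDIO", audio.cpu().numpy().astype(np.float32))
            sr = pb_utils.Tensor(
                "SAMPLE_RATE",
                np.array([self.generator.sample_rate], dtype=np.int32),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out, sr]))
        return responses
```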
model_repository/csm_1b/config.pbtxt
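A matching config.pbtxt sketch. The tensor names and shapes here are assumptions that must mirror whatever your model.py reads and writes.

```
name: "csm_1b"
backend: "python"
max_batch_size: 0

input [
  {
    name: "TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "AUDIO"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "SAMPLE_RATE"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]

instance_group [{ kind: KIND_GPU, count: 1 }]
```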
Step 3: Create Secrets
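As a sketch, the secret can be created with the Tensorkube CLI along these lines (the exact flag syntax is an assumption; check `tensorkube secret create --help` or the Tensorfuse docs, and substitute your own token):

```shell
# Hypothetical invocation: stores an HF token as a Tensorkube secret
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=hf_YOUR_TOKEN_HERE
```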
We need to create a Hugging Face secret so that the model weights can be downloaded from the Hugging Face Hub.
Step 4: Deployment config
Although you can deploy Tensorfuse apps from the command line, we recommend keeping a config file so that you can follow a GitOps approach to deployment.
deployment.yaml
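A sketch of what the deployment config can contain. The key names and values below are assumptions based on typical Tensorkube configs; consult the Tensorfuse configuration reference for the authoritative schema.

```yaml
# Hypothetical deployment.yaml: one A10G, scale to zero when idle,
# expose Triton's HTTP port, and mount the Hugging Face secret
gpus: 1
gpu_type: a10g
port: 8000
min_scale: 0
max_scale: 3
secret:
  - hugging-face-secret
```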
Step 5: Accessing the deployed app
Voila! Your autoscaling, production-ready text-to-speech service using Sesame CSM 1B is up. Remember to configure a TLS endpoint with a custom domain before going to production. Once the deployment is successful, you can see the status of your app by running:
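For example (the exact subcommand is an assumption; see `tensorkube --help`):

```shell
# Lists Tensorkube deployments and their status
tensorkube deployment list
```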
Testing the model
To test it out, we provide a sample streamlit_app.py file. Set your deployment URL as DEPLOYMENT_URL in the code, then run the app with streamlit run streamlit_app.py.
streamlit_app.py
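The sample app wraps a plain HTTP call to Triton. A minimal sketch of that call is below, using the standard KServe v2 inference protocol; DEPLOYMENT_URL is a placeholder, and the model/tensor names assume the config.pbtxt sketched above. The real streamlit_app.py adds a UI around this.

```python
import json

# Placeholder: set this to your Tensorkube deployment URL before running
DEPLOYMENT_URL = "http://localhost:8000"
MODEL_NAME = "csm_1b"  # must match the name in config.pbtxt


def build_infer_request(text: str) -> dict:
    """Build a KServe v2 inference payload for the CSM model.

    Tensor names must match those declared in config.pbtxt.
    """
    return {
        "inputs": [
            {"name": "TEXT", "shape": [1], "datatype": "BYTES", "data": [text]}
        ],
        "outputs": [{"name": "AUDIO"}, {"name": "SAMPLE_RATE"}],
    }


def synthesize(text: str) -> dict:
    """POST the request to Triton's v2 infer endpoint and return the JSON body."""
    import requests  # imported lazily so the payload builder stays dependency-free

    url = f"{DEPLOYMENT_URL}/v2/models/{MODEL_NAME}/infer"
    resp = requests.post(url, data=json.dumps(build_infer_request(text)), timeout=300)
    resp.raise_for_status()
    return resp.json()
```

In a Streamlit front end you would call `synthesize(...)` from a button handler and render the returned `AUDIO` tensor; the payload builder is kept separate so it can be reused and tested without a network.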

