Deploying SpeechT5 on serverless GPUs
Deploy serverless GPU applications on your AWS account
Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through the process of deploying SpeechT5 on it.
Prerequisites
Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven’t done that yet, follow the Getting Started guide.
Deploying SpeechT5 on Tensorfuse
Each Tensorkube deployment requires two things: your code and your environment (as a Dockerfile). When deploying machine learning models, it helps to make the model part of your container image. This reduces cold-start times significantly, because the container does not have to download the weights when it starts.
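One way to bake the model into the image is to download the weights while the image is being built. The sketch below is not taken from the Tensorkube documentation. It assumes the public Hugging Face checkpoints `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan`, and the function name `download_models` is hypothetical.

```python
# Sketch: pre-fetch SpeechT5 weights so they end up inside the container image.
# ASSUMPTION: the checkpoints below are the ones you want to serve.
MODEL_IDS = ("microsoft/speecht5_tts", "microsoft/speecht5_hifigan")


def download_models(cache_dir=None):
    """Download each repository into the Hugging Face cache and return the local paths."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return [
        snapshot_download(repo_id=model_id, cache_dir=cache_dir)
        for model_id in MODEL_IDS
    ]
```

Calling `download_models()` from a build step in your Dockerfile would store the weights in an image layer, so `from_pretrained` finds them in the local cache at startup.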
Code files
We will write a small FastAPI app that loads the model and serves predictions. The app will have two endpoints: /readiness and /tts. Remember that Tensorkube uses the /readiness endpoint to check the health of your deployments.
Environment files (Dockerfile)
Next, create your requirements.txt file.
And finally, a Dockerfile for your FastAPI app. Given below is a simple Dockerfile that you can use:
Deploying the app
SpeechT5 is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:
SpeechT5 is now deployed on your AWS account. You can access your app at the URL provided in the output or using the following command:
followed by
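Once you have the deployment URL, the /tts endpoint can also be called from Python. The following is a sketch using only the standard library. It assumes the endpoint accepts a JSON body of the form `{"text": ...}` and returns audio bytes. The helper names `build_tts_request` and `synthesize` are hypothetical, and the base URL is whatever your deployment reports.

```python
import json
import urllib.request


def build_tts_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the /tts endpoint (ASSUMED schema: {"text": ...})."""
    url = base_url.rstrip("/") + "/tts"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url: str, text: str, out_path: str = "speech.wav",
               timeout: float = 300.0) -> str:
    """Send the text to the deployment and save the returned audio to out_path."""
    # A generous timeout leaves room for a cold start on the first request.
    with urllib.request.urlopen(build_tts_request(base_url, text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

For example, `synthesize(deployment_url, "Hello from SpeechT5")` would write the response to `speech.wav`, where `deployment_url` is the URL of your own deployment.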
And that’s it! You have successfully deployed SpeechT5 on serverless GPUs using Tensorkube. 🚀