Deploy serverless GPU applications on your AWS account
Built with developer experience in mind, Tensorkube simplifies deploying serverless GPU apps. In this guide,
we will walk you through deploying the SpeechT5 text-to-speech model on it.
Each Tensorkube deployment requires two things: your code and your environment (as a Dockerfile).
When deploying machine learning models, it is beneficial to bake the model into your container image itself, as this reduces cold-start times by a significant margin. One way to do this is sketched below.
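As a minimal sketch, a small script run at image build time (for example, via a `RUN` step in your Dockerfile) downloads the weights into the Hugging Face cache inside the image, so a cold pod never has to fetch them over the network. The file name `download_model.py` is our choice, and the checkpoints are the standard Hugging Face SpeechT5 ones:

```python
# download_model.py -- illustrative build-time script; run it once during
# `docker build` so the weights are cached inside the image layer.
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Pull the SpeechT5 TTS checkpoint, its processor, and the HiFi-GAN vocoder.
SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# SpeechT5 also needs a speaker embedding at inference time; cache the
# x-vector dataset now so serving doesn't download it on first request.
load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
```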
We will write a small FastAPI app that loads the model and serves predictions. It will expose two endpoints: /readiness and /tts. Remember that Tensorkube uses the /readiness endpoint to check the health of your deployments.
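Here is a minimal sketch of such an app. The endpoint names come from this guide; the checkpoints (`microsoft/speecht5_tts`, `microsoft/speecht5_hifigan`), the x-vector speaker embedding from `Matthijs/cmu-arctic-xvectors`, and the request/response shapes are illustrative assumptions, not Tensorkube requirements:

```python
# main.py -- a sketch of the FastAPI app serving SpeechT5 text-to-speech.
import io

import soundfile as sf
import torch
from datasets import load_dataset
from fastapi import FastAPI, Response
from pydantic import BaseModel
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load everything once at startup so every request reuses the same weights.
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

# SpeechT5 conditions on a speaker embedding; index 7306 is an arbitrary voice.
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0).to(device)


class TTSRequest(BaseModel):
    text: str


@app.get("/readiness")
def readiness():
    # Tensorkube polls this endpoint to decide whether the pod can take traffic.
    return {"status": "ready"}


@app.post("/tts")
def tts(request: TTSRequest):
    inputs = processor(text=request.text, return_tensors="pt").to(device)
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
    # Serialize the 16 kHz waveform as WAV bytes and return it as audio.
    buffer = io.BytesIO()
    sf.write(buffer, speech.cpu().numpy(), samplerate=16000, format="WAV")
    return Response(content=buffer.getvalue(), media_type="audio/wav")
```

Because the model is loaded at module import rather than per request, the /readiness endpoint only starts answering once the weights are in memory, which is exactly the signal a health check needs.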