Prerequisites
Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven’t done that yet, follow the Getting Started guide.

Deploying Pixtral-12B with Tensorfuse
Each Tensorkube deployment requires two things - your code and your environment (as a Dockerfile). When deploying machine learning models, it is beneficial to make the model part of your container image as well, since this reduces cold-start times by a significant margin. To enable this, in addition to a FastAPI app and a Dockerfile, we will also write code that downloads the model and places it in our image.

Download the model
We will write a small script that downloads the Pixtral model from the Hugging Face model hub and saves it in the /models directory.
download_model.py
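A sketch of the download script, using `snapshot_download` from `huggingface_hub`. The repo id `mistralai/Pixtral-12B-2409` and the `/models` target directory are assumptions; adjust them to match your setup.

```python
# download_model.py
# Fetches the Pixtral-12B weights from the Hugging Face Hub into /models
# so they get baked into the container image at build time.

MODEL_ID = "mistralai/Pixtral-12B-2409"  # assumed Hub repo id; verify for your model
MODEL_DIR = "/models"


def download_model(repo_id: str = MODEL_ID, local_dir: str = MODEL_DIR) -> str:
    """Download the full model snapshot into local_dir and return its path."""
    # Deferred import so the module can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print(download_model())
```

If the model repository is gated, make your Hugging Face token available (for example via the `HF_TOKEN` environment variable) during the Docker build.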
Code files
We will write a small FastAPI app that loads the model and serves predictions. The FastAPI app will have three endpoints - /readiness, /, and /generate. Remember that the /readiness endpoint is used by Tensorkube to check the health of your deployments.
main.py
Environment files (Dockerfile)
Next, create a Dockerfile for your FastAPI app. Given below is a simple Dockerfile that you can use:

Dockerfile
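A sketch of such a Dockerfile. The CUDA base image tag, the port, and the assumption that a requirements.txt pins fastapi, uvicorn, vllm, and huggingface_hub are all illustrative; adapt them to your stack.

```dockerfile
# Assumed base image; pick a CUDA version compatible with your vLLM build.
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /code

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Bake the model into the image at build time to cut cold-start latency.
COPY download_model.py .
RUN python3 download_model.py

COPY main.py .

EXPOSE 80
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
```

Because the `RUN python3 download_model.py` step pulls the full Pixtral-12B weights, expect a large image and a long first build; subsequent builds reuse the cached layer.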