Deploying FLUX.1-dev on Serverless GPUs
Deploy serverless GPU applications on your AWS account
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer that generates images from text. In this guide we will show you how to deploy the FLUX.1-dev model on your cloud account using Tensorfuse. We will be using 1 L40S GPU for this model.
We will use the NVIDIA Triton Inference Server to serve the model. We will also add token-based authentication to our service. We will store the authentication token (FLUX_API_KEY) as a Tensorfuse secret.
Prerequisites
Before you begin, ensure you have configured Tensorfuse on your AWS account. If you haven’t done that yet, follow the Getting Started guide.
Deploying FLUX.1-dev with Tensorfuse
Each Tensorkube deployment requires:
- Your environment (as a Dockerfile).
- Your code (in this example, the models directory).
- A deployment configuration (deployment.yaml).
Step 1: Prepare the Dockerfile
We will use the official NVIDIA Triton Inference Server image as our base image. This image comes with all the necessary dependencies to run the model. The image tag can be found in the NVIDIA container catalog.
In addition to the base image, we will install a couple of Python packages, set additional environment variables, and copy the models directory into the Docker image.
We’ve configured the Triton server with a couple of CLI flags tailored to our specific use case: we have disabled metrics and added an authentication key for inference requests. For more details on authentication, refer to the Triton docs. If you have questions about selecting flags for production, reach out to the Tensorfuse Community.
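As a reference, here is a minimal sketch of such a Dockerfile. The base image tag, the pip package list, and the restricted-API flag syntax are illustrative assumptions; pick the tag that matches your setup from the NVIDIA container catalog and verify the flags against the Triton docs for your server version.

```dockerfile
# Base image tag is illustrative — choose one from the NVIDIA container catalog.
FROM nvcr.io/nvidia/tritonserver:24.08-py3

# Python dependencies used by the Python backend model (illustrative list).
RUN pip install --no-cache-dir torch diffusers transformers accelerate sentencepiece protobuf

# Copy the Triton model repository into the image.
COPY models /models

# Cache directory for weights downloaded from the Hugging Face Hub.
ENV HF_HOME=/root/.cache/huggingface

# Start Triton with metrics disabled and the inference API restricted to callers
# that present the FLUX_API_KEY header. The restricted-API flag syntax follows
# Triton's "restricted endpoints" feature; adjust it for your Triton version.
CMD ["sh", "-c", "tritonserver --model-repository=/models --allow-metrics=false --http-restricted-api=inference:flux-api-key=${FLUX_API_KEY}"]
```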
Step 2: Prepare the models directory
We will use the Python backend for Triton Server to serve the model. We will create a models directory and add the model.py and config.pbtxt files to it. For more details about the Triton Python backend, refer to the Triton docs.
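Below is a minimal sketch of what the Python backend model could look like. The repository layout (models/flux/1/model.py), the model name ("flux"), and the tensor names ("prompt", "generated_image") are illustrative assumptions and must match what you declare in config.pbtxt.

```python
# models/flux/1/model.py — minimal sketch of a Triton Python backend for FLUX.1-dev.
# The model name ("flux") and tensor names ("prompt", "generated_image") are
# illustrative; they must match your config.pbtxt.
import io

import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from diffusers import FluxPipeline


class TritonPythonModel:
    def initialize(self, args):
        # Load FLUX.1-dev from the Hugging Face Hub onto the L40S GPU.
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Extract the text prompt from the request.
            prompt = pb_utils.get_input_tensor_by_name(request, "prompt").as_numpy()[0]
            if isinstance(prompt, bytes):
                prompt = prompt.decode("utf-8")

            # Generate an image and serialize it to PNG bytes.
            image = self.pipe(prompt, num_inference_steps=50).images[0]
            buf = io.BytesIO()
            image.save(buf, format="PNG")

            out = pb_utils.Tensor(
                "generated_image", np.array([buf.getvalue()], dtype=np.object_)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```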
Step 3: Create Secrets
We will create a secret to store the authentication token. We will use this token to authenticate the inference requests.
We also need to create a Hugging Face secret to download the model from the Hugging Face Hub.
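A sketch of the commands, assuming the tensorkube secret CLI; the secret names and exact syntax are illustrative, so check the Tensorfuse secrets documentation for your version.

```bash
# Secret holding the inference authentication token (name is illustrative).
tensorkube secret create flux-api-key FLUX_API_KEY=<your-api-key>

# Secret holding the Hugging Face token used to download FLUX.1-dev.
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=<your-hf-token>
```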
Step 4: Deployment config
Although you can deploy Tensorfuse apps from the command line, we recommend using a config file so that you can follow a GitOps approach to deployment.
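Here is a minimal sketch of a deployment.yaml, assuming a schema with GPU count/type, attached secrets, scaling bounds, and a readiness probe; the exact keys may differ across Tensorfuse versions, so verify them against the configuration reference.

```yaml
# deployment.yaml — illustrative sketch; key names may differ in your Tensorfuse version.
gpus: 1
gpu_type: l40s
secret:
  - hugging-face-secret
  - flux-api-key
min-scale: 0
max-scale: 3
readiness:
  httpGet:
    path: /v2/health/ready   # Triton's HTTP health endpoint (default port 8000)
    port: 8000
```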
Now you can deploy your service using the following command:
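For example (the flag name is an assumption; confirm it against the Tensorfuse CLI reference):

```bash
tensorkube deploy --config-file ./deployment.yaml
```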
Step 5: Accessing the deployed app
Voila! Your autoscaling, production-ready text-to-image service using FLUX.1-dev is ready.
Once the deployment is successful, you can see the status of your app by running:
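For example (the command is assumed from the tensorkube CLI; confirm with tensorkube --help):

```bash
# Lists your deployments along with their status and endpoint URLs.
tensorkube deployment list
```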
And that’s it! You have successfully deployed the FLUX.1-dev model.
Remember to configure a TLS endpoint with a custom domain before going to production.
To test it out, we have a sample client.py Python file. Add your deployment URL (DEPLOYMENT_URL) in the code and set the FLUX_API_KEY environment variable before running the client.py file.
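Below is a minimal sketch of what client.py could look like, assuming the tritonclient HTTP client, the "flux" model name, and the tensor and header names used in the earlier sketches; adapt them to your actual deployment and config.pbtxt.

```python
# client.py — minimal sketch of a client for the deployed Triton service.
# The model name ("flux"), tensor names, and auth header are illustrative
# assumptions; they must match your config.pbtxt and Dockerfile flags.
import os

import numpy as np
import tritonclient.http as httpclient

DEPLOYMENT_URL = "<your-deployment-url>"  # host[:port] without the scheme; use ssl=True for HTTPS
API_KEY = os.environ["FLUX_API_KEY"]

client = httpclient.InferenceServerClient(url=DEPLOYMENT_URL)

# Build the "prompt" input tensor expected by the Python backend sketch above.
prompt = np.array(["A photo of an astronaut riding a horse on the moon"], dtype=object)
infer_input = httpclient.InferInput("prompt", list(prompt.shape), "BYTES")
infer_input.set_data_from_numpy(prompt)

# The header key must match the one set via --http-restricted-api in the Dockerfile.
result = client.infer(
    model_name="flux",
    inputs=[infer_input],
    headers={"flux-api-key": API_KEY},
)

# Save the returned PNG bytes to disk.
image_bytes = result.as_numpy("generated_image")[0]
with open("generated_image.png", "wb") as f:
    f.write(image_bytes)
```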
Don’t forget to install the required Python packages before running the client.py file.
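For the sketch above, that would be something like:

```bash
pip install "tritonclient[http]" numpy
```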
Once you run the client.py file, you will see a generated_image.png file in your directory. That’s it! You have successfully generated an image using the FLUX.1-dev model.
To get started with Tensorfuse, click here.
You can also directly use the Tensorfuse GitHub repository for more details and updates on these Dockerfiles.