Deploy serverless GPU applications on your AWS account
FLUX.1 [dev] is a 12-billion-parameter rectified flow transformer that generates images from text. In this guide, we will show you how to
deploy the FLUX.1-dev model on your cloud account using Tensorfuse, running on a single L40S GPU.
We will use the NVIDIA Triton Inference Server to serve the model and add token-based authentication to the service, storing the authentication token (FLUX_API_KEY) as a Tensorfuse secret.
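For example, you can create the secrets with the Tensorfuse CLI before deploying. This is a sketch; check the Tensorfuse secrets docs for the exact syntax, and substitute your own token values. A Hugging Face token is needed because FLUX.1-dev is a gated model:

tensorkube secret create flux-secret FLUX_API_KEY=<your-api-key>
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=<your-hf-token>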
We will use the official NVIDIA Triton server image as our base image. It comes with all the necessary
dependencies to run the model. The available image tags are listed in the NVIDIA NGC container catalog.
In addition to the base image, we will install a couple of Python packages, set additional environment variables, and copy the models directory into the Docker image.
We have configured the Triton server with a couple of CLI flags tailored to our use case: metrics are disabled, and an authentication key is required for inference requests. For more details on authentication, refer to the Triton docs.
If you have questions about selecting flags for production, reach out to the Tensorfuse Community. A sketch of the Dockerfile follows below.
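Putting these pieces together, the Dockerfile could look like the sketch below. The image tag, the pip package list, and the environment variable are illustrative assumptions; the --allow-metrics and --http-restricted-api flags follow the Triton server docs, and FLUX_API_KEY is expected to be injected at runtime from the flux-secret:

Dockerfile

# A minimal sketch; pick an image tag from the NGC catalog and a package
# list that matches your model.py
FROM nvcr.io/nvidia/tritonserver:24.08-py3

# Python packages used by the Python-backend model (illustrative)
RUN pip install --no-cache-dir torch diffusers transformers accelerate sentencepiece

# Example environment variable (adjust to your setup)
ENV HF_HOME=/root/.cache/huggingface

# Copy the Triton model repository into the image
COPY models /models

# Disable metrics and restrict the inference API to requests that carry
# the API_KEY header (matched by the client later in this guide)
CMD tritonserver --model-repository=/models \
    --allow-metrics=false \
    --http-restricted-api=inference:API_KEY=${FLUX_API_KEY}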
We will use the Python backend for Triton to serve the model. We will create a models directory and add the model.py and config.pbtxt files to it. For more details about the Triton Python backend, refer to the Triton docs.
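For reference, here is a minimal sketch of what the model repository could look like, consistent with the client code later in this guide. The model name flux, the PROMPT input, and version 1 match client.py; the IMAGE output name, the tensor shapes, and the diffusers-based pipeline are assumptions rather than the guide's exact files:

models/
└── flux/
    ├── config.pbtxt
    └── 1/
        └── model.py

config.pbtxt

name: "flux"
backend: "python"
max_batch_size: 0
input [
  {
    name: "PROMPT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "IMAGE"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
instance_group [
  {
    kind: KIND_GPU
  }
]

model.py

# A minimal sketch of a Triton Python backend model (assumed, not the
# guide's exact file)
from io import BytesIO

import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from diffusers import FluxPipeline


class TritonPythonModel:
    def initialize(self, args):
        # Load FLUX.1-dev once at startup; assumes the Hugging Face token
        # is available via the hugging-face-secret
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")

    def execute(self, requests):
        responses = []
        for request in requests:
            prompt_tensor = pb_utils.get_input_tensor_by_name(request, "PROMPT")
            prompt = prompt_tensor.as_numpy()[0].decode("utf-8")

            # Generate the image and PNG-encode it so the client can
            # reconstruct it with PIL's Image.open
            image = self.pipe(prompt).images[0]
            buffer = BytesIO()
            image.save(buffer, format="PNG")
            png_bytes = np.frombuffer(buffer.getvalue(), dtype=np.uint8)

            out_tensor = pb_utils.Tensor("IMAGE", png_bytes)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses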
Although you can deploy Tensorfuse apps from the command line, we recommend using a config file so
that you can follow a GitOps approach to deployment.
deployment.yaml
# deployment.yaml for FLUX.1-dev
gpus: 1 # Number of GPUs
gpu_type: l40s # GPU Type
port: 8000 # Port to expose the service
min_scale: 0
max_scale: 1
secret:
  - hugging-face-secret
  - flux-secret
readiness:
  httpGet:
    path: /v2/health/ready # readiness endpoint for triton server
    port: 8000
Now you can deploy your service using the following command (flag syntax per the Tensorfuse CLI docs):
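tensorkube deploy --config-file deployment.yaml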
Voila! Your autoscaling production text-to-image service built on FLUX.1-dev is ready.
Once the deployment is successful, you can see the status of your app by running:
tensorkube deployment list
And that’s it! You have successfully deployed the FLUX.1-dev model.
Remember to configure a TLS endpoint with a custom domain before going to production.
To test it out, we have a sample client.py file. Add your deployment URL as DEPLOYMENT_URL in the code and set FLUX_API_KEY as an environment variable before running the client.py file.
client.py
import requests
from io import BytesIO
from PIL import Image
import numpy as np
import os

deployment_url = "<DEPLOYMENT_URL>"  # replace with your deployment url, remove trailing slash
api_key = os.getenv("FLUX_API_KEY")

inference_endpoint = f"{deployment_url}/v2/models/flux/versions/1/infer"

request_data = {
    "inputs": [
        {
            "name": "PROMPT",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Generate a golden retriever with a sunset background"]
        }
    ]
}

headers = {"Content-Type": "application/json", "API_KEY": api_key}

# Send POST request
response = requests.post(inference_endpoint, headers=headers, json=request_data)

if response.status_code != 200:
    print(f"Failed to send request to {inference_endpoint}")
    print(f"Response: {response.text}")
    exit()

response_data = response.json()

# The output tensor holds the encoded image as a list of uint8 values
image_data = response_data["outputs"][0]["data"]
img_np = np.array(image_data, dtype=np.uint8)
byte_data = img_np.tobytes()

# Wrap the bytes in a BytesIO stream
byte_io = BytesIO(byte_data)

# Save the generated image
generated_image = Image.open(byte_io)
generated_image.save("generated_image.png")
Don’t forget to install the required Python packages before running the client.py file:
requirements.txt

pillow
numpy
requests
pip install -r requirements.txt
python client.py
Once you run the client.py file, you will see a generated_image.png file in your directory. That’s it! You have successfully generated an image using the FLUX.1-dev model.