Deploying Stable Diffusion 3 Medium on Serverless GPUs
Deploy serverless GPU applications on your AWS account
Built with developer experience in mind, Tensorkube simplifies deploying serverless GPU apps. In this guide,
we will walk you through deploying Stable Diffusion 3 Medium on your private cloud.
Deploying Stable Diffusion 3 Medium with Tensorfuse
Each Tensorkube deployment requires two things - your code and your environment (as a Dockerfile).
When deploying machine learning models, it is beneficial to make the model part of your container image, as this reduces cold-start times by a significant margin.
To enable this, in addition to a FastAPI app and a Dockerfile, we will also write a small script that downloads the model so that it is baked into the image at build time.
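As a sketch, the download script might look like the following. The filename download_model.py is our choice, and we assume the public stabilityai/stable-diffusion-3-medium-diffusers checkpoint; SD3 is gated on Hugging Face, so a token with access (e.g. via the HF_TOKEN environment variable) must be available when the script runs.

download_model.py

import torch
from diffusers import StableDiffusion3Pipeline

# Fetch the SD3 Medium weights from Hugging Face and save them to the
# local "models" directory, which main.py loads from at startup.
# SD3 is gated, so a Hugging Face token (e.g. HF_TOKEN env var) with
# access to the repo must be available in the environment (assumption).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe.save_pretrained("models")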
We will write a small FastAPI app that loads the model and generates images from text prompts. The FastAPI app will have three endpoints - /readiness, /, and /generate. Remember that the /readiness endpoint is used by Tensorkube to check the health of your deployments.
main.py
import torch
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from diffusers import StableDiffusion3Pipeline
import io

app = FastAPI()

model_dir = "models"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = StableDiffusion3Pipeline.from_pretrained(model_dir, torch_dtype=torch.float16).to(device)

@app.get("/")
async def root():
    is_cuda_available = torch.cuda.is_available()
    return {
        "message": "Hello World",
        "cuda_available": is_cuda_available,
    }

@app.get("/readiness")
async def readiness():
    return {"status": "ready"}

# an inference endpoint for image generation
@app.post("/generate")
async def generate_image(data: dict):
    text = data.get("text")
    if not text:
        return {"error": "text field is required"}
    prompt = text
    image = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=28,
        guidance_scale=7.0,
    ).images[0]
    # Convert the image to a byte stream
    img_byte_arr = io.BytesIO()
    image.save(img_byte_arr, format='PNG')
    img_byte_arr.seek(0)
    return StreamingResponse(img_byte_arr, media_type='image/png')
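The environment lives in the Dockerfile. A minimal sketch might look like this; the base image, package list, and port are assumptions, so adjust them to your setup:

Dockerfile

# A minimal sketch; the base image, packages, and port are assumptions.
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

RUN pip3 install torch fastapi uvicorn diffusers transformers accelerate sentencepiece protobuf

# SD3 is gated on Hugging Face; pass a token at build time
# (e.g. docker build --build-arg HF_TOKEN=...) so the download succeeds.
ARG HF_TOKEN
ENV HF_TOKEN=${HF_TOKEN}

# Bake the model weights into the image to cut cold-start times
COPY download_model.py .
RUN python3 download_model.py

COPY main.py .

EXPOSE 80
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]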
SD3-Medium is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:
tensorkube deploy --gpus 1 --gpu-type a10g
SD3-Medium is now deployed on your AWS account. You can access your app at the URL shown in the output, or retrieve it at any time with the following command:
tensorkube list deployments
And that’s it! You have successfully deployed SD3-Medium on serverless GPUs using Tensorkube. 🚀 To test it out, run the following command, replacing the URL with the one from the deployment output:
curl -X POST -H "Content-Type: application/json" -d '{"text":"Generate an image of a cat holding Hello world signboard"}' <YOUR_APP_URL_HERE>/generate -o output_image.png
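Equivalently, you can call the endpoint from Python. A minimal client sketch, assuming the requests package is installed and the placeholder URL is replaced with your deployment URL:

import requests

# Hypothetical placeholder: substitute the URL from `tensorkube list deployments`
APP_URL = "<YOUR_APP_URL_HERE>"

resp = requests.post(
    f"{APP_URL}/generate",
    json={"text": "Generate an image of a cat holding Hello world signboard"},
    timeout=300,  # image generation can take a while on a cold start
)
resp.raise_for_status()

# The endpoint streams back a PNG; write it to disk
with open("output_image.png", "wb") as f:
    f.write(resp.content)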
You can also hit the /readiness endpoint to wake up your nodes when you are expecting incoming traffic:
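For example (same URL placeholder as above):

curl <YOUR_APP_URL_HERE>/readiness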