Anatomy of a Deployment
Every Tensorfuse Deployment consists of three key components. This separation of concerns makes your projects clean, portable, and easy to manage.

- Application Code: The core logic of your service. It can be a FastAPI app, a vLLM server for a large language model, or any other application that can be containerized.
- Environment (`Dockerfile`): A `Dockerfile` defines your application’s environment. It specifies the base image, system dependencies, Python packages, and the command needed to start your service. Tensorfuse uses this file to build a container image that is identical for development and production.
- Configuration (`deployment.yaml`): This YAML file defines the infrastructure and runtime settings for your Deployment. Here, you specify the required resources (like GPU type and count), scaling parameters, secrets to inject, and health check endpoints.
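As an illustrative sketch, the `Dockerfile` for a containerized FastAPI service might look like the following. The base image, dependencies, and start command are assumptions for the example, not Tensorfuse requirements:

```dockerfile
# Illustrative sketch only; adapt the base image and commands to your service.
FROM python:3.11-slim

WORKDIR /app

# Install Python dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Start the service on the port your app listens on
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Because the same image is used in development and production, "works on my machine" drift is eliminated by construction.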
The Deployment Workflow
When you run the `tensorkube deploy` command, Tensorfuse performs the following steps automatically:
- Builds your `Dockerfile` into a container image.
- Pushes the image to a private container registry (ECR) inside your AWS account.
- Provisions the hardware you requested in your configuration.
- Deploys your container and connects it to the autoscaler.
- Exposes a secure HTTPS endpoint to serve traffic.
Configuring Your Deployment
You can configure your deployment’s resources and behavior in two ways: command-line flags or a `deployment.yaml` file.
While CLI flags are useful for quick tests, we strongly recommend using a `deployment.yaml` file for production workloads. This lets you version-control your infrastructure configuration alongside your code, following a GitOps approach.
To deploy, simply run the `tensorkube deploy` command from your project directory.
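A minimal invocation, assuming your `Dockerfile` and `deployment.yaml` live in the current directory, looks like this (run `tensorkube deploy --help` for the available flags):

```shell
# Builds, pushes, provisions, and exposes the service as described above
tensorkube deploy
```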
Example configuration
A typical `deployment.yaml` file specifies the required GPUs, attaches secrets, and defines a readiness probe. For a full list of available configuration options, refer to the Deployment Configuration Reference.
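As a sketch, such a file might look like the following. The field names and values shown here (`gpus`, `gpu_type`, `secret`, `readiness`) are illustrative; confirm the exact schema against the Deployment Configuration Reference:

```yaml
# Illustrative sketch; verify field names in the Deployment Configuration Reference
gpus: 1                      # number of GPUs per replica
gpu_type: a10g               # GPU model to provision
secret:
  - my-api-token             # secret injected into the container at runtime
readiness:
  httpGet:
    path: /health            # health check endpoint your app must serve
    port: 8000
```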
Readiness probe
A readiness probe is a crucial part of a production-grade deployment. It’s a health check endpoint that Tensorfuse uses to determine if your application has started successfully and is ready to accept traffic. If you don’t configure a `readiness` endpoint, Tensorfuse will not know when your container is truly ready, which can lead to failed requests. Always include a `readiness` block in your `deployment.yaml` to ensure your deployments are robust and reliable.
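For instance, a `readiness` block pointing at a lightweight health route might look like this sketch (the path, port, and exact field names are illustrative; check the Deployment Configuration Reference):

```yaml
readiness:
  httpGet:
    path: /health   # your app should return HTTP 200 here once it is ready
    port: 8000
```

A good health route responds only after your model weights are loaded and the server can actually handle a request, not merely after the process starts.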