A Deployment is a serverless, auto-scaling API endpoint that runs your containerized application on Tensorfuse. You provide your code and a configuration, and Tensorfuse handles the entire lifecycle: building your container, provisioning infrastructure, deploying the application, and serving traffic. This lets you turn any model or application into a scalable, production-ready service with a single command.

Anatomy of a Deployment

Every Tensorfuse Deployment consists of three key components. This separation of concerns makes your projects clean, portable, and easy to manage.
  1. Application Code: This is the core logic of your service. It can be a FastAPI app, a vLLM server for a large language model, or any other application that can be containerized; a minimal sketch follows this list.
  2. Environment (Dockerfile): A Dockerfile defines your application’s environment. It specifies the base image, system dependencies, Python packages, and the command needed to start your service. Tensorfuse uses this file to build a container image that is identical for development and production.
  3. Configuration (deployment.yaml): This YAML file defines the infrastructure and runtime settings for your Deployment. Here, you specify the required resources (like GPU type and count), scaling parameters, secrets to inject, and health check endpoints.
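
For instance, the first component can be as small as the FastAPI app sketched below. This is only an illustrative sketch: the /generate route and the echo response are placeholders, not part of any Tensorfuse API.
# main.py -- a minimal, illustrative FastAPI service
from fastapi import FastAPI

app = FastAPI()

@app.post("/generate")
def generate(prompt: str):
    # Replace this stub with your model's inference logic
    return {"completion": f"echo: {prompt}"}
In a real service the handler would load your model once at startup and run inference per request; the command that starts the server belongs in your Dockerfile (component 2 above).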

The Deployment Workflow

When you run the tensorkube deploy command, Tensorfuse performs the following steps automatically:
  1. Builds your Dockerfile into a container image.
  2. Pushes the image to a private container registry (ECR) inside your AWS account.
  3. Provisions the hardware you requested in your configuration.
  4. Deploys your container and connects it to the autoscaler.
  5. Exposes a secure HTTPS endpoint to serve traffic.

Configuring Your Deployment

You can configure your deployment’s resources and behavior in two ways: command-line flags or a deployment.yaml file. While CLI flags are useful for quick tests, we strongly recommend using a deployment.yaml file for production workloads. This allows you to version control your infrastructure configuration alongside your code, following a GitOps approach. To deploy, simply run:
tensorkube deploy --config-file deployment.yaml
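
For a quick test you can skip the file and pass resources directly as flags. The flag names below are assumed to mirror the gpus and gpu_type fields from the example configuration; confirm the exact spelling in the CLI reference.
tensorkube deploy --gpus 1 --gpu-type l40s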

Example configuration

A typical deployment.yaml file specifies the required GPUs, attaches secrets, and defines a readiness probe. For a full list of available configuration options, refer to the Deployment Configuration Reference.
# Request 4 L40S GPUs for this deployment
gpus: 4
gpu_type: l40s

# Attach secrets containing API keys or tokens.
# These will be available as environment variables.
secret:
  - hugging-face-secret
  - vllm-token

# Define a health check to ensure the app is ready for traffic
readiness:
  httpGet:
    path: /health
    port: 80

Readiness probe

A readiness probe is a crucial part of a production-grade deployment. It’s a health check endpoint that Tensorfuse uses to determine whether your application has started successfully and is ready to accept traffic. If you don’t configure a readiness endpoint, Tensorfuse cannot tell when your container is actually ready and may route requests to it while your application is still starting up, resulting in failed requests. Always include a readiness block in your deployment.yaml to keep your deployments robust and reliable.
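
For instance, the readiness block in the example configuration above expects the application to respond on /health. In a FastAPI app, a matching handler could be the minimal sketch below; note that the server must listen on the port declared in the readiness block (port 80 in that example).
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Return 200 only once the app (e.g. model weights) is fully loaded and ready to serve traffic
    return {"status": "ok"}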