Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through deploying the Mochi 1 preview model on your private cloud.

Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. This new model dramatically closes the gap between closed and open video generation systems.

Prerequisites

Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven’t done that yet, follow the Getting Started guide.

Deploying ComfyUI with Tensorfuse

Each Tensorkube deployment requires two things - your code and your environment (as a Dockerfile). When deploying machine learning models, it is beneficial to make the model part of your container image as well, since this reduces cold-start times by a significant margin.

ComfyUI stands out as one of the most flexible graphical user interfaces (GUIs) for Stable Diffusion, complete with an API and backend architecture. You can deploy any model that ComfyUI supports through Tensorkube by following the pattern in this example.

Code files

We will use an nginx server to front our app. We will configure the /readiness endpoint to return a 200 status code. Remember that Tensorfuse uses this endpoint to check the health of your deployment.

ComfyUI will run as a web server on port 8000, and all of its endpoints will be reached through the nginx proxy on port 80.

nginx.conf
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  10;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    keepalive_timeout  800;
    keepalive_requests 100;

    server {
        listen 80;
        listen [::]:80;

        client_max_body_size 200M;

        location /readiness {
            return 200 'true';
            add_header Content-Type text/plain;
        }

        location / {
            # You may need to adjust this if your application is not running on localhost:8000
            proxy_pass http://localhost:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Websocket proxy for the Comfy UI app
        location /ws {
            proxy_pass http://localhost:8000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
            proxy_set_header Host localhost:8000;
            proxy_set_header X-Forwarded-Host $host;
            proxy_read_timeout  36000s;
        }
    }
}
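If you want to catch syntax errors before baking this file into the image, nginx can test a configuration without starting the server. This assumes nginx is installed on your local machine; sudo is needed because the config references log paths under /var/log/nginx:

sudo nginx -t -c "$(pwd)/nginx.conf"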

Environment files (Dockerfile)

Next, create a Dockerfile. Given below is a simple Dockerfile that you can use:

Dockerfile
# Use the nvidia cuda base image
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Update and install required packages
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3.11-dev \
    python3-pip \
    nginx \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.11 as the default Python version
RUN ln -s /usr/bin/python3.11 /usr/bin/python

# Upgrade pip
RUN pip3 install --no-cache-dir --upgrade pip && pip install requests GitPython comfy-cli ffmpeg-python opencv-python imageio_ffmpeg huggingface_hub hf_transfer

# Install ComfyUI via comfy-cli (GIT_PYTHON_REFRESH=quiet silences GitPython's startup warning)
RUN GIT_PYTHON_REFRESH=quiet comfy --skip-prompt install --nvidia

# Install the custom nodes
RUN comfy node install https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite
RUN comfy node install https://github.com/kijai/ComfyUI-MochiWrapper 

# Set working directory
WORKDIR /code
COPY download_model.py /code/download_model.py

# Download model files
RUN HF_HUB_ENABLE_HF_TRANSFER=1 python3 download_model.py 

# Copy the Nginx configuration file
COPY nginx.conf /etc/nginx/nginx.conf

# For nginx
EXPOSE 80

# Start nginx on port 80 and launch the ComfyUI server on port 8000
CMD ["sh", "-c", "nginx && comfy launch -- --port 8000"]

We also need to place the model files in the ComfyUI models directory, so the Dockerfile runs download_model.py, which downloads all of the model files to the right locations.

download_model.py
import os

from huggingface_hub import snapshot_download

if __name__ == '__main__':
    # 1. Load models
    os.makedirs('/root/comfy/ComfyUI/models/diffusion_models/mochi', exist_ok=True)
    snapshot_download(
        "Kijai/Mochi_preview_comfy",
        local_dir='/root/comfy/ComfyUI/models/diffusion_models/mochi',
        allow_patterns=[
            "mochi_preview_dit_fp8_e4m3fn.safetensors", 
            "mochi_preview_dit_bf16.safetensors",
            "mochi_preview_dit_GGUF_Q4_0_v2.safetensors",
            "mochi_preview_dit_GGUF_Q8_0.safetensors"
            ],
        ignore_patterns=["*.pt", "*.bin"],  # using safetensors
    )
    os.makedirs('/root/comfy/ComfyUI/models/vae/mochi', exist_ok=True)
    snapshot_download(
        "Kijai/Mochi_preview_comfy",
        local_dir='/root/comfy/ComfyUI/models/vae/mochi',
        allow_patterns=["mochi_preview_vae_bf16.safetensors"],
        ignore_patterns=["*.pt", "*.bin"],  # using safetensors
    )

    # 2. Load CLIP Model
    snapshot_download(
        "comfyanonymous/flux_text_encoders",
        local_dir='/root/comfy/ComfyUI/models/clip/',
        allow_patterns=["t5xxl_fp8_e4m3fn.safetensors"], 
        ignore_patterns=["*.pt", "*.bin"],  # using safetensors
    )
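Before deploying, you can optionally sanity-check the image on a GPU machine with Docker and the NVIDIA container toolkit installed. A minimal sketch - the comfyui-mochi tag is an arbitrary placeholder, and note that the build downloads tens of gigabytes of model weights:

docker build -t comfyui-mochi .
docker run --rm --gpus all -p 80:80 comfyui-mochi

Once the container is up, curl localhost/readiness should return true.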

Deploying the app

ComfyUI is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:

tensorkube deploy --gpus 1 --gpu-type a10g

ComfyUI is now deployed on your AWS account. You can access your app at the URL provided in the output or using the following command:

tensorkube list deployments

And that’s it! You have successfully deployed the Mochi 1 preview model on serverless GPUs using Tensorkube. 🚀

To test it out, you can visit the deployment link in your browser, or run the following command, replacing the URL with the one provided in the output:

curl <YOUR_URL_HERE>

To use the app, you need a ComfyUI workflow that uses nodes from the MochiWrapper. You can use one from here and open it in the ComfyUI interface.
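If you would rather drive the app programmatically instead of through the browser, ComfyUI also exposes its queue over HTTP, which the nginx proxy forwards. Below is a minimal sketch that assumes you have exported a MochiWrapper workflow with ComfyUI's "Save (API Format)" option to a file named workflow_api.json; the filename and the URL are placeholders:

queue_workflow.py
import json
import time

import requests

APP_URL = "<YOUR_APP_URL_HERE>"  # placeholder: your Tensorkube deployment URL

# Load a workflow exported via ComfyUI's "Save (API Format)" option
with open("workflow_api.json") as f:
    workflow = json.load(f)

# ComfyUI queues workflows via POST /prompt and returns a prompt_id
resp = requests.post(f"{APP_URL}/prompt", json={"prompt": workflow})
resp.raise_for_status()
prompt_id = resp.json()["prompt_id"]

# Poll GET /history/<prompt_id>; the entry appears once the run finishes
while True:
    history = requests.get(f"{APP_URL}/history/{prompt_id}").json()
    if prompt_id in history:
        print("Outputs:", history[prompt_id]["outputs"])
        break
    time.sleep(5)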

You can also use the readiness endpoint to wake up your nodes if you are expecting incoming traffic:

curl <YOUR_APP_URL_HERE>/readiness