Standard container startup relies on the runtime, such as `containerd`, pulling the complete container image from a registry. This process is a significant bottleneck for AI workloads,
whose images often exceed 20 GB due to large model weights and dependencies like CUDA and PyTorch.
The typical startup sequence consists of three time-consuming, sequential steps:
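1. The runtime downloads every layer of the image from the registry.
2. Each layer is decompressed and unpacked onto local disk for `overlayfs`. This is I/O-bound.
3. Only after all layers are unpacked can the container's `ENTRYPOINT` execute.

For a 20 GB image, this sequence can take over 10 minutes. However, typically only a small fraction of the image data is required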
for the application to initialize. This inefficiency leads to long cold start times, forcing teams to overprovision expensive GPU resources
to keep “warm” instances available.
The Tensorfuse solution is implemented as a `containerd` remote snapshotter. It replaces the default download-and-unpack model with an
on-demand, lazy-loading mechanism. This is achieved through two core components: a build-time image
indexer and a runtime FUSE-based daemon.
Standard OCI images use gzip-compressed tar archives (`tar.gz`) for their layers. This format is a compressed stream, making random access to individual
files impossible without decompressing the entire stream up to the desired file.
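
The cost of seeking in a gzip stream can be seen in a small experiment. The sketch below uses only the Python standard library and a synthetic, incompressible 256 KiB payload as a stand-in for a layer (the payload and sizes are illustrative assumptions, not measurements from a real image). It asks for the last 100 bytes and then checks how much of the compressed stream had to be consumed to serve them.

```python
import gzip
import io
import random

# Synthetic, incompressible "layer" (seeded so the run is repeatable).
SIZE = 256 * 1024
payload = random.Random(0).getrandbits(8 * SIZE).to_bytes(SIZE, "little")
compressed = gzip.compress(payload)

# Open the compressed stream and request only the final 100 bytes.
raw = io.BytesIO(compressed)
with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
    gz.seek(len(payload) - 100)  # a forward seek decompresses and discards
    tail = gz.read(100)

# Position in the underlying stream = compressed bytes that had to be read.
consumed = raw.tell()
print(f"wanted 100 bytes, consumed {consumed} of {len(compressed)} compressed bytes")
```

Even though only 100 bytes were requested, nearly the whole compressed stream is read, which is the behavior a seekable format is meant to avoid.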
Tensorfuse addresses this with a build tool that converts standard OCI images into a highly optimized and seekable format
based on the Registry Accelerated File System (RAFS) design, while remaining compatible with OCI registries. This conversion process
fundamentally restructures the image by separating filesystem metadata from file data. The metadata is stored in a compact “bootstrap”
file, which acts as a comprehensive Table of Contents (TOC).
The file data itself is broken down into content-addressable chunks, or “blobs”. This architecture makes the entire filesystem
instantly seekable, enabling the runtime to fetch only the required data chunks for a specific file. This bypasses the need to
download or decompress the entire multi-gigabyte layer just to start the container.
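
The separation of metadata from content-addressable data can be illustrated with a toy indexer. This is a minimal sketch, not the actual RAFS on-disk format: `build_index`, `read_file`, the tiny chunk size, and the dictionary layout are all hypothetical names and choices made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny for the demo; real chunks are far larger


def build_index(files, chunk_size=CHUNK_SIZE):
    """Split files into chunks and return (bootstrap, blobs).

    bootstrap: path -> ordered list of chunk digests (metadata only)
    blobs:     digest -> chunk bytes (content-addressed data)
    """
    bootstrap = {}
    blobs = {}
    for path, data in files.items():
        digests = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            blobs[digest] = chunk  # identical chunks are stored only once
            digests.append(digest)
        bootstrap[path] = digests
    return bootstrap, blobs


def read_file(path, bootstrap, fetch_chunk):
    """Reassemble one file by fetching only the chunks listed for it."""
    return b"".join(fetch_chunk(d) for d in bootstrap[path])


files = {
    "/app/main.py": b"print('hi')",
    "/models/weights.bin": b"\x00" * 64,
}
bootstrap, blobs = build_index(files)

fetched = []  # records which chunks were requested


def fetch_chunk(digest):
    fetched.append(digest)
    return blobs[digest]


data = read_file("/app/main.py", bootstrap, fetch_chunk)
print(len(fetched), "chunks fetched;", len(blobs), "unique chunks stored")
```

Reading `/app/main.py` touches only its own three chunks, and the 64 zero bytes of the weights file collapse into a single stored chunk because chunks are keyed by their hash.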
When `containerd` is instructed to create a container, the following occurs:

1. When the application first accesses a file (e.g., `import torch`), the Linux kernel intercepts the `read()` syscall and forwards it to the Tensorfuse daemon.
2. The daemon consults the pre-generated Table of Contents (the RAFS bootstrap) to locate the file’s data within the compressed layer in the remote registry.
3. The daemon issues an HTTP Range Request to the registry, fetching only the small chunk of compressed data containing the file and its preceding decompression checkpoint.
4. The daemon decompresses the fetched data and returns the file contents, completing the `read()` call.
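
The lookup-then-fetch part of this flow can be sketched as follows. This is a simplified model under stated assumptions: `LazyReader`, `make_range_request`, the `(offset, length)` table of contents, and the `registry.example` URL are hypothetical, and the decompression step is omitted. Only `urllib.request` from the standard library is used, and the demo swaps in an in-memory fetcher so it runs without a network.

```python
import urllib.request


def make_range_request(url, offset, length):
    """Build (but do not send) a request for bytes [offset, offset + length)."""
    last = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def http_fetch(url, offset, length):
    """Fetch one byte range; a compliant server answers 206 Partial Content."""
    with urllib.request.urlopen(make_range_request(url, offset, length)) as resp:
        return resp.read()


class LazyReader:
    """Serve file reads by fetching only the byte range listed in the TOC."""

    def __init__(self, toc, blob_url, fetch=http_fetch):
        self.toc = toc            # path -> (offset, length) in the remote blob
        self.blob_url = blob_url
        self.fetch = fetch

    def read(self, path):
        offset, length = self.toc[path]
        return self.fetch(self.blob_url, offset, length)


# Offline demo: in-memory bytes stand in for the blob held by the registry.
blob = b"AAAA" + b"import torch stub" + b"ZZZZZZ"
toc = {"/site-packages/torch/__init__.py": (4, 17)}
calls = []


def fake_fetch(url, offset, length):
    calls.append((offset, length))
    return blob[offset:offset + length]


reader = LazyReader(toc, "https://registry.example/blob", fetch=fake_fetch)
content = reader.read("/site-packages/torch/__init__.py")
print(content, calls)
```

Injecting the fetcher keeps the lookup logic testable in isolation; in a real daemon the same call would go over the network using the `Range` header built by `make_range_request`.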
Tensorfuse integrates through `containerd`'s stable remote snapshotter gRPC API. The key interaction
occurs during the image pull process.

1. During the pull, `containerd` calls the `Prepare` method on the Tensorfuse gRPC service.
2. If the snapshotter can serve the layer on demand, it returns an `ErrAlreadyExists` error.
3. This error signals to `containerd` that the snapshotter can provide the layer’s contents without needing `containerd` to download and unpack it. `containerd` trusts this signal and skips the download for that layer.

This integration requires no changes to `containerd`'s core code, preserving the stability and security of the standard container runtime.
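
The control flow of this handshake can be modeled in a few lines. The real interface is a Go gRPC API, so the Python below is only an illustrative model: `AlreadyExists`, `LazySnapshotter`, and `pull_image` are hypothetical names, and the layer digests are placeholders.

```python
class AlreadyExists(Exception):
    """Stand-in for the ErrAlreadyExists error returned by the snapshotter."""


class LazySnapshotter:
    """Toy snapshotter that can serve some layers on demand."""

    def __init__(self, lazy_layers):
        self.lazy_layers = set(lazy_layers)

    def prepare(self, layer_digest):
        # Signal "already exists" for layers that can be served lazily.
        if layer_digest in self.lazy_layers:
            raise AlreadyExists(layer_digest)
        # Otherwise return normally: the caller must download and unpack.


def pull_image(layers, snapshotter, download):
    """Return the layers that actually had to be downloaded."""
    downloaded = []
    for digest in layers:
        try:
            snapshotter.prepare(digest)
        except AlreadyExists:
            continue  # the snapshotter will provide this layer on demand
        download(digest)
        downloaded.append(digest)
    return downloaded


layers = ["sha256:base", "sha256:cuda", "sha256:model"]
snap = LazySnapshotter(lazy_layers={"sha256:cuda", "sha256:model"})
download_log = []
downloaded = pull_image(layers, snap, download_log.append)
print(downloaded)
```

In this model only the layer the snapshotter does not claim is downloaded eagerly, mirroring how the skip decision is made per layer.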

| Stage | Standard `overlayfs` | Tensorfuse Snapshotter | Improvement |
|---|---|---|---|
| Image Data & Unpack | ~12 minutes | Eliminated (On-Demand) | - |
| Time to `ENTRYPOINT` | ~12 min, 5 sec | ~2 seconds | > 360x |
| vLLM Server Ready | ~12 min, 30 sec | ~20 seconds | > 37x |
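
The improvement factors follow directly from the timings in the table. As a quick check, the snippet below converts each pair of (approximate) timings to seconds and divides; `to_seconds` is a helper introduced here for illustration.

```python
def to_seconds(minutes=0, seconds=0):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


# Time to ENTRYPOINT: ~12 min 5 sec versus ~2 seconds.
entrypoint_speedup = to_seconds(12, 5) / to_seconds(0, 2)

# vLLM server ready: ~12 min 30 sec versus ~20 seconds.
ready_speedup = to_seconds(12, 30) / to_seconds(0, 20)

print(f"Time to ENTRYPOINT: {entrypoint_speedup:.1f}x")  # 362.5x
print(f"vLLM server ready:  {ready_speedup:.1f}x")       # 37.5x
```

Both results are consistent with the "> 360x" and "> 37x" figures in the table.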