Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:
Cluster Management: Tensorfuse monitors your current workloads and scales nodes to zero when they are idle (the scale-to-zero case appears in the first sketch after this list).
Function Execution: Tensorfuse brings in general-purpose AMIs that let you run functions on different hardware such as GPUs, TPUs, Inferentia/Trainium chips, and FPGAs (see the second sketch after this list).
Custom Docker Implementation: Tensorfuse includes a custom Docker implementation that builds large images while keeping cold start times fast.
Autoscaling: An autoscaler adjusts resources based on the number of incoming HTTP requests, the size of job queues, or the number of concurrent function calls, as the first sketch after this list illustrates.
Custom Networking Layer: Tensorfuse extends Istio to set up a custom networking layer, allowing you to define endpoints, communicate between functions and data sources, and run multi-node GPU inference and training jobs.
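To make the autoscaling and scale-to-zero behaviour concrete, here is a minimal sketch of a concurrency-based scaling rule of the kind such autoscalers use. This is an illustration of the general technique, not Tensorfuse's actual implementation; the function name and the `target_concurrency` parameter are hypothetical:

```python
import math

def desired_replicas(in_flight_requests: int, target_concurrency: int) -> int:
    """Map current request load to a replica count.

    Scale-to-zero falls out of the same rule: with no in-flight
    requests the desired count is zero, so idle GPU nodes can be
    released.
    """
    if in_flight_requests == 0:
        return 0
    return math.ceil(in_flight_requests / target_concurrency)

# 35 concurrent requests at a target of 10 per replica -> 4 replicas;
# no traffic -> 0 replicas.
assert desired_replicas(35, 10) == 4
assert desired_replicas(0, 10) == 0
```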
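The second sketch shows the underlying Kubernetes mechanism for targeting specific hardware: a pod requests an extended resource, and the scheduler places it on a node that advertises that resource. This is plain Kubernetes via the official Python client, not Tensorfuse's API; the image name is hypothetical:

```python
from kubernetes import client

# "nvidia.com/gpu" is the resource name exposed by the NVIDIA device
# plugin; other accelerators advertise their own resource names
# (e.g. "google.com/tpu" on GKE TPU nodes).
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-fn"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="fn",
                image="registry.example.com/fn:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)
```

Exposing such resource names requires node images with the right drivers and device plugins installed, which is the role the general-purpose AMIs play.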
The best part is that all of this is abstracted away. While working with Tensorfuse, you will not have to deal with any of the concepts mentioned above.