Basics
Introduction
Tensorfuse lets you run serverless GPUs on your own cloud infrastructure.
Features
- Run any code on your private cloud without worrying about infrastructure.
- Use GPUs that scale down to zero when idle and scale up to handle many concurrent function calls.
- Run your code on various hardware, including GPUs (A10G, A100, H100), Trainium/Inferentia chips, TPUs, or FPGAs.
- Expose your models as OpenAI-compatible APIs.
- Run serverless training jobs without managing conda environments or remembering to shut down ML instances.
- Fine-tune base models using techniques like LoRA, QLoRA, and reinforcement learning, with out-of-the-box support for Axolotl.
- Use hot-reloading GPU devcontainers to experiment directly on GPUs.
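Because a model exposed through an OpenAI-compatible API speaks the standard chat-completions wire format, any OpenAI client or plain HTTP call works unchanged. A minimal sketch of the request payload (the endpoint URL and model name below are placeholders, not values from this document):

```python
import json

# Placeholder endpoint for your own deployment -- substitute your real URL.
BASE_URL = "https://your-deployment.example.com/v1"

# Standard OpenAI chat-completions request body.
payload = {
    "model": "your-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize serverless GPUs in one line."}
    ],
    "max_tokens": 64,
}

# With the official client (pip install openai), pointed at your endpoint:
# from openai import OpenAI
# client = OpenAI(base_url=BASE_URL, api_key="...")
# resp = client.chat.completions.create(**payload)

print(json.dumps(payload, indent=2))
```

Since the format matches OpenAI's, existing tooling built against the OpenAI API can be repointed at your deployment without code changes.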
How Does It Work?
Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:
- Cluster Management: Tensorfuse monitors your current workloads and scales nodes down to zero when they are not in use.
- Function Execution: Tensorfuse provides general-purpose AMIs that let you run functions on different hardware, such as GPUs, TPUs, Inferentia/Trainium chips, and FPGAs.
- Custom Docker Implementation: Tensorfuse includes a custom Docker implementation that builds large images with faster cold-start times.
- Autoscaling: an autoscaler adjusts resources based on the number of incoming HTTP requests, the depth of job queues, or the number of concurrent function calls.
- Custom Networking Layer: Tensorfuse extends Istio to set up a custom networking layer, letting you define endpoints, communicate between functions and data sources, and run multi-node GPU inference and training.
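The request-based autoscaling described above can be sketched conceptually. This is a simplified illustration of the idea (target replicas proportional to in-flight requests, with scale-to-zero when idle), not Tensorfuse's actual implementation; the parameter names are invented for the example:

```python
import math

def desired_replicas(inflight_requests: int,
                     target_per_replica: int = 10,
                     max_replicas: int = 8) -> int:
    """Pick a replica count proportional to load; scale to zero when idle."""
    if inflight_requests == 0:
        return 0  # no traffic: release all GPU nodes
    needed = math.ceil(inflight_requests / target_per_replica)
    return min(needed, max_replicas)  # cap at the cluster's configured maximum

print(desired_replicas(0))    # idle -> 0 replicas
print(desired_replicas(25))   # ceil(25 / 10) -> 3 replicas
print(desired_replicas(500))  # capped at max_replicas -> 8
```

The same shape of calculation applies whether the signal is HTTP request count, job-queue depth, or concurrent function calls; only the input metric changes.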
The best part is that all of this is abstracted away: while working with Tensorfuse, you will not need to deal with any of the concepts mentioned above.