Features

  • Run any code on your private cloud without worrying about infrastructure.
  • Use GPUs that scale down to zero when idle and scale up to handle many concurrent function calls.
  • Run your code on a range of hardware, including GPUs (A10G, A100, H100), Trainium/Inferentia chips, TPUs, and FPGAs.
  • Expose your models as OpenAI-compatible APIs (see the client sketch after this list).
  • Run serverless training jobs without managing conda environments or remembering to shut down ML instances.
  • Fine-tune base models with techniques like LoRA, QLoRA, and reinforcement learning, with out-of-the-box support for Axolotl (see the LoRA sketch after this list).
  • Use hot-reloading GPU devcontainers to experiment directly on GPUs.
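
As a quick illustration of the OpenAI-compatible APIs mentioned above, the sketch below calls a deployed model with the standard OpenAI Python client. The base URL, API key, and model name are placeholders for whatever your own deployment exposes, not Tensorfuse-specific values.

```python
# Minimal sketch: calling a model served behind an OpenAI-compatible endpoint.
# The base_url, api_key, and model name below are placeholders for your own
# deployment; they are not Tensorfuse-specific values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                              # whatever auth your gateway expects
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize serverless GPUs in one sentence."}],
)
print(response.choices[0].message.content)
```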
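
To illustrate the fine-tuning techniques named above, here is a minimal LoRA sketch using the Hugging Face transformers and peft libraries. It shows the technique itself rather than Tensorfuse's training interface (Axolotl wraps a similar configuration in YAML), and the model name is a placeholder.

```python
# Minimal LoRA sketch with Hugging Face transformers + peft.
# Illustrates the technique only, not Tensorfuse's job interface;
# the base model name is a placeholder.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder model

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```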

How Does It Work?

Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:

  • Cluster Management: Tensorfuse monitors your current workloads and scales nodes to zero when not in use.
  • Function Execution: Tensorfuse brings in general-purpose AMIs that let you run functions on different hardware, such as GPUs, TPUs, Inferentia/Trainium chips, and FPGAs.
  • Custom Docker Implementation: Tensorfuse includes a custom Docker implementation to build larger images with faster cold start times.
  • Autoscaling: An autoscaler adjusts resources based on the number of incoming HTTP requests, the size of the job queue, or the number of concurrent function calls (see the sketch after this list).
  • Custom Networking Layer: Tensorfuse extends Istio to set up a custom networking layer, letting you define endpoints, communicate between functions and data sources, and run multi-node GPU inference and training.
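
As a rough mental model of the autoscaling step, the sketch below derives a desired replica count from the number of in-flight requests and a per-replica concurrency target, the same idea used by concurrency-based autoscalers. The function, thresholds, and limits are illustrative assumptions, not Tensorfuse internals.

```python
import math

def desired_replicas(in_flight_requests: int,
                     target_concurrency_per_replica: int,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Illustrative concurrency-based scaling rule (not Tensorfuse internals).

    Scales to zero when there is no traffic; otherwise provisions enough
    replicas so each handles at most `target_concurrency_per_replica`
    concurrent requests, capped at `max_replicas`.
    """
    if in_flight_requests == 0:
        return min_replicas  # scale to zero when idle
    needed = math.ceil(in_flight_requests / target_concurrency_per_replica)
    return max(min_replicas, min(needed, max_replicas))

# 25 concurrent requests with a target of 8 per replica -> 4 replicas
print(desired_replicas(25, 8))
```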

The best part is that all of this is abstracted away. When working with Tensorfuse, you never have to deal with any of the concepts mentioned above.