Basics
Introduction
Tensorfuse lets you run serverless GPUs on your own infrastructure.
Features
- Run any code on your private cloud without worrying about infrastructure.
- Use GPUs that scale down to zero when idle and scale up to handle many concurrent function calls.
- Run functions in custom container environments or use our optimized Machine Images.
- Monitor and debug multimodal chains.
- Run your code on various hardware, including GPUs (A10G, A100, H100), Trainium/Inferentia chips, TPUs, or FPGAs.
- Expose your models as OpenAI-compatible APIs (see the client sketch after this list).
- Run serverless training jobs without managing conda environments or remembering to shut down idle ML instances.
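
Because deployments are exposed as OpenAI-compatible APIs, any standard OpenAI client can talk to them. The snippet below is a minimal sketch: the endpoint URL, API key, and model name are placeholders for whatever your own deployment exposes, not values Tensorfuse defines.

```python
# Illustrative only: base_url, api_key, and model are placeholders for your
# own deployment's values, not anything provided by Tensorfuse itself.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.example.com/v1",  # hypothetical endpoint
    api_key="your-api-key",                              # hypothetical key
)

response = client.chat.completions.create(
    model="your-model",  # whichever model you deployed
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Any tool that already speaks the OpenAI API (SDKs, LangChain, curl) can be pointed at the same endpoint by changing only the base URL and key.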
How Does It Work?
Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:
- Cluster Management: Tensorfuse monitors your current workloads and scales nodes to zero when not in use.
- Function Execution: Tensorfuse brings in general-purpose AMIs that let you run functions on different hardware such as GPUs, TPUs, Inferentia/Trainium chips, and FPGAs.
- Custom Docker Implementation: Tensorfuse includes a custom Docker implementation to build larger images (including models) with faster cold start times.
- Autoscaling: An autoscaler adjusts resources based on the number of incoming HTTP requests (a conceptual sketch follows this list).
- Custom Networking Layer: Tensorfuse extends Istio to set up a custom networking layer, allowing you to define endpoints and communicate between functions and data sources.
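
To make the autoscaling step concrete, here is a minimal, conceptual sketch of request-based scaling. This is not Tensorfuse's implementation; the target-per-replica value, the replica cap, and the function name are assumptions chosen purely for illustration.

```python
import math

def desired_replicas(in_flight_requests: int,
                     target_per_replica: int = 10,
                     max_replicas: int = 8) -> int:
    """Conceptual request-based autoscaling: size the number of replicas in
    proportion to in-flight HTTP requests, and scale all the way to zero when
    there is no traffic. Illustrative only, not Tensorfuse's actual code."""
    if in_flight_requests == 0:
        return 0  # no traffic: scale to zero
    return min(max_replicas, math.ceil(in_flight_requests / target_per_replica))

# Example: 25 concurrent requests with a target of 10 per replica -> 3 replicas
print(desired_replicas(25))  # 3
print(desired_replicas(0))   # 0
```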
The best part is that all of this is abstracted away: while working with Tensorfuse, you never have to deal with any of the concepts mentioned above.