Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:
Cluster Management: Tensorfuse monitors your current workloads and scales nodes to zero when they are idle (the scale-to-zero case appears in the first sketch after this list).
Function Execution: Tensorfuse brings in general-purpose AMIs that let you run functions on different hardware such as GPUs, TPUs, Inferentia/Trainium chips, and FPGAs (see the second sketch after this list).
Custom Docker Implementation: Tensorfuse includes a custom Docker implementation that builds large images while keeping cold start times fast.
Autoscaling: An autoscaler adjusts resources based on the number of incoming HTTP requests, the size of job queues, or the number of concurrent function calls, as the first sketch after this list illustrates.
Custom Networking Layer: Tensorfuse extends Istio to set up a custom networking layer, allowing you to define endpoints, communicate between functions and data sources, and run multi-node GPU inference and training jobs.
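To make the autoscaling and scale-to-zero behaviour concrete, here is a minimal sketch of a concurrency-based scaling rule of the kind such autoscalers use. This is an illustration of the general technique, not Tensorfuse's actual implementation; the function name and the `target_concurrency` parameter are hypothetical:

```python
import math

def desired_replicas(in_flight_requests: int, target_concurrency: int) -> int:
    """Map current request load to a replica count.

    Scale-to-zero falls out of the same rule: with no in-flight
    requests the desired count is zero, so idle GPU nodes can be
    released.
    """
    if in_flight_requests == 0:
        return 0
    return math.ceil(in_flight_requests / target_concurrency)

# 35 concurrent requests at a target of 10 per replica -> 4 replicas;
# no traffic -> 0 replicas.
assert desired_replicas(35, 10) == 4
assert desired_replicas(0, 10) == 0
```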
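The second sketch shows the underlying Kubernetes mechanism for targeting specific hardware: a pod requests an extended resource, and the scheduler places it on a node that advertises that resource. This is plain Kubernetes via the official Python client, not Tensorfuse's API; the image name is hypothetical:

```python
from kubernetes import client

# "nvidia.com/gpu" is the resource name exposed by the NVIDIA device
# plugin; other accelerators advertise their own resource names
# (e.g. "google.com/tpu" on GKE TPU nodes).
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-fn"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="fn",
                image="registry.example.com/fn:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)
```

Exposing such resource names requires node images with the right drivers and device plugins installed, which is the role the general-purpose AMIs play.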
The best part is that all of this is abstracted away. While working with Tensorfuse, you will not have to deal with any of the concepts mentioned above.