How Vaero used Tensorfuse to build and deploy thousands of custom LLMs that write like humans
Jul 9, 2025
5 Min
About Vaero
Vaero AI is on a mission to produce the most human-sounding writing in the world. Vaero trains custom LLMs to match each user’s writing style and tone. Users provide samples of their own writing, such as emails and documents, and Vaero automatically builds models that write in their voice.
Problem
Vaero customizes models for each user, requiring infrastructure capable of running multiple fine-tuning jobs serverlessly and performing real-time inference across hundreds of adapters with low latency. They initially chose Fireworks as their primary infrastructure provider but encountered several issues:
- Limited control due to hidden training hyper-parameters.
- Unpredictable system outages causing production downtime.
- Rate limits of only 100 fine-tuned adapters, restricting scalability.
- Delayed support.
Solution
With Tensorfuse, they got full control over all the training parameters, significantly improving model performance. Vaero could fine-tune thousands of LoRA adapters without any rate limits, achieving high throughput and low latency.
Tensorfuse provisioned the entire training and inference infrastructure directly in Vaero's AWS account, enabling up to 70% cost savings by using existing cloud credits.
Tensorfuse supported Vaero by:
- Setting up a highly optimised EKS cluster for AI workloads in their AWS VPC.
- Allowing Vaero to submit serverless fine-tuning jobs programmatically using Tensorfuse’s Python SDK.
- Scaling jobs down to zero automatically upon completion.
- Giving Vaero full access to training parameters, which Fireworks did not expose, drastically enhancing model performance.
- Creating and maintaining an optimised fork of the official vLLM image tailored for multi-LoRA inference.
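To make the multi-LoRA idea concrete, here is a minimal sketch of how a request might be routed to one user's adapter. It is not Vaero's or Tensorfuse's actual code. It assumes the stock vLLM OpenAI-compatible server, where adapters registered at startup (for example with `--enable-lora --lora-modules <name>=<path>`) are selected through the `model` field of the request; the behaviour of the optimised fork may differ. The `adapter_name` convention and the function names are hypothetical.

```python
import json


def adapter_name(user_id: str) -> str:
    """Hypothetical naming convention: one LoRA adapter per user."""
    return f"user-{user_id}-lora"


def build_completion_request(user_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style completion payload that targets a user's adapter.

    Assumption: the server is a stock vLLM OpenAI-compatible server with the
    adapter already registered under the name returned by adapter_name().
    The adapter is chosen purely by the "model" field; the base model stays
    loaded once and is shared by every adapter.
    """
    return {
        "model": adapter_name(user_id),
        "prompt": prompt,
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    # Print the payload that would be POSTed to the server's /v1/completions route.
    payload = build_completion_request("alice", "Draft a reply to the vendor:")
    print(json.dumps(payload, indent=2))
```

Because the adapter is chosen per request, a single deployment can serve many users' writing styles without loading a separate copy of the base model for each one.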
Result
Vaero and Tensorfuse worked together to deliver accurate, high-performance, low-latency deployments of Vaero's custom LLMs.
- Improved model accuracy compared to Fireworks.
- Reduced infrastructure costs by up to 70% using existing cloud credits.
- Accelerated iteration speed and shortened time to market by 10x.