Customer Stories

How Vaero built and deployed 1000s of custom LLMs that write like humans with Tensorfuse

Jul 9, 2025

5 Min

100+

Fine-tuned adapters deployed on a single GPU node

40%

Higher TPS with zero rate limits

70%

Lower infra cost by utilising existing AWS commitment

Overview

Use Case

LoRA fine-tuning and multi-LoRA inference

Tensorfuse features used

LoRA fine-tuning

Dynamic multi-LoRA inference

Custom model deployment

Private deployment in the customer’s own AWS VPC

Key Benefits

Zero infra overhead

10x faster iteration speed

Custom model deployment

High throughput and low latency

About Vaero

Vaero AI is on a mission to produce the most human-sounding writing in the world. Vaero trains custom LLMs to match each user’s writing style and tone. Users provide examples from emails, documents, etc., and Vaero automatically builds models that write exactly like them.

Problem

Vaero customizes models for each user, requiring infrastructure capable of running multiple fine-tuning jobs serverlessly and performing real-time inference across hundreds of adapters with low latency. They initially chose Fireworks as their primary infrastructure provider but encountered several issues:

  • Limited control due to hidden training hyper-parameters.
  • Unpredictable system outages causing production downtime.
  • Rate limits of only 100 fine-tuned adapters, restricting scalability.
  • Delayed support.

Solution

With Tensorfuse, Vaero got full control over all training hyper-parameters, which significantly improved model performance. They could fine-tune thousands of LoRA adapters without any rate limits, and serve them with high throughput and low latency.
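
To make “full control over all training hyper-parameters” concrete, below is a minimal LoRA fine-tuning sketch using the open-source Hugging Face peft library. The base model, rank, and target modules are illustrative assumptions, not Vaero’s actual configuration or the Tensorfuse SDK itself.

    # A minimal LoRA fine-tuning sketch using Hugging Face peft + transformers.
    # Base model, rank, and target modules are illustrative assumptions,
    # not Vaero's actual configuration.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base_model = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base model
    model = AutoModelForCausalLM.from_pretrained(base_model)

    lora_config = LoraConfig(
        r=16,                      # adapter rank: size of the low-rank update matrices
        lora_alpha=32,             # scaling factor applied to the adapter output
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable

Because only the adapter weights are trained, each per-user adapter stays small enough that hundreds of them can be stored and served against a single shared base model.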

Tensorfuse provisioned the entire training and inference infrastructure directly in Vaero's AWS account, enabling up to 70% cost savings by using existing cloud credits.

Tensorfuse supported Vaero by:

  1. Setting up a highly optimised EKS cluster for AI workloads in their AWS VPC.
  2. Allowing Vaero to submit serverless fine-tuning jobs programmatically using Tensorfuse’s Python SDK.
    • Jobs auto-scale down to zero upon completion.
    • Unlike with Fireworks, Vaero got full access to training parameters, drastically enhancing model performance.
  3. Creating and maintaining an optimised fork of the official vLLM image tailored for multi-LoRA inference (a sketch of this pattern follows below).
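
On the inference side, vLLM supports serving many LoRA adapters on top of a single GPU-resident base model, with each request selecting an adapter by name. The snippet below is a minimal sketch of that pattern using vLLM’s offline API; the model name and adapter path are placeholders, and it does not reflect the exact configuration of Tensorfuse’s optimised vLLM fork.

    # Minimal sketch of multi-LoRA inference with vLLM's offline API.
    # The model name and adapter path are placeholders, not Vaero's actual setup.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed shared base model
        enable_lora=True,                          # accept per-request LoRA adapters
    )

    prompt = "Rewrite this update in my usual tone: shipping v2 next week."
    outputs = llm.generate(
        [prompt],
        SamplingParams(max_tokens=256, temperature=0.7),
        # Each request can point at a different adapter (name, integer id, path),
        # so many users share one copy of the base model on the GPU.
        lora_request=LoRARequest("user-a-style", 1, "/adapters/user-a"),
    )
    print(outputs[0].outputs[0].text)

In a server deployment the same idea applies through vLLM’s OpenAI-compatible endpoint, where the adapter to use is selected per request by name.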

Result

Vaero and Tensorfuse worked together to deliver accurate, high-performance, low-latency deployments of Vaero’s custom LLMs.

  • Improved model accuracy compared to Fireworks.
  • Reduced infrastructure costs by 70% using existing cloud credits.
  • Accelerated iteration speed and reduced time to market by 10x.

Deploy in minutes, scale in seconds

Get started for free or contact us to get a custom demo tailored to your needs.
