Finetune Llama 3 70B on your AWS account
Fine-tune LoRA adapters for popular models using axolotl-style declarative configs
Fine-tuning Guide for Tensorfuse
This guide explains how to fine-tune Llama models using Tensorfuse’s QLoRA implementation.
Supported Models
| Model | GPU Requirements |
|---|---|
| Llama 3.1 70B | 4x L40S (Recommended) |
| Llama 3.1 8B | 1-2x A10G |
Dataset Preparation
Tensorfuse accepts datasets in JSONL format, where each line contains a valid JSON object.
The following example shows the format for a conversational dataset using the ChatML format:
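Below is a minimal illustration of two such JSONL records in a conversational, ChatML-style layout. The field names shown (`messages`, `role`, `content`) follow the common convention for chat datasets and should be checked against the schema Tensorfuse expects:

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
{"messages": [{"role": "user", "content": "Summarize LoRA in one sentence."}, {"role": "assistant", "content": "LoRA fine-tunes a model by learning small low-rank weight updates instead of updating all of its parameters."}]}
```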
Dataset Commands
Once you have created your dataset, you can start fine-tuning your model. Before that, however, you need to create an authentication token on Hugging Face.
Authentication
Create the required secrets. Tensorkube uses Kubernetes Event-Driven Autoscaling (KEDA) under the hood to scale and schedule training runs, so you need to create your secrets in the `keda` environment:
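As a hedged sketch, creating the Hugging Face token as a secret in the `keda` environment might look like the following; the secret name, key, and flag syntax here are assumptions to verify against the Tensorkube CLI help:

```bash
# Assumption: secret name, key, and --env flag are illustrative, not verified CLI syntax.
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=<your-hf-token> --env keda
```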
Programmatic Access
Tensorfuse allows you to interact with the TensorKube cluster using the Python SDK, which provides a straightforward interface for creating fine-tuning jobs.
Authentication
First, you need to create access keys, which are required to authenticate with the TensorKube cluster deployed in your cloud.
Run the following command:
This will create a new user and provide you with access keys.
Next, export the AWS keys as environment variables where you will be running the Python code:
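For reference, these are the standard AWS environment variables; which region variable the SDK reads is an assumption, so set it to the region where your cluster is deployed:

```bash
# Replace the placeholders with the access keys created above.
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
# Assumption: set the region to the one where your TensorKube cluster is deployed.
export AWS_REGION=<your-cluster-region>
```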
The following code demonstrates how to create a fine-tuning job using the Python SDK. The `create_fine_tuning_job` function fine-tunes a Llama 70B base model on L40S GPUs.
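A minimal sketch of such a job is shown below; the import path, the base model identifier, and the parameter names are assumptions rather than the verified SDK signature:

```python
# Hedged sketch: import path and parameter names are assumptions --
# consult the Tensorkube SDK reference for the exact signature.
from tensorkube import create_fine_tuning_job  # assumed import path

job = create_fine_tuning_job(
    job_name="fine-tuning-job",                      # reused later in the adapter S3 path
    base_model="meta-llama/Llama-3.1-70B-Instruct",  # assumed model identifier
    dataset="my-dataset",                            # the dataset created earlier
    gpus=4,                                          # 4x L40S, per the supported-models table
    gpu_type="l40s",
)
print(job)
```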
To check the status of the job, use the `get_job_status` function. It returns the status of the job as `QUEUED`, `PROCESSING`, `COMPLETED`, or `FAILED`.
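A sketch of polling the job until it reaches a terminal state, again assuming the same import path (the `job_id` argument name is an assumption):

```python
import time

from tensorkube import get_job_status  # assumed import path

# Poll once a minute until the job finishes or fails.
while True:
    status = get_job_status(job_id="unique_id")  # QUEUED, PROCESSING, COMPLETED, or FAILED
    print(f"job status: {status}")
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(60)
```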
Once the job is completed, the adapter is uploaded to S3. In your S3 console you can find your adapters as follows:

- Find the S3 bucket with the prefix `tensorkube-train-bucket`. All your trained LoRA adapters reside here.
- We construct the adapter ID from your job ID and the type of GPUs used for training, so your adapter URLs look like this: `s3://<bucket-name>/lora-adapter/<job_name>/<job_id>`

For example, a job with job_name `fine-tuning-job` and job ID `unique_id`, trained on 4 GPUs of type `l40s`, will have its adapter under the `lora-adapter/fine-tuning-job/` prefix in that bucket.
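A hedged sketch for locating adapters programmatically with boto3; the bucket-name filter and key prefix follow the convention above, and the exact key layout should be confirmed in your S3 console:

```python
import boto3

s3 = boto3.client("s3")

# Find the training bucket created by Tensorkube (name starts with "tensorkube-train-bucket").
bucket = next(
    b["Name"] for b in s3.list_buckets()["Buckets"]
    if b["Name"].startswith("tensorkube-train-bucket")
)

# List adapter objects for a given job_name under the lora-adapter/ prefix.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="lora-adapter/fine-tuning-job/")
for obj in resp.get("Contents", []):
    print(f"s3://{bucket}/{obj['Key']}")
```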
Model Deployment
- Clone the Lorax repository (see the sketch after this list):
- Use the following command to deploy. This will deploy the base model with the `lorax` library.
- Get your deployment URL using `tensorkube list deployments`.
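A hedged sketch of the clone and listing steps above; the GitHub URL assumes the upstream predibase/lorax project, and the deploy command itself depends on your Tensorkube configuration:

```bash
# Assumption: Lorax refers to the upstream predibase/lorax project.
git clone https://github.com/predibase/lorax.git
cd lorax

# After deploying, confirm the deployment and grab its URL:
tensorkube list deployments
```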
Inference
You can now use the deployment URL to make inference requests. Here is an example using `curl`. This will query the base model without any adapters:
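A hedged sketch, assuming Lorax exposes its usual `/generate` endpoint; replace `<deployment-url>` with the URL from `tensorkube list deployments`:

```bash
curl <deployment-url>/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 64}}'
```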
To use the adapter, you can use the following command:
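A similar sketch with the adapter attached; the `adapter_id`/`adapter_source` parameter names and the S3-URL form of the adapter reference are assumptions to verify against the Lorax documentation:

```bash
# Assumption: adapter_id/adapter_source parameters and the S3 URL form are illustrative.
curl <deployment-url>/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 64, "adapter_id": "s3://<bucket-name>/lora-adapter/<job_name>/<job_id>", "adapter_source": "s3"}}'
```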