Prerequisites
Before starting, ensure you have configured Tensorfuse on your AWS account. If not, refer to the Getting Started guide.
Deploy a Fine-tuning Job Using Tensorfuse
To deploy a job with Tensorfuse, perform the following steps:
- Prepare the Dockerfile
- Clone the fine-tuning script
- Create Tensorfuse secrets
- Deploy the job with Tensorfuse
Step 1: Prepare the Dockerfile
Dockerfile
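The Dockerfile contents are not reproduced here; the block below is only a minimal sketch, assuming a CUDA-enabled PyTorch base image, the dependencies the training script needs (unsloth, trl, datasets, wandb, huggingface_hub), and a hypothetical entry point at `finetuning/train.py`. Adjust the base image, package versions, and script path to match the cloned repository.

```dockerfile
# Minimal sketch, not the exact Dockerfile from this guide.
# The base image, dependencies, and entry point are assumptions.
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install fine-tuning dependencies (versions are illustrative)
RUN pip install --no-cache-dir unsloth trl datasets wandb huggingface_hub

# Copy the training script and reward functions from the cloned repo
COPY finetuning/ /app/finetuning/

# Run the fine-tuning script when the job starts (script name is hypothetical)
CMD ["python", "finetuning/train.py"]
```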
Step 2: Clone the fine-tuning script
The fine-tuning script uses unsloth and GRPO to fine-tune Qwen 7B on the openai/gsm8k dataset with reward functions. It integrates wandb for logging and Hugging Face for uploading the LoRA adapter, and is inspired by this unsloth guide. Clone the script from this Git repository. The repository contains two folders (a minimal sketch of the training setup follows the list below):
- finetuning: Contains the training script and reward functions.
- inference: Contains code to deploy the vLLM server with Tensorfuse.
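The actual training code lives in the finetuning folder; the block below is only a minimal sketch of what a GRPO setup with unsloth and trl on openai/gsm8k can look like. The model name, LoRA hyperparameters, reward logic, and Hub repository name are illustrative assumptions, not the repository's code.

```python
# Minimal sketch of GRPO fine-tuning with unsloth and trl.
# Model name, hyperparameters, reward logic, and Hub repo are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load Qwen 7B in 4-bit and attach a LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# gsm8k has "question" and "answer" columns; GRPO expects a "prompt" column
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def correctness_reward(prompts, completions, answer, **kwargs):
    """Reward 1.0 when the completion contains the gold final answer."""
    rewards = []
    for completion, gold in zip(completions, answer):
        gold_value = gold.split("####")[-1].strip()
        rewards.append(1.0 if gold_value in completion else 0.0)
    return rewards

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=GRPOConfig(
        output_dir="outputs",
        num_generations=4,
        max_completion_length=512,
        report_to="wandb",  # log to WandB, as the guide's script does
    ),
    train_dataset=dataset,
)
trainer.train()

# Push the trained LoRA adapter to the Hugging Face Hub (repo name is illustrative)
model.push_to_hub("<HUGGING_FACE_ORG_NAME>/qwen-7b-gsm8k-grpo-lora")
```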
Step 3: Create Tensorfuse Secrets
The training script requires WandB for logging and Hugging Face for uploading the LoRA adapter. Create secrets for both using the following commands:
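The commands below are a hedged sketch: the secret names, key names, and exact `tensorkube secret create` syntax are assumptions, so check the Tensorfuse secrets documentation for the precise usage.

```bash
# Hedged sketch: secret names, key names, and exact syntax are assumptions.
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=hf_xxxxx
tensorkube secret create wandb-secret WANDB_API_KEY=your-wandb-api-key
```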
Step 4: Deploy the Job with Tensorfuse
Now that your Dockerfile, fine-tuning scripts, and secrets are ready, deploy the job:
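A hedged sketch of the deployment command follows; the subcommand, flag names, and resource values are assumptions, so adapt them to the jobs syntax in the Tensorfuse documentation.

```bash
# Hedged sketch: the subcommand, flags, and values below are assumptions.
tensorkube job deploy --name qwen-grpo-finetune \
  --gpus 1 --gpu-type a100 \
  --secret hugging-face-secret --secret wandb-secret
```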
Running the job
After deploying, you can run the job by queuing a message for it. The queued payload is read inside the training script via the `get_queued_message()` function from the tensorkube package.
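As an illustration of the in-script side, the sketch below assumes the import path and payload handling; only the `get_queued_message()` function name comes from this guide.

```python
# Hedged sketch: import path and payload handling are assumptions.
from tensorkube import get_queued_message

def main():
    # Read the message queued for this job run (e.g. training overrides)
    payload = get_queued_message()
    print("Received payload:", payload)

if __name__ == "__main__":
    main()
```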
Checking the job status
You can check the status of the job using the following command; it will report SUCCESS once the job is completed.
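A hedged sketch of the status check; the exact subcommand and flag names are assumptions, so confirm them against the Tensorfuse CLI reference.

```bash
# Hedged sketch: the subcommand and flag names are assumptions.
tensorkube job status --job-name qwen-grpo-finetune
# Expected to report SUCCESS once the run has finished.
```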
Transform Qwen 7B into Your Own Reasoning Model
Once the job is successfully completed, the LoRA adapter will be available on the Hugging Face model hub. You can use this LoRA adapter for inference tasks involving reasoning. For this guide, we'll deploy a vLLM server to serve the LoRA adapter. The inference folder in the cloned repository contains the code to deploy the vLLM server with Tensorfuse.
Run the following commands to deploy the vLLM server with your LoRA adapter:
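The actual deployment configuration lives in the inference folder; as a hedged sketch (the flag names and resource values are assumptions), deploying from that folder might look like:

```bash
# Hedged sketch: flags and values are assumptions; use the config in the inference/ folder.
cd inference
tensorkube deploy --gpus 1 --gpu-type a10g --secret hugging-face-secret
```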
Replace the placeholders as follows:
- <TENSORKUBE_DEPLOYMENT_URL>: the actual deployment URL of the vLLM server.
- <HUGGING_FACE_ORG_NAME>: your actual Hugging Face org name.
- lora_name: must be unique for each LoRA adapter.
- lora_path: the path to the LoRA adapter on the Hugging Face model hub.
- model: the LoRA adapter name (the same value as lora_name).
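Assuming the vLLM server exposes the standard OpenAI-compatible endpoints and has runtime LoRA loading enabled (server-side configuration not shown in this guide), the two calls might look like the sketch below; the adapter name is illustrative.

```bash
# Hedged sketch: assumes the server was started with LoRA support and
# runtime LoRA updating enabled; adapter name is illustrative.

# 1. Register the LoRA adapter with the running vLLM server
curl -X POST <TENSORKUBE_DEPLOYMENT_URL>/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{
        "lora_name": "qwen-7b-gsm8k-grpo-lora",
        "lora_path": "<HUGGING_FACE_ORG_NAME>/qwen-7b-gsm8k-grpo-lora"
      }'

# 2. Query the adapter through the OpenAI-compatible chat completions endpoint
curl -X POST <TENSORKUBE_DEPLOYMENT_URL>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen-7b-gsm8k-grpo-lora",
        "messages": [{"role": "user", "content": "If a train travels 60 km in 1.5 hours, what is its average speed?"}]
      }'
```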
The above guide is a high-level overview of the steps involved in transforming Qwen 7B into a reasoning model. You can customize the training script and deployment configurations as per your requirements. Before moving to production, please follow this guide for a production-ready vLLM server deployment and custom domains for secure endpoints.