Transforming Qwen 7B into Your Own Reasoning Model
Fine-tune Qwen 7B for reasoning tasks on your AWS account using Tensorfuse and Unsloth with GRPO.
Fine-tuning large language models like Qwen 7B is essential for adapting them to specific tasks. To transform Qwen 7B into a reasoning model, we’ll use GRPO (Group Relative Policy Optimization), a reinforcement learning algorithm introduced by DeepSeek, along with Unsloth, a fast and memory-efficient training library.
This guide will demonstrate how to create a fine-tuning job for Qwen 7B using job queues and save the resulting LoRA adapter to Hugging Face. We’ll also deploy a vLLM server for inference tasks. We’ll use a single L40S GPU for both training and inference.
Prerequisites
Before starting, ensure you have configured Tensorfuse on your AWS account. If not, refer to the Getting Started guide.
Deploy a Fine-tuning Job Using Tensorfuse
To deploy a job with Tensorfuse, perform the following steps:
1. Prepare the Dockerfile
2. Clone the fine-tuning script
3. Create Tensorfuse secrets
4. Deploy the job with Tensorfuse
Step 1: Prepare the Dockerfile
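The Dockerfile packages the training code together with its GPU dependencies. Below is a minimal sketch; the base image tag, package list, and entrypoint script name (train.py) are assumptions, so adjust them to match the repository.

```dockerfile
# Minimal sketch -- base image tag, packages, and script name are assumptions
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*

# Unsloth for fast fine-tuning, TRL for the GRPO trainer, wandb for logging,
# and the tensorkube SDK for reading the queued job payload
RUN pip3 install --no-cache-dir unsloth trl wandb tensorkube

COPY finetuning/ /app/
WORKDIR /app

CMD ["python3", "train.py"]
```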
Step 2: Clone the fine-tuning script
The fine-tuning script uses Unsloth and GRPO to fine-tune Qwen 7B on the openai/gsm8k dataset using reward functions; a sketch of one such reward function follows below. The script integrates wandb for logging and Hugging Face for uploading the LoRA adapter, and is inspired by this Unsloth guide.
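For context, GRPO scores each sampled completion with one or more reward functions and reinforces higher-scoring ones. The snippet below is a minimal sketch of a GSM8K correctness reward in the style of TRL’s GRPOTrainer reward-function convention; the function name, signature, and answer-extraction logic are assumptions, not the exact code from the repository.

```python
# Sketch of a GSM8K correctness reward; name, signature, and extraction
# logic are assumptions -- see the repository for the actual reward functions.
def correctness_reward(prompts, completions, answer, **kwargs):
    """Return 2.0 for completions whose final answer matches the gold answer."""
    responses = [completion[0]["content"] for completion in completions]
    # GSM8K gold answers appear after a '####' marker; extract the same way
    extracted = [response.split("####")[-1].strip() for response in responses]
    return [2.0 if ext == gold else 0.0 for ext, gold in zip(extracted, answer)]
```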
Clone the script from this Git repository. The repository contains two folders:
- finetuning: Contains the training script and reward functions.
- inference: Contains code to deploy the vLLM server with Tensorfuse.
Step 3: Create Tensorfuse Secrets
The training script requires WandB for logging and Hugging Face for uploading the LoRA adapter. Create secrets for both using the following commands:
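The commands below are a sketch; the secret names (wandb-secret, hugging-face-secret) and environment-variable keys are assumptions and must match whatever the training script reads.

```bash
# Secret and key names are assumptions; align them with the training script
tensorkube secret create wandb-secret WANDB_API_KEY=<YOUR_WANDB_API_KEY>
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
```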
Replace the placeholders with your actual API keys.
Step 4: Deploy the Job with Tensorfuse
Now that your Dockerfile, fine-tuning scripts, and secrets are ready, deploy the job:
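A sketch of the deploy command; the job name and flag spellings are assumptions, so verify them with `tensorkube job deploy --help`.

```bash
# Job name and flag spellings are assumptions; verify with --help
tensorkube job deploy --name qwen-grpo-finetune \
  --gpus 1 --gpu-type l40s \
  --secret wandb-secret --secret hugging-face-secret
```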
This command builds the Docker image, uploads it to the registry, and deploys it on your Tensorfuse cluster. For details on deploying jobs, refer to the job queues documentation.
Running the job
After deploying, you can run the job using:
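A sketch of queueing a run; the flag names and payload keys (max_steps, lora_rank) are illustrative assumptions.

```bash
# Flags and payload keys are assumptions; the payload JSON is what the
# training script reads via get_queued_message()
tensorkube job queue --job-name qwen-grpo-finetune --job-id run-1 \
  --payload '{"max_steps": 250, "lora_rank": 64}'
```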
The payload parameters are accessible in the script via the get_queued_message() function from the tensorkube package.
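Inside the training script, that might look like the sketch below; only get_queued_message() is named here by Tensorfuse, and the assumption that the message arrives as a JSON string (plus the payload keys) is illustrative.

```python
import json

from tensorkube import get_queued_message

# Assumes the queued payload arrives as a JSON string; keys are illustrative
payload = json.loads(get_queued_message())
max_steps = payload.get("max_steps", 250)
lora_rank = payload.get("lora_rank", 64)
```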
Checking the job status
You can check the status of the job using the following command:
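A sketch, assuming a status subcommand of this shape; check `tensorkube job --help` for the exact syntax.

```bash
# Subcommand and flags are assumptions; verify with `tensorkube job --help`
tensorkube job status --job-name qwen-grpo-finetune --job-id run-1
```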
The status of the job will be displayed in the output. It should show SUCCESS once the job is completed.
Transform Qwen 7B into Your Own Reasoning Model
Once the job is successfully completed, the LoRA adapter will be available on the Hugging Face model hub. You can use this LoRA adapter for inference tasks involving reasoning. For this guide, we’ll deploy a vLLM server to utilize the LoRA adapter for inference.
The inference folder in the cloned repository contains the code to deploy the vLLM server with Tensorfuse.
Run the following commands to deploy the vLLM server with your LoRA adapter:
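A sketch, run from the inference folder; the flags mirror the single L40S GPU used for training and are assumptions.

```bash
cd inference
# Flag spellings are assumptions; verify with `tensorkube deploy --help`
tensorkube deploy --gpus 1 --gpu-type l40s --secret hugging-face-secret
```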
After the deployment is ready, load your LoRA adapter into the vLLM server using the following curl command:
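The sketch below uses vLLM’s dynamic LoRA loading endpoint, which is available when the server is started with runtime LoRA updating enabled; the adapter name and repository placeholder are illustrative.

```bash
curl -X POST <TENSORKUBE_DEPLOYMENT_URL>/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{
    "lora_name": "qwen-reasoning-lora",
    "lora_path": "<HUGGING_FACE_ORG_NAME>/<LORA_REPO_NAME>"
  }'
```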
In the command above:
- Replace <TENSORKUBE_DEPLOYMENT_URL> with the actual deployment URL of the vLLM server.
- Replace <HUGGING_FACE_ORG_NAME> with your actual Hugging Face org name.
- lora_name should be unique for each LoRA adapter.
- lora_path should be the path to the LoRA adapter on the Hugging Face model hub.

Once your LoRA adapter is loaded, you can perform inference using the vLLM server. Use the following curl command to test inference:
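A sketch of an OpenAI-compatible chat completion request, using the adapter name registered above as the model; the prompt is just an example.

```bash
curl -X POST <TENSORKUBE_DEPLOYMENT_URL>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-reasoning-lora",
    "messages": [
      {"role": "user", "content": "A train travels 120 miles in 1.5 hours. What is its average speed?"}
    ]
  }'
```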
In the command above:
- Replace <TENSORKUBE_DEPLOYMENT_URL> with the actual deployment URL of the vLLM server.
- model should be the LoRA adapter name you registered earlier.

That’s it! You have successfully transformed Qwen 7B into your own reasoning model using Tensorfuse and Unsloth. You can now use your LoRA adapter for inference on various reasoning tasks.
The above guide is a high-level overview of the steps involved in transforming Qwen 7B into a reasoning model. You can customize the training script and deployment configuration to suit your requirements.
Before moving to production, follow this guide for a production-ready vLLM server deployment and set up custom domains for secure endpoints.