Complete guide to running flexible Axolotl finetuning jobs on TensorFuse, with support for multiple dataset formats and configurable parameters.

The job is built from three files:

- `axolotl-config.yaml`: the Axolotl training configuration (base model, dataset, and hyperparameters); a minimal sketch follows below.
- `axolotl-train.py`: the training entrypoint that runs Axolotl with that config.
- `Dockerfile`: the container image for the job, which copies your dataset into the image (e.g., `instructions.csv`).
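A minimal sketch of what `axolotl-config.yaml` can look like, assuming an Alpaca-style CSV instruction dataset; the hyperparameter values are illustrative, not prescriptive:

```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
chat_template: llama3

datasets:
  - path: instructions.csv   # your dataset file, copied in by the Dockerfile
    ds_type: csv
    type: alpaca             # assumed instruction format

adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs
```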
The following base models are supported, each paired with its chat template:

| Base model | Chat template |
| --- | --- |
| `meta-llama/Llama-3.1-8B-Instruct` | `llama3` |
| `Qwen/Qwen2.5-7B-Instruct` | `qwen2_5` |
| `mistralai/Mistral-7B-Instruct-v0.3` | `mistral` |
| `codellama/CodeLlama-7b-Instruct-hf` | `llama3` |
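To switch models, update the matching pair in `axolotl-config.yaml` (a sketch, assuming the template is set via Axolotl's `chat_template` field):

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
chat_template: qwen2_5
```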
JSONL datasets work the same way; copy the file into the image in the `Dockerfile`:

```dockerfile
COPY multilingual-conversations.jsonl .
```
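Then point the config's `datasets` entry at the JSONL file (a sketch, assuming conversational records in a format Axolotl's `chat_template` dataset type can read; adjust `type` to match your records):

```yaml
datasets:
  - path: multilingual-conversations.jsonl
    ds_type: json
    type: chat_template   # assumed conversational format
```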
Memory optimization options in the config:

- `load_in_4bit`: reduces model weights from 16-bit to 4-bit (e.g., 8GB model → 2GB)
- `gradient_checkpointing`: trades compute for memory (slower, but fits larger models)
- `flash_attention`: 2-8x faster attention with a lower memory footprint
- `sample_packing`: better GPU utilization, especially with variable-length sequences
- `pad_to_sequence_len`: predictable memory usage, prevents OOM errors

A sketch of these flags in the config follows below.
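In `axolotl-config.yaml` these are top-level flags (a sketch; which ones to enable depends on your GPU and model size):

```yaml
load_in_4bit: true           # pair 4-bit loading with a QLoRA adapter
adapter: qlora
gradient_checkpointing: true
flash_attention: true
sample_packing: true
pad_to_sequence_len: true
sequence_len: 2048
```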
For serving the finetuned model, `inference/deployment.yaml` defines the TensorFuse deployment and `inference/Dockerfile` builds the serving image.
Run `evals/evaluation_script.py` to benchmark your models; results are written to `benchmark_results.json`.
Troubleshooting tips:

- Serve LoRA adapters by starting vLLM with the `--enable-lora` flag and loading adapters at runtime via the `/v1/load_lora_adapter` endpoint.
- Set `TRANSFORMERS_VERBOSITY=debug` in the environment for verbose Transformers logs.
- Pass the `--debug` flag to `accelerate launch`.
- Inspect job logs with `tensorkube job logs --job-name your-job-name`.
- Set `"wandb_mode": "disabled"` to skip W&B entirely; the YAML equivalent is sketched below.
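In the Axolotl config this maps to the `wandb_mode` field (a sketch; `my-finetune` is a placeholder project name):

```yaml
wandb_mode: disabled   # remove this and set wandb_project: my-finetune to enable logging
```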