Complete Guide to LLM Fine-tuning with Unsloth Docker: From Setup to Production
⏱️ Estimated Reading Time: 12 minutes
Introduction
Fine-tuning large language models (LLMs) has become increasingly important for creating specialized AI applications. However, setting up the proper environment for LLM training can be challenging due to complex dependencies and potential conflicts. Unsloth’s Docker solution eliminates these issues by providing a pre-configured, stable environment for efficient LLM fine-tuning.
In this comprehensive tutorial, we’ll explore how to use Unsloth’s Docker image to fine-tune LLMs locally, covering everything from initial setup to practical training examples.
What is Unsloth?
Unsloth is a powerful framework designed to accelerate LLM fine-tuning while reducing memory usage. It provides significant performance improvements over traditional fine-tuning methods, making it possible to train larger models on consumer hardware.
Key Benefits of Unsloth Docker
- Dependency Management: Eliminates “dependency hell” with a fully contained environment
- System Safety: Runs without root privileges, keeping your system clean
- Portability: Works consistently across different platforms and setups
- Pre-configured Environment: Includes all necessary tools and libraries
- Regular Updates: Frequently updated with the latest improvements
Prerequisites
Before starting, ensure you have:
- NVIDIA GPU: Required for efficient training (RTX 3060 or better recommended)
- Docker: Installed and running on your system
- NVIDIA Container Toolkit: For GPU access within containers
- Sufficient Storage: At least 50GB free space for models and data
- RAM: 16GB or more recommended
Step 1: Installing Docker and NVIDIA Container Toolkit
Installing Docker
For Linux systems:
# Update package index
sudo apt-get update
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
For other systems, visit the official Docker installation guide.
Installing NVIDIA Container Toolkit
The NVIDIA Container Toolkit enables GPU access within Docker containers:
# Add NVIDIA's apt repository (the toolkit is not in the default Ubuntu repos)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Pin a version (check NVIDIA's releases for the latest stable version)
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
# Install NVIDIA Container Toolkit
sudo apt-get update && sudo apt-get install -y \
  nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify Installation
Test your GPU access:
# Test NVIDIA Docker integration
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Step 2: Running the Unsloth Docker Container
Basic Container Setup
Create a working directory and run the container:
# Create working directory
mkdir -p ~/unsloth-workspace
cd ~/unsloth-workspace
# Run Unsloth container with basic configuration
docker run -d \
  --name unsloth-training \
  -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 \
  -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth
Advanced Configuration
For production use, consider this enhanced setup:
# Generate SSH key for secure access
ssh-keygen -t rsa -b 4096 -f ~/.ssh/unsloth_key
# Run container with advanced settings
docker run -d \
  --name unsloth-production \
  -e JUPYTER_PORT=8000 \
  -e JUPYTER_PASSWORD="secure_password_2024" \
  -e "SSH_KEY=$(cat ~/.ssh/unsloth_key.pub)" \
  -e USER_PASSWORD="unsloth2024" \
  -p 8000:8000 \
  -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  -v $(pwd)/models:/workspace/models \
  -v $(pwd)/datasets:/workspace/datasets \
  --gpus all \
  --restart unless-stopped \
  unsloth/unsloth
Step 3: Accessing Jupyter Lab
Web Interface Access
- Open your browser and navigate to http://localhost:8888
- Enter the password you set (default: “unsloth”)
- You’ll see the Jupyter Lab interface with pre-loaded notebooks
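Once inside Jupyter, it's worth running a quick sanity check in a new notebook cell to confirm that PyTorch can see your GPU before starting any training:
# Sanity check: confirm the container has GPU access
import torch
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"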
SSH Access (Optional)
For command-line access:
# Connect via SSH
ssh -i ~/.ssh/unsloth_key -p 2222 unsloth@localhost
Step 4: Understanding the Container Structure
The Unsloth container is organized as follows:
/workspace/
├── work/ # Your mounted work directory
├── unsloth-notebooks/ # Example fine-tuning notebooks
├── models/ # Model storage (if mounted)
└── datasets/ # Dataset storage (if mounted)
/home/unsloth/ # User home directory
Step 5: Your First Fine-tuning Example
Let’s create a simple fine-tuning example using Llama-3.
Create a New Notebook
- In Jupyter Lab, create a new notebook
- Add the following code cells:
# Cell 1: Import dependencies (already installed in the container)
from unsloth import FastLanguageModel
import torch
# Cell 2: Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    dtype=None,          # auto-detect; uses bfloat16 where the GPU supports it
    load_in_4bit=True,   # 4-bit quantization to fit the 8B model in consumer VRAM
)
# Cell 3: Configure LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                    # LoRA rank; higher captures more, uses more VRAM
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,          # 0 is the optimized setting in Unsloth
    bias="none",             # "none" is the optimized setting in Unsloth
    use_gradient_checkpointing="unsloth",  # saves VRAM on long contexts
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
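# Optional sanity check: PEFT models expose print_trainable_parameters(),
# which reports how small the trainable LoRA slice is relative to the full
# model (typically well under 1% with this configuration)
model.print_trainable_parameters()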
# Cell 4: Prepare dataset
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # The EOS token is required; without it, generation may never stop
        text = alpaca_prompt.format(instruction, input, output) + tokenizer.eos_token
        texts.append(text)
    return {"text": texts}
# Load dataset
from datasets import load_dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
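# Optional: print one formatted example to verify the template rendered
# correctly (instruction, input, and response followed by the EOS token)
print(dataset[0]["text"][:500])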
# Cell 5: Training configuration
from trl import SFTTrainer
from transformers import TrainingArguments
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,  # packing can speed up training on short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        warmup_steps=5,
        max_steps=60,  # short demo run; use num_train_epochs for full runs
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
# Cell 6: Start training
trainer_stats = trainer.train()
Monitor Training Progress
The training process will display progress bars and loss metrics. Monitor these to ensure training is proceeding correctly.
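After the run, the underlying Hugging Face Trainer keeps its logged metrics in trainer.state.log_history. A minimal sketch for summarizing the loss curve:
# Summarize the loss values recorded at each logging step
losses = [log["loss"] for log in trainer.state.log_history if "loss" in log]
print(f"Loss: {losses[0]:.4f} -> {losses[-1]:.4f} over {len(losses)} logged steps")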
Step 6: Saving and Using Your Fine-tuned Model
Save in Different Formats
# Save as Hugging Face format
model.save_pretrained("my_finetuned_model")
tokenizer.save_pretrained("my_finetuned_model")
# Save as GGUF for Ollama
model.save_pretrained_gguf("my_model", tokenizer, quantization_method="q4_k_m")
# Save merged 16-bit weights for vLLM
model.save_pretrained_merged("my_model_vllm", tokenizer, save_method="merged_16bit")
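To reload the saved LoRA checkpoint later, FastLanguageModel.from_pretrained also accepts a local adapter directory (a sketch, reusing the directory saved above):
# Reload the fine-tuned adapter from the local save directory
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="my_finetuned_model",
    max_seq_length=2048,
    load_in_4bit=True,
)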
Test Your Model
# Test inference
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [alpaca_prompt.format(
        "What is the capital of France?",  # instruction
        "",                                # input (empty for this task)
        "",                                # response left blank for the model to fill
    )], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs))
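For interactive testing, you can stream tokens to the console as they are generated using transformers' standard TextStreamer utility:
from transformers import TextStreamer
# Print tokens to stdout as soon as they are generated
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=64)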
Advanced Configuration Options
Environment Variables
Variable | Description | Default
---|---|---
JUPYTER_PASSWORD | Jupyter Lab password | unsloth
JUPYTER_PORT | Jupyter Lab port | 8888
SSH_KEY | SSH public key | None
USER_PASSWORD | User password for sudo | unsloth
GPU Memory Optimization
For systems with limited GPU memory:
# Use smaller batch sizes
per_device_train_batch_size=1
gradient_accumulation_steps=8
# Enable gradient checkpointing
use_gradient_checkpointing="unsloth"
# Use 4-bit quantization
load_in_4bit=True
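To check how close a run is to the limit, PyTorch's built-in counters are sufficient (a quick check, assuming a single CUDA device):
import torch
# Compare peak reserved GPU memory against total device capacity
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
peak_gb = torch.cuda.max_memory_reserved(0) / 1024**3
print(f"Peak reserved: {peak_gb:.2f} GB of {total_gb:.2f} GB")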
Multi-GPU Training
For systems with multiple GPUs:
# Run container with all GPUs
docker run -d \
  --gpus all \
  # ... other parameters
  unsloth/unsloth
Note that the open-source Unsloth release is primarily optimized for single-GPU training, and torch.nn.DataParallel generally does not work with 4-bit quantized models. If you only need one GPU, pin the container to a specific device instead:
# Pin the container to GPU 0
docker run -d --gpus '"device=0"' ... unsloth/unsloth
Troubleshooting Common Issues
GPU Not Detected
# Check GPU availability
nvidia-smi
# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Memory Issues
- Reduce batch size
- Enable gradient checkpointing
- Use 4-bit quantization
- Clear GPU cache:
torch.cuda.empty_cache()
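Note that empty_cache() only releases memory whose Python references are already gone, so in a notebook you usually drop the large objects first (a sketch, assuming model and trainer exist in your session):
import gc
import torch
del model, trainer        # drop references so the allocator can reclaim them
gc.collect()              # collect the now-unreferenced objects
torch.cuda.empty_cache()  # return cached blocks to the driver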
Container Access Issues
# Check container status
docker ps -a
# View container logs
docker logs unsloth-training
# Restart container
docker restart unsloth-training
Best Practices
1. Data Management
- Use volume mounts for persistent storage
- Organize datasets in dedicated directories
- Backup important models regularly
2. Resource Monitoring
# Monitor GPU usage
import GPUtil
GPUtil.showUtilization()
# Monitor system resources
import psutil
print(f"CPU: {psutil.cpu_percent()}%")
print(f"RAM: {psutil.virtual_memory().percent}%")
3. Security Considerations
- Use strong passwords for Jupyter access
- Implement SSH key authentication
- Run containers as non-root users
- Regularly update the Unsloth image
4. Performance Optimization
- Use appropriate batch sizes for your GPU
- Enable mixed precision training
- Utilize gradient accumulation for larger effective batch sizes (see the example below)
- Monitor training metrics to prevent overfitting
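For example, the notebook settings above (per_device_train_batch_size=2 with gradient_accumulation_steps=4) give an effective batch size of 2 × 4 = 8, while only ever holding two samples' activations in GPU memory at a time.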
Production Deployment
Docker Compose Setup
Create a docker-compose.yml for easier management:
version: '3.8'
services:
  unsloth:
    image: unsloth/unsloth
    container_name: unsloth-production
    environment:
      - JUPYTER_PASSWORD=secure_password
      - JUPYTER_PORT=8888
      - USER_PASSWORD=unsloth2024
    ports:
      - "8888:8888"
      - "2222:22"
    volumes:
      - ./work:/workspace/work
      - ./models:/workspace/models
      - ./datasets:/workspace/datasets
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
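With this file in place, run docker compose up -d to start the stack and docker compose down to stop it.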
Automated Training Pipeline
Create a training script for automated workflows:
#!/usr/bin/env python3
"""
Automated Unsloth training pipeline
"""
import argparse
import json

from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="Training config JSON file")
    args = parser.parse_args()

    # Load configuration
    with open(args.config) as f:
        config = json.load(f)

    # Initialize model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=config["model_name"],
        max_seq_length=config["max_seq_length"],
        load_in_4bit=config.get("load_in_4bit", True),
    )

    # Configure LoRA
    model = FastLanguageModel.get_peft_model(
        model,
        r=config["lora_r"],
        target_modules=config["target_modules"],
        lora_alpha=config["lora_alpha"],
        lora_dropout=config["lora_dropout"],
        bias="none",
        use_gradient_checkpointing="unsloth",
    )

    # Training logic here...
    print("Training completed successfully!")


if __name__ == "__main__":
    main()
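For reference, a config file matching the keys this script reads might look like the following (values mirror the notebook example above; the filename is arbitrary):
{
  "model_name": "unsloth/llama-3-8b-bnb-4bit",
  "max_seq_length": 2048,
  "load_in_4bit": true,
  "lora_r": 16,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"],
  "lora_alpha": 16,
  "lora_dropout": 0.0
}
Saved as, say, config.json, it runs with: python train_pipeline.py --config config.json (where train_pipeline.py is whatever name you gave the script).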
Conclusion
Unsloth Docker provides an excellent solution for LLM fine-tuning, eliminating setup complexity while maintaining performance and flexibility. By following this tutorial, you now have:
- A fully configured Unsloth environment
- Understanding of basic and advanced configuration options
- Practical experience with fine-tuning workflows
- Knowledge of best practices and troubleshooting techniques
The containerized approach ensures reproducible results and makes it easy to scale your fine-tuning operations across different environments.
Next Steps
- Experiment with Different Models: Try fine-tuning various model architectures
- Explore Advanced Techniques: Investigate reinforcement learning and DPO training
- Optimize for Production: Implement automated training pipelines
- Monitor Performance: Set up comprehensive logging and monitoring
Additional Resources
- Unsloth Official Documentation
- Unsloth GitHub Repository
- Docker Best Practices
- NVIDIA Container Toolkit Documentation
Happy fine-tuning! 🚀