Installation
Install TRL and the required dependencies:
- `trl`: Core training library
- `peft`: LoRA/QLoRA support
- `accelerate`: Multi-GPU and distributed training
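A typical installation from PyPI (pin versions as needed for your environment):

```bash
pip install -U trl peft accelerate
```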
Supervised Fine-Tuning (SFT)
The SFTTrainer makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires Instruction datasets.
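For reference, here is a minimal sketch of the conversational ("messages") format that `SFTTrainer` accepts out of the box; the exact column layout depends on your dataset.

```python
# One training example in conversational format; SFTTrainer applies the chat template for you.
example = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}
```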
LoRA Fine-Tuning (Recommended)
LoRA (Low-Rank Adaptation) is the recommended approach for fine-tuning LFM2 models with TRL. It offers several key advantages:
- Memory efficient: Trains only small adapter weights (~1-2% of model size) instead of full model parameters
- Data efficient: Achieves strong task performance improvements with less training data than full fine-tuning
- Fast training: Reduced parameter count enables faster iteration and larger effective batch sizes
- Flexible: Easy to switch between different task adapters without retraining the base model
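A minimal LoRA fine-tuning sketch with `SFTTrainer`. The checkpoint name, dataset, and hyperparameters below are illustrative assumptions; substitute your own.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Example conversational dataset; replace with your own instruction data.
dataset = load_dataset("trl-lib/Capybara", split="train")

# LoRA adapter configuration: only the low-rank adapter weights are trained.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="lfm2-sft-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",  # assumed checkpoint name; pick the LFM size you need
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

After training, `trainer.save_model()` writes only the adapter weights; they can later be merged into the base model with peft's `merge_and_unload()`.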
Full Fine-Tuning
Full fine-tuning updates all model parameters. Use this only when you have sufficient GPU memory and need maximum adaptation for your task.
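A sketch of the same setup without LoRA (again with assumed names); omitting `peft_config` makes the trainer update all parameters.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

training_args = SFTConfig(
    output_dir="lfm2-sft-full",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # smaller per-device batches, more accumulation
    learning_rate=2e-5,
    bf16=True,
    gradient_checkpointing=True,     # trades compute for memory
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",      # assumed checkpoint name
    args=training_args,
    train_dataset=dataset,           # no peft_config: all parameters are trained
)
trainer.train()
```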
Vision Language Model Fine-Tuning (VLM-SFT)
The SFTTrainer also supports fine-tuning Vision Language Models like LFM2.5-VL-1.6B on image-text datasets. VLM fine-tuning requires Vision datasets and involves a few key differences from text-only SFT:
- Uses `AutoModelForImageTextToText` instead of `AutoModelForCausalLM`
- Uses `AutoProcessor` instead of just a tokenizer
- Requires dataset formatting with image content types
- Needs a custom `collate_fn` for multimodal batching
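A sketch of the processor and collate function for image-text batches. The checkpoint name and the `messages`/`images` column layout are assumptions about your dataset; adjust them to match your data.

```python
from transformers import AutoProcessor

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-1.6B")  # assumed checkpoint

def collate_fn(examples):
    """Turn a list of {"messages": ..., "images": ...} examples into a model-ready batch."""
    texts = [
        processor.apply_chat_template(example["messages"], tokenize=False)
        for example in examples
    ]
    images = [example["images"] for example in examples]

    batch = processor(text=texts, images=images, padding=True, return_tensors="pt")

    # Standard causal-LM labels: copy the inputs and ignore padding in the loss.
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100
    # Depending on the model, you may also want to mask the image placeholder tokens here.
    batch["labels"] = labels
    return batch
```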
VLM LoRA Fine-Tuning (Recommended)
LoRA is recommended for VLM fine-tuning due to the larger model size and multimodal complexity.
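A minimal wiring of the VLM pieces with LoRA, reusing the `collate_fn` sketched above. The model checkpoint, dataset, and config values are assumptions, not a definitive recipe.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForImageTextToText
from trl import SFTConfig, SFTTrainer

model = AutoModelForImageTextToText.from_pretrained(
    "LiquidAI/LFM2.5-VL-1.6B",        # assumed checkpoint name
    torch_dtype=torch.bfloat16,
)
dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")  # example vision dataset

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="lfm2-vl-sft-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    bf16=True,
    remove_unused_columns=False,                     # keep the raw image column for collate_fn
    dataset_kwargs={"skip_prepare_dataset": True},   # batches are built entirely in collate_fn
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collate_fn,    # from the sketch above
    peft_config=peft_config,
)
trainer.train()
```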
Full VLM Fine-Tuning
Full VLM fine-tuning updates all model parameters. Use this only when you have sufficient GPU memory.
Direct Preference Optimization (DPO)
The DPOTrainer implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires Preference datasets with chosen and rejected response pairs.
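For reference, a single preference example in the conversational format that `DPOTrainer` understands (the `prompt`/`chosen`/`rejected` column names are the standard TRL ones):

```python
# One preference pair: the trainer teaches the model to prefer "chosen" over "rejected".
example = {
    "prompt": [{"role": "user", "content": "Summarize the water cycle in one sentence."}],
    "chosen": [{"role": "assistant", "content": "Water evaporates, condenses into clouds, and falls back as precipitation."}],
    "rejected": [{"role": "assistant", "content": "The water cycle is when water does things."}],
}
```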
DPO with LoRA (Recommended)
LoRA is highly recommended for DPO training, as it significantly reduces memory requirements while maintaining strong alignment performance.
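A minimal DPO-with-LoRA sketch; the checkpoint, dataset, and hyperparameters are placeholders. With a `peft_config`, no separate reference model is needed: the trainer recovers the reference policy by disabling the adapters.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "LiquidAI/LFM2-1.2B"   # assumed checkpoint; ideally start from your SFT model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # example preference data

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="lfm2-dpo-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    beta=0.1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```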
Full DPO Training
Full DPO training updates all model parameters. Use this only when you have sufficient GPU memory.
Other Training Methods
TRL also provides additional trainers that work seamlessly with LFM models:
- RewardTrainer: Train reward models for RLHF
- PPOTrainer: Proximal Policy Optimization for reinforcement learning from human feedback
- ORPOTrainer: Odds Ratio Preference Optimization, an alternative to DPO
- KTOTrainer: Kahneman-Tversky Optimization for alignment
Tips
- Learning Rates: SFT typically uses higher learning rates (1e-5 to 5e-5) than DPO (1e-7 to 1e-6)
- Batch Size: DPO requires larger effective batch sizes; increase `gradient_accumulation_steps` if GPU memory is limited
- LoRA Ranks: Start with `r=16` for experimentation; increase to `r=64` or higher for better quality
- DPO Beta: The `beta` parameter controls the deviation from the reference model; typical values range from 0.1 to 0.5
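As a rough illustration of how these tips map onto the config objects (the values are starting points, not recommendations for every task):

```python
from peft import LoraConfig
from trl import DPOConfig, SFTConfig

sft_args = SFTConfig(output_dir="sft-run", learning_rate=2e-5)   # SFT: higher learning rate

dpo_args = DPOConfig(
    output_dir="dpo-run",
    learning_rate=5e-7,                # DPO: one to two orders of magnitude lower
    beta=0.1,                          # how far the policy may drift from the reference model
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,    # raise this instead of the batch size when memory is tight
)

lora_config = LoraConfig(r=16, lora_alpha=32)   # start at r=16; try r=64 or higher for better quality
```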
For more end-to-end examples, visit the Liquid AI Cookbook.