Training | Intermediate | Also known as: Quantized LoRA

QLoRA

/kew-lor-ah/

A variant of LoRA that keeps the frozen base model in 4-bit quantized form during fine-tuning and trains only small low-rank adapter weights, drastically cutting the GPU memory needed.
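To make the idea concrete, here is a minimal pure-Python sketch of a linear layer whose base weights are stored as 4-bit codes while a low-rank adapter stays in full precision. It is a toy under stated assumptions: it uses uniform blockwise absmax quantization rather than the NF4 data type QLoRA actually uses, and the names `quantize_4bit`, `dequantize_4bit`, `qlora_forward` and all numeric values are hypothetical, chosen only for illustration.

```python
def quantize_4bit(weights, block_size=4):
    """Blockwise absmax quantization to signed integer codes in [-7, 7].

    Assumption: uniform levels for simplicity; real QLoRA uses NF4.
    """
    codes, scales = [], []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        peak = max(abs(w) for w in block)
        scale = peak / 7 if peak > 0 else 1.0  # one scale per block
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales


def dequantize_4bit(codes, scales, block_size=4):
    """Recover approximate float weights from codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def qlora_forward(quantized, A, B, x, alpha=16.0):
    """y = dequant(W_q) x + (alpha / r) * B (A x); only A and B would be trained."""
    W_hat = [dequantize_4bit(codes, scales) for codes, scales in quantized]
    base = matvec(W_hat, x)
    update = matvec(B, matvec(A, x))
    scaling = alpha / len(A)  # len(A) is the adapter rank r
    return [b + scaling * u for b, u in zip(base, update)]


# Hypothetical 2x4 base weight matrix, quantized row by row and then frozen.
W = [[0.12, -0.40, 0.33, 0.05],
     [-0.71, 0.22, 0.09, -0.18]]
quantized = [quantize_4bit(row) for row in W]

# Rank-1 adapter in full precision; B starts at zero, as in LoRA,
# so the adapted layer initially matches the quantized base layer.
A = [[0.01, -0.02, 0.03, 0.00]]
B = [[0.0], [0.0]]
x = [1.0, 2.0, -1.0, 0.5]

y = qlora_forward(quantized, A, B, x)
```

The point of the design is that gradients only ever flow into `A` and `B`; the 4-bit codes are read, dequantized on the fly, and never updated.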


In practice

It lets you adapt models of roughly 13B-33B parameters on a single 24 GB consumer GPU (e.g. RTX 4090), and models up to about 70B on a single 48-80 GB data-center GPU (e.g. an 80 GB A100). It is a popular technique for hobbyist and low-budget enterprise fine-tuning. The original QLoRA paper reports quality close to 16-bit fine-tuning on its benchmarks.
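A back-of-the-envelope calculation shows where the savings come from. The sketch below estimates the memory taken by the frozen base weights alone at 16 bits versus 4 bits per parameter. It deliberately ignores quantization constants, adapter weights, optimizer state, and activations, so real usage is higher; the function name `base_weight_memory_gb` is hypothetical.

```python
def base_weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for the base weights only, in GB (10^9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for size in (13, 33, 70):
    fp16 = base_weight_memory_gb(size, 16)
    four_bit = base_weight_memory_gb(size, 4)
    print(f"{size}B params: 16-bit ~{fp16:.1f} GB, 4-bit ~{four_bit:.1f} GB")
```

By this estimate a 70B model needs about 35 GB for its 4-bit weights alone, which is why the largest models still call for a GPU with more than 24 GB.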

Related terms

LoRA, Quantization, Fine-tuning, SFT

Seen in the wild

1 entry mentioning it

August 20, 2024

bitsandbytes 0.43: QLoRA and NF4/FP4 quantization for 4-bit fine-tuning

Medium
