Transformers

17 Mar 2026

Training a Small GPT-2 Model Under 20M Parameters

A 5-minute guide to GPT-2 decoder intuition and training for practical tasks