
Notes from ICLR 2026: Five Orals on ML Architectures and Training
These notes cover the first oral session of Day 2 at ICLR 2026: five papers on learning rate schedules, MoE sparsity, curriculum data, softmax expressivity, and pre-training under infinite compute.

