Don't argmax; distribution match

AGI will have learnt reward models.

Why not just stop FOOM?

Deconfusing direct vs amortized optimization

