Will Merrill
Vivek Ramanujan
Latest
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent