Noah A. Smith
Saturated Transformers are Constant-Depth Threshold Circuits
Competency Problems: On Finding and Removing Artifacts in Language Data
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?
A Formal Hierarchy of RNN Architectures