William Merrill
Will is an incoming Assistant Professor at the Toyota Technological Institute at Chicago and currently a Young Investigator at the Allen Institute for AI. He received his PhD from New York University, working with Tal Linzen and Ashish Sabharwal, supported by an NSF Graduate Research Fellowship and a Two Sigma PhD Fellowship. A major focus of Will’s research has been developing theory on the computational power and limitations of transformers, with an eye towards guiding the analysis and design of new architectures and inference methods. More generally, he is interested in theoretical computer science, computational linguistics, and the science of deep learning.
Contact: willm[æt]{nyu.edu,allenai.org,ttic.edu}
Latest posts
| Date | Post |
|---|---|
| Oct 1, 2025 | My Dissertation is Now Online! |
| Apr 15, 2022 | Project: Improved Adversarial Robustness via Abstract Interpretation |
| Apr 16, 2020 | A Formal Hierarchy of RNN Architectures |
Publications
2026
- arXiv
- arXiv
- arXiv
2025
- arXiv
- NeurIPS: Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training. In NeurIPS, Dec 2025. Spotlight.
- arXiv
- COLM: RWKV-7 "Goose" with Expressive Dynamic State Evolution. In COLM, Oct 2025.
- COLM
- ACL: Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases. In ACL, Jul 2025. Outstanding Paper.
2024
- EMNLP
- COLM
- ICML
- ACL
- ACL
- TACL
- ACL
- ICML
- ICLR
2023
- DLT
- ME-FoMo: A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks. In ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models, May 2023.
- TACL
- NeurIPS
- TACL
2022
- CoNLL
- arXiv
- TACL
- ACL
2021
- EMNLP
- TACL: Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? TACL, Sep 2021.
- EMNLP: Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent. In EMNLP, Nov 2021.
2020
- ACL
- COVID19
- arXiv
2019
- DeLeFoL: Sequential Neural Networks as Automata. In ACL Workshop on Deep Learning and Formal Languages, Aug 2019.
- BlackboxNLP: Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing. In ACL Workshop BlackboxNLP, Aug 2019.
- LChange: Detecting Syntactic Change Using a Neural Part-of-Speech Tagger. In ACL Workshop on Computational Approaches to Historical Language Change, Aug 2019.
2018
- BlackboxNLP
- NAACL
- TULCon: A Semantics of Subordinate Clauses Using Delayed Evaluation. In Toronto Undergraduate Linguistics Conference, Mar 2018.