William Merrill

Will is an incoming Assistant Professor at the Toyota Technological Institute at Chicago (TTIC) and currently a Young Investigator at the Allen Institute for AI. He received his PhD from New York University, working with Tal Linzen and Ashish Sabharwal, supported by an NSF Graduate Research Fellowship and a Two Sigma PhD Fellowship. A major focus of Will's research has been developing theory on the computational power and limitations of transformers, with an eye towards guiding the analysis and design of new architectures and inference methods. More generally, he is interested in theoretical computer science, computational linguistics, and the science of deep learning.
Contact: willm[æt]{nyu.edu,allenai.org,ttic.edu}
or here for anonymous feedback
Potential PhD students: I will be recruiting PhD students to start in 2026. If you would like to work with me, please apply to TTIC and mention my name in your application! See my application FAQs.
Latest posts
- Oct 1, 2025: My Dissertation is Now Online!
- Apr 15, 2022: Project: Improved Adversarial Robustness via Abstract Interpretation
- Apr 16, 2020: A Formal Hierarchy of RNN Architectures
Publications
2025
- arXiv
- Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training. arXiv, Jun 2025.
- arXiv
- RWKV-7 "Goose" with Expressive Dynamic State Evolution. In COLM, Oct 2025.
- COLM
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases. In ACL, Jul 2025. Outstanding Paper.
2024
- EMNLP
- COLM
- ICML
- ACL
- ACL
- TACL
- ACL
- ICML
- ICLR
2023
- DLT
- A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks. In ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo), May 2023.
- TACL
- NeurIPS
- TACL
2022
- CoNLL
- arXiv
- TACL
- ACL
2021
- EMNLP
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? TACL, Sep 2021.
- Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent. In EMNLP, Nov 2021.
2020
- ACL
- COVID19
- arXiv
2019
- Sequential Neural Networks as Automata. In ACL Workshop on Deep Learning and Formal Languages (DeLeFoL), Aug 2019.
- Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing. In ACL Workshop BlackboxNLP, Aug 2019.
- Detecting Syntactic Change Using a Neural Part-of-Speech Tagger. In ACL Workshop on Computational Approaches to Historical Language Change (LChange), Aug 2019.
2018
- BlackboxNLP
- NAACL
- A Semantics of Subordinate Clauses Using Delayed Evaluation. In Toronto Undergraduate Linguistics Conference (TULCon), Mar 2018.