William Merrill


I am a Ph.D. student at the Center for Data Science (CDS) at NYU, where I am advised by Tal Linzen. I also work closely with Ashish Sabharwal at AI2, where I was previously a predoctoral young investigator (PYI). My Ph.D. is supported by an NSF Graduate Research Fellowship, a Two Sigma Ph.D. Fellowship, and AI2.

My research uses formal methods to better understand the capabilities and limitations of language models. I've worked on characterizing the computational power of transformers for representing linguistic structure and solving reasoning problems. I've also worked on understanding the aspects of semantics that can be learned from co-occurrence patterns in a large text corpus. Overall, I am interested in building out theoretical foundations for the alchemy of large language models. Why have LMs been successful? What are their limitations? How can we more systematically understand design choices around pretraining and deployment?

Contact: willm[æt]nyu.edu or here for anonymous feedback

Outside of research, I like exploring New York City on foot, by train, and by boat. I like cooking new things and trying hole-in-the-wall restaurants. I also play basketball, ping pong, and Age of Empires II.

Publications



  1. arXiv
    Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
    Jacob Pfau, William Merrill, and Samuel Bowman
    Apr 2024
  2. ICML
    The Illusion of State in State-Space Models
    William Merrill, Jackson Petty, and Ashish Sabharwal
    In ICML, Jul 2024
  3. arXiv
    Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment
    William Merrill, Zhaofeng Wu, Norihito Naka, and 2 more authors
    Feb 2024
  4. arXiv
    OLMo: Accelerating the Science of Language Models
    Dirk Groeneveld, Iz Beltagy, Pete Walsh, and 40 more authors
    Feb 2024
  5. ICML
    How Language Model Hallucinations Can Snowball
    Muru Zhang, Ofir Press, William Merrill, and 2 more authors
    In ICML, Jul 2024
  6. ICLR
    The Expressive Power of Transformers with Chain of Thought
    William Merrill and Ashish Sabharwal
    In ICLR, May 2024


  1. DLT
    Formal Languages and the NLP Black Box
    William Merrill
    In Developments in Language Theory, Jun 2023
  2. ME-FoMo
    A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
    William Merrill, Nikolaos Tsilivis, and Aman Shukla
    In ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models, May 2023
  3. TACL
    Transparency Helps Reveal When Language Models Learn Meaning
    Zhaofeng Wu, William Merrill, Hao Peng, and 2 more authors
    TACL, May 2023
  4. NeurIPS
    A Logic for Expressing Log-Precision Transformers
    William Merrill and Ashish Sabharwal
    In NeurIPS, Dec 2023
  5. TACL
    The Parallelism Tradeoff: Limitations of Log-Precision Transformers
    William Merrill and Ashish Sabharwal
    TACL, Jun 2023


  1. CoNLL
    Entailment Semantics Can Be Extracted from an Ideal Language Model
    William Merrill, Alex Warstadt, and Tal Linzen
    In CoNLL, Dec 2022
  2. arXiv
    Extracting Finite Automata from RNNs Using State Merging
    William Merrill and Nikolaos Tsilivis
    Jan 2022
  3. TACL
    Saturated Transformers are Constant-Depth Threshold Circuits
    William Merrill, Ashish Sabharwal, and Noah A. Smith
    TACL, Aug 2022
  4. ACL
    ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
    Sanjay Subramanian, William Merrill, Trevor Darrell, and 3 more authors
    In ACL, May 2022


  1. EMNLP
    Competency Problems: On Finding and Removing Artifacts in Language Data
    Matt Gardner, William Merrill, Jesse Dodge, and 4 more authors
    In EMNLP, Nov 2021
  2. arXiv
    Formal Language Theory Meets Modern NLP
    William Merrill
    Feb 2021
  3. TACL
    Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?
    William Merrill, Yoav Goldberg, Roy Schwartz, and 1 more author
    TACL, Sep 2021
  4. EMNLP
    Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
    William Merrill, Vivek Ramanujan, Yoav Goldberg, and 2 more authors
    In EMNLP, Nov 2021


  1. ACL
    A Formal Hierarchy of RNN Architectures
    William Merrill, Gail Weiss, Yoav Goldberg, and 3 more authors
    In ACL, Jul 2020
  2. COVID19
    CORD-19: The COVID-19 Open Research Dataset
    Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, and 25 more authors
    In ACL Workshop on NLP for COVID-19, Jul 2020
  3. arXiv
    On the Linguistic Capacity of Real-Time Counter Automata
    William Merrill
    Sep 2020


  1. DeLeFoL
    Sequential Neural Networks as Automata
    William Merrill
    In ACL Workshop on Deep Learning and Formal Languages, Aug 2019
  2. BlackboxNLP
    Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing
    William Merrill, Lenny Khazan, Noah Amsel, and 3 more authors
    In ACL Workshop BlackboxNLP, Aug 2019
  3. LChange
    Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
    William Merrill, Gigi Stark, and Robert Frank
    In ACL Workshop on Computational Approaches to Historical Language Change, Aug 2019


  1. BlackboxNLP
    Context-Free Transductions with Neural Stacks
    Yiding Hao, William Merrill, Dana Angluin, and 4 more authors
    In EMNLP Workshop BlackboxNLP, Nov 2018
  2. NAACL
    End-to-End Graph-Based TAG Parsing with Neural Networks
    Jungo Kasai, Robert Frank, Pauli Xu, and 2 more authors
    In NAACL, Nov 2018
  3. TULCon
    A Semantics of Subordinate Clauses Using Delayed Evaluation
    William Merrill
    Toronto Undergraduate Linguistics Conference, Nov 2018