We find that a neural network part-of-speech tagger implicitly learns to model syntactic change.
I formally characterize the capacity and memory of various RNNs under asymptotic assumptions.
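The summary does not say how the asymptotic assumptions are formalized. One common idealization in this line of work, offered here only as an illustrative assumption rather than the paper's definition, is to analyze the saturated network obtained by scaling the parameters toward infinity, so that every squashing nonlinearity is pinned to an extreme value:

```latex
% Illustrative saturation limit (an assumed formalization, not necessarily the paper's):
% the saturated network s(f) is the behavior of f as its weights \theta are scaled up.
\mathrm{s}(f)(x) \;=\; \lim_{\rho \to \infty} f(x;\, \rho\,\theta)
```

Under such a limit, gates behave like discrete switches, so capacity can be discussed as the number of reachable hidden configurations and memory as how that number grows with the input length.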
Stack-augmented RNNs trained to perform language modeling learn to exploit linguistic structure without any explicit supervision.
This paper analyzes the behavior of stack-augmented recurrent neural network models.
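Neither summary specifies the exact stack-augmented architecture, so the following is a minimal sketch of one common design: a plain recurrent cell coupled to a continuous stack updated by soft push and pop actions. The module and parameter names (`StackRNNCell`, `stack_size`, the two-way push/pop softmax) are illustrative assumptions, not the model analyzed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StackRNNCell(nn.Module):
    """Minimal RNN cell augmented with a continuous (soft) stack.

    Illustrative only: the push/pop parameterization is an assumption,
    not the exact architecture analyzed in the paper.
    """

    def __init__(self, input_size, hidden_size, stack_size=16):
        super().__init__()
        self.rnn = nn.RNNCell(input_size + 1, hidden_size)  # +1 for the stack top
        self.action = nn.Linear(hidden_size, 2)             # push vs. pop weights
        self.push_val = nn.Linear(hidden_size, 1)            # value written on a push

    def forward(self, x, hidden, stack):
        # Condition the recurrent update on the current top of the stack.
        top = stack[:, :1]
        hidden = self.rnn(torch.cat([x, top], dim=-1), hidden)

        # Soft choice between pushing a new value and popping the top.
        push, pop = F.softmax(self.action(hidden), dim=-1).unbind(-1)
        new_val = torch.sigmoid(self.push_val(hidden))

        pushed = torch.cat([new_val, stack[:, :-1]], dim=1)               # write on top
        popped = torch.cat([stack[:, 1:], torch.zeros_like(top)], dim=1)  # discard top
        stack = push.unsqueeze(1) * pushed + pop.unsqueeze(1) * popped
        return hidden, stack
```

In a language-modeling setup, the cell would be unrolled over the token sequence with a softmax over the vocabulary on top of the hidden state; the claim above is that the learned push/pop behavior comes to track linguistic structure such as nesting, even though the only training signal is next-token prediction.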
We present a graph-based Tree Adjoining Grammar parser that uses BiLSTMs, highway connections, and character-level CNNs.
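The sentence names the parser's ingredients but not how they are wired together, so the sketch below assembles them in one conventional way: a character-level CNN builds word representations, a highway connection gates them, a BiLSTM contextualizes them, and a bilinear scorer rates candidate attachments as in graph-based parsing. The hyperparameters, the bilinear scorer, and the omission of TAG-specific machinery (supertags, adjunction) are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn


class CharCNNHighwayBiLSTMEncoder(nn.Module):
    """Sketch combining the named components: char-CNN, highway connection, BiLSTM,
    plus a simple bilinear attachment scorer. Illustrative assumptions throughout."""

    def __init__(self, n_chars, char_dim=32, word_dim=100, hidden_dim=200):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, word_dim, kernel_size=3, padding=1)
        # Highway connection: gated mix of a transformed and the original representation.
        self.transform = nn.Linear(word_dim, word_dim)
        self.gate = nn.Linear(word_dim, word_dim)
        self.bilstm = nn.LSTM(word_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Bilinear head/dependent scorer, as in graph-based parsing.
        self.arc_scorer = nn.Bilinear(2 * hidden_dim, 2 * hidden_dim, 1)

    def forward(self, char_ids):
        # char_ids: (batch, n_words, n_chars)
        b, w, c = char_ids.shape
        chars = self.char_emb(char_ids.reshape(b * w, c)).transpose(1, 2)
        words = self.char_cnn(chars).max(dim=-1).values.view(b, w, -1)  # pool over chars

        g = torch.sigmoid(self.gate(words))                              # highway gate
        words = g * torch.relu(self.transform(words)) + (1 - g) * words

        states, _ = self.bilstm(words)                                    # (b, w, 2*hidden)
        dim = states.size(-1)
        heads = states.unsqueeze(2).expand(b, w, w, dim).reshape(-1, dim)
        deps = states.unsqueeze(1).expand(b, w, w, dim).reshape(-1, dim)
        # Score every head/dependent pair; higher means a more likely attachment.
        return self.arc_scorer(heads, deps).view(b, w, w)
```

A decoder (e.g., a maximum spanning tree or a TAG-specific derivation search) would then pick the highest-scoring structure from the pairwise scores; that step is omitted here.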