This repository contains the data and code needed to compute word surprisal values for the L1 and L2 stimuli used in the experiment. Ultimately, these surprisal values will be used to predict the reading times of L1 and L2 speakers of English.
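For readers new to the measure, the surprisal of a word is the negative log probability of that word given its preceding context, here expressed in bits. The sketch below illustrates the computation with a toy, unsmoothed bigram model. It is not the code used in this repository: the function names `train_bigram` and `word_surprisals`, the `<s>` start symbol, and the two-sentence corpus are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (illustrative helper)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens  # "<s>" is an assumed sentence-start symbol
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def word_surprisals(tokens, context_counts, bigram_counts):
    """Return (word, surprisal in bits) pairs under the unsmoothed bigram model."""
    padded = ["<s>"] + tokens
    result = []
    for prev, word in zip(padded, padded[1:]):
        count = bigram_counts[(prev, word)]
        if count == 0:
            # Without smoothing, an unseen bigram has probability 0,
            # so its surprisal is infinite.
            result.append((word, math.inf))
        else:
            # log2(N(prev) / N(prev, word)) equals -log2 P(word | prev).
            result.append((word, math.log2(context_counts[prev] / count)))
    return result


# Toy corpus, assumed purely for this example.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
context_counts, bigram_counts = train_bigram(corpus)
surprisals = word_surprisals(["the", "dog", "barks"], context_counts, bigram_counts)
```

In this toy corpus, "dog" follows "the" in one of two cases, so its surprisal is 1 bit, while "the" and "barks" are fully predictable from their contexts and receive 0 bits. Real language models assign smoothed probabilities over much longer contexts, but the conversion from probability to surprisal is the same.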
The language models are trained on preprocessed versions of the WikiText-2 dataset introduced by Merity et al. The dataset can be downloaded from https://huggingface.co/datasets/wikitext.
KenLM
Roark incremental parser
Recurrent neural network grammars