T. Florian Jaeger – University of Rochester
Course time: Tuesday/Thursday 3:30-6:00 pm, and all day Friday, July 12; last two weeks of Institute only (July 9, 11, 16, 18)
With increasing use of quantitative behavioral data, statistical data analysis has rapidly become a crucial part of linguistic training. Linguistic data analysis is often particularly challenging because (i) the relevant data are often sparse, (ii) the data sets are often unbalanced with regard to the variables of interest, and (iii) data points are typically not sampled independently of each other, making it necessary to account for (possibly hierarchical) grouping structures (clusters) in the data. This course provides an introduction to several advanced data analysis techniques that help us address these challenges. We will focus on the Generalized Linear Model (GLM) and the Generalized Linear Mixed Model (GLMM): what they are, how to fit them, what common ‘traps’ to be aware of, how to interpret them, and how to report and visualize results obtained from these models. GLMs and GLMMs are powerful tools for understanding complex data, including not only whether effects are significant but also what direction and shape they have. GLMs have been used in corpus linguistics and sociolinguistics since at least the 1960s. GLMMs have recently been introduced to language research through corpus linguistics and psycholinguistics. They are rapidly becoming a popular data analysis technique in these and other fields (e.g. sociolinguistics).
In this course, I will assume a basic statistical background and a conceptual understanding of at least linear regression.