Tag Archives: Statistical
Computational Modeling of Sound Change

James Kirby – University of Edinburgh
Morgan Sonderegger – McGill University
Course time: Tuesday/Thursday 3:30-5:20 pm
2347 Mason Hall


Decades of empirical research have led to an increasingly nuanced picture of the nature of phonetic and phonological change, incorporating insights from speech production and perception, cognitive biases, and social factors. However, there remains a significant gap between observed patterns and proposed mechanisms, in part due to the difficulty of conducting the type of controlled studies necessary to test hypotheses about historical change. Computational and mathematical models provide an alternative means by which such hypotheses can be fruitfully explored. With an eye towards Box’s dictum (all models are wrong, but some are useful), this course asks: how can computational models be useful for understanding why phonetic and phonological change occurs? Students will study the growing and varied literature on computational and mathematical modeling of sound change that has emerged over the past decade and a half, including models of phonetic change in individuals over the lifespan, phonological change in speech communities in historical time, and lexical diffusion. Discussion topics will include the strengths and weaknesses of different approaches (e.g., simulation-based vs. mathematical models); identifying which modeling frameworks are best suited for particular types of research questions; and methodological considerations in modeling phonetic and phonological change. For this course, some background in probability theory, single-variable calculus, and/or linear algebra is helpful but not required.



Mixed Effect Models

T. Florian Jaeger – University of Rochester
Course time: Tuesday/Thursday 3:30-6:00 pm, and all day Friday, July 12; last two weeks of Institute only (July 9, 11, 16, 18)
MLB


With increasing use of quantitative behavioral data, statistical data analysis has rapidly become a crucial part of linguistic training. Linguistic data analysis is often particularly challenging because (i) the relevant data are often sparse, (ii) the data sets are often unbalanced with regard to the variables of interest, and (iii) data points are typically not sampled independently of each other, making it necessary to account for grouping structures (clusters), possibly hierarchical, in the data. This course provides an introduction to several advanced data analysis techniques that help us to address these challenges. We will focus on the Generalized Linear Model (GLM) and the Generalized Linear Mixed Model (GLMM): what they are, how to fit them, what common ‘traps’ to be aware of, how to interpret them, and how to report and visualize results obtained from these models. GLMs and GLMMs are powerful tools for understanding complex data, including not only whether effects are significant but also what direction and shape they have. GLMs have been used in corpus linguistics and sociolinguistics since at least the 1960s. GLMMs have recently been introduced to language research through corpus linguistics and psycholinguistics. They are rapidly becoming a popular data analysis technique in these and other fields (e.g., sociolinguistics).

In this course, I will assume a basic statistical background and a conceptual understanding of at least linear regression.



Statistical Reasoning for Linguistics

Stefan Gries – University of California, Santa Barbara
Course time: Monday/Wednesday 3:30-5:20 pm
2407 Mason Hall


This course is aimed at beginners in statistics and will cover (1) the theoretical foundations of statistical reasoning as well as (2) selected practical applications. As for (1), we will discuss notions such as (different types of) variables, operationalization, (null and alternative) hypotheses, additive and interactive effects, significance testing and p-values, model(ing) and model selection, etc. As for (2), we will be concerned with how to annotate and prepare data for statistical analysis using spreadsheet software, and how to use the open-source language and environment R (<www.r-project.org>) to

- explore data visually using a multitude of graphs (an important precursor to any kind of statistical analysis) and exploratory statistical tools (e.g., cluster analysis);

- conduct some basic statistical tests;

- briefly explore more advanced statistical regression modeling techniques.

The course will draw on the second edition of my textbook on statistics for linguists (to be published in 2013 by Mouton de Gruyter). Examples will include observational and experimental data from a variety of linguistic sub-disciplines.
