Tag Archives: Methods
Computational Modeling of Sound Change

James Kirby – University of Edinburgh
Morgan Sonderegger – McGill University
Course time: Tuesday/Thursday 3:30-5:20 pm
2347 Mason Hall


Decades of empirical research have led to an increasingly nuanced picture of the nature of phonetic and phonological change, incorporating insights from speech production and perception, cognitive biases, and social factors. However, there remains a significant gap between observed patterns and proposed mechanisms, in part due to the difficulty of conducting the type of controlled studies necessary to test hypotheses about historical change. Computational and mathematical models provide an alternative means by which such hypotheses can be fruitfully explored. With an eye towards Box’s dictum (all models are wrong, but some are useful), this course asks: how can computational models be useful for understanding why phonetic and phonological change occurs? Students will study the growing and varied literature on computational and mathematical modeling of sound change that has emerged over the past decade and a half, including models of phonetic change in individuals over the lifespan, phonological change in speech communities in historical time, and lexical diffusion. Discussion topics will include the strengths and weaknesses of different approaches (e.g. simulation-based vs. mathematical models); identifying which modeling frameworks are best suited for particular types of research questions; and methodological considerations in modeling phonetic and phonological change. For this course, some background in probability theory, single-variable calculus, and/or linear algebra is helpful but not required.
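As a concrete illustration of the simulation-based approach, a minimal iterated-transmission model of phonetic drift can be written in a few lines of Python. All parameter values (bias, noise, number of productions) are purely illustrative assumptions, not drawn from any course reading:

```python
import random

def simulate_chain(generations=50, tokens=100, bias=0.02, noise=0.05,
                   start=0.0, seed=1):
    """Iterated transmission of a phonetic target (e.g. a formant value).

    Each generation, the current speaker produces `tokens` realizations,
    shifted by a small articulatory bias plus Gaussian noise; the learner
    adopts the mean of what it hears as its own target.
    """
    rng = random.Random(seed)
    target = start
    trajectory = [target]
    for _ in range(generations):
        productions = [target + bias + rng.gauss(0, noise)
                       for _ in range(tokens)]
        target = sum(productions) / len(productions)  # learner's new target
        trajectory.append(target)
    return trajectory

trajectory = simulate_chain()
```

Even this toy model shows how a tiny per-generation bias accumulates into a large shift over historical time, which is the kind of mechanism-to-pattern link such models are used to probe.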


Computational Psycholinguistics

John Hale – Cornell University
Lars Konieczny – University of Freiburg
Course time: Monday/Wednesday 9:00-10:50 am
2330 Mason Hall


This course examines cognitive models of human sentence comprehension. Such models are programs that express psycholinguistic theories of how people unconsciously put together words and phrases in order to make sense of what they hear (or read). They hold out the promise of rigorously connecting behavioral measurements to broader theories, for instance theories of natural language syntax or cognitive architecture. The course brings students up to speed on the role of computer models in cognitive science generally, and situates the topic in relation to neighboring fields such as psychology and generative grammar. Students master several different viewpoints on what it might mean to “attach” a piece of phrase structure. Attendees will become familiar with notions of experience, probability, and information theory as candidate explanations of human sentence processing difficulty. This course has no prerequisites, although exposure to artificial intelligence, generative grammar, and cognitive psychology will help deepen the experience.
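As a minimal illustration of the information-theoretic notions involved, the following Python sketch estimates bigram surprisal (-log2 P(word | previous word)) from a toy corpus. The corpus and the add-alpha smoothing are illustrative assumptions, not material from the course:

```python
import math
from collections import Counter

def bigram_surprisal(corpus, alpha=1.0):
    """Return s(w1, w2) = -log2 P(w2 | w1), with add-alpha smoothing."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab_size = len(set(corpus))
    def surprisal(w1, w2):
        p = (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * vocab_size)
        return -math.log2(p)
    return surprisal

corpus = "the dog chased the cat and the dog barked".split()
s = bigram_surprisal(corpus)
```

On this toy corpus, frequently attested continuations ("the dog") come out less surprising than rare or unattested ones ("the barked"), the basic pattern that surprisal-based accounts relate to processing difficulty.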


Corpus-based Linguistic Research: From Phonetics to Pragmatics

Mark Liberman – University of Pennsylvania
Course time: Monday/Wednesday 1:30-3:20 pm
Aud C

Course website: http://languagelog.ldc.upenn.edu/myl/lsa2013/

Big, fast, cheap computers; ubiquitous digital networks; huge and growing archives of text and speech; good and improving algorithms for automatic analysis of text and speech: all of this creates a cornucopia of research opportunities, at every level of linguistic analysis from phonetics to pragmatics. This course will survey the history and prospects of corpus-based research on speech, language, and communication, in the context of class participation in a series of representative projects. Programming ability, though helpful, is not required.

This course will cover:

* How to find or create resources for empirical research in linguistics
* How to turn abstract issues in linguistic theory into concrete questions about linguistic data
* Problems of task definition and inter-annotator agreement
* Exploratory data analysis versus hypothesis testing
* Programs and programming: practical methods for searching, classifying, counting, and measuring
* A survey of relevant machine-learning algorithms and applications
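To make the inter-annotator agreement topic concrete, here is a minimal Python sketch of Cohen's kappa, a standard chance-corrected agreement measure; the toy part-of-speech annotations are invented for illustration:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    observed = sum(x == y for x, y in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    # agreement expected by chance, from each annotator's label distribution
    expected = sum(c1[lab] * c2[lab] for lab in c1) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["N", "N", "V", "N", "V", "V", "N", "N"]
b = ["N", "N", "V", "N", "N", "V", "N", "V"]
kappa = cohens_kappa(a, b)
```

Here raw agreement is 6/8 = 0.75, but kappa is lower (about 0.47) because two annotators who both favor "N" would agree often by chance alone.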

We will explore these topics through a series of empirical research exercises, some planned in advance and some developed in response to the interests of participants.

There will be some connections to the ICPSR Summer Program in Quantitative Methods of Social Research.


Experimental Pragmatics

Gregory Ward – Northwestern University
William S. Horton – Northwestern University
Course time: Monday/Wednesday 11:00 am – 12:50 pm
2306 Mason Hall


The emerging field of experimental pragmatics combines an interest in the theoretical complexities of language use with the experimental methodologies of psycholinguistics. This course will present a broad survey of recent work in this area that has attempted to apply the methods of experimental psychology to classic issues in theoretical pragmatics. Each class session will include both theoretical and experimental readings on topics such as reference, information structure, implicature, and speech acts. These topics wrestle with the relationship between the sentence, as an abstract object with phonological, syntactic, and semantic properties assigned by the grammar of the language, and the utterance, as the concrete realization of that sentence with properties inherited from consideration of the discourse situation. The class will also focus on a number of experimental and analytical methodologies that have been used to investigate these topics, including reaction time studies, eyetracking, and corpus analysis.  In general, the course will be organized primarily around discussion of the assigned readings, and students will have the opportunity to develop a research proposal relevant to issues in language use.  No specific background in or familiarity with particular experimental methods or approaches is required.


Field Methods

Keren Rice – University of Toronto
Course time: Monday/Tuesday/Wednesday/Thursday 3:30-5:20
2437 Mason Hall
Note: This class may count for double credit.


This course is an introduction to linguistic field methods. We will work with a speaker of a language that none of us know, endeavoring to discover as much as possible about the structure of the language, at all levels – phonetic, phonological, morphological, syntactic, semantic – through a combination of structured questioning and working with texts that we will record from the speaker. The emphasis will be on how to discover the systematicity of an unknown language on its own terms.

Prerequisite: Background in linguistics. Students should be able to transcribe and to carry out morphological and syntactic analysis.

Recommended co-requisite: Tools for Language Documentation (Claire Bowern)


Gesture and Gestural Documentation

Mandana Seyfeddinipur – The School of Oriental and African Studies (SOAS)
Course time: Monday/Wednesday 11:00 am – 12:50 pm
2427 Mason Hall


In the past ten years the study of hand gestures has become an established area of investigation in different disciplines. This course will provide an introduction to theoretical and methodological issues in manual gesture research, and a solid foundation for further research into the phenomenon by the course participants. We will explore the role of manual gesture in language, culture, and cognition and provide hands-on training in methods in gesture research. The basic functions of gesture in communication, its interaction with speech in the creation of meaning, as well as its role in cognition will be introduced. One focus will be how to document gesture in actual language use while doing fieldwork. In the practical component, participants will learn how to record gesture data in naturalistic as well as in experimental settings. In addition, the course will provide the opportunity to learn how to annotate and code gesture with available software. Participants are encouraged to bring their own recordings for annotation and analysis. Some familiarity with general linguistics is presumed.


Lexicography in Natural Language Processing

Orin Hargraves – Independent Scholar
Course time: Tuesday/Thursday 9:00-10:50 am
2325 Mason Hall


Determining what words mean is the core skill and practice of lexicography. Determining what words mean is also a central challenge in natural language processing (NLP), where it is usually classed under the exercise of word sense disambiguation (WSD). Until the late 20th century, lexicography was dominated by scholars with backgrounds in philosophy, literature, and other humanistic disciplines, and the writing of dictionaries was based strongly on intuition, and only secondarily on induction from the study of examples of usage. Linguistics, in this same period, established itself as a discipline with strong scientific credentials. With the development of corpora and other computational tools for processing text, dictionary makers recognized first the value, and soon the indispensability, of using evidence-based data to develop dictionary definitions, and this brought them increasingly into contact with computational linguists. The developers of computational linguistic tools and resources eventually turned their attention back to the dictionary and found that it was a document that could be exploited for use in the newly emerging fields of linguistic inquiry that computation made possible: NLP, artificial intelligence, machine learning, and machine translation. This course will explore the computational tools that lexicographers use today to write dictionaries, and the ways in which computational linguists use dictionaries in their pursuits. The aim is to give students an appreciation of the unexploited opportunities that dictionary databases offer to NLP, and of the challenges that stand in the way of their exploitation. Students will have an opportunity to explore the ways in which dictionaries may aid or hinder automatic WSD, and they will be encouraged to develop their own models for the use of dictionary databases in NLP.
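As a minimal illustration of how dictionary glosses can aid WSD, the following Python sketch implements a simplified Lesk-style algorithm: choose the sense whose gloss shares the most words with the surrounding context. The glosses and example sentence are invented for illustration:

```python
def lesk(word, context, glosses):
    """Simplified Lesk: pick the sense whose dictionary gloss overlaps
    most with the words of the surrounding context."""
    context_words = set(context.lower().split())
    def overlap(gloss):
        return len(set(gloss.lower().split()) & context_words)
    return max(glosses[word], key=lambda sense: overlap(glosses[word][sense]))

# hypothetical mini-dictionary, invented for this example
glosses = {
    "bank": {
        "finance": "an institution for deposits loans and money exchange",
        "river": "sloping land beside a body of water or river",
    }
}
sense = lesk("bank", "she sat on the sloping river bank beside the water", glosses)
```

The same mechanism also illustrates how dictionaries can hinder WSD: when glosses are terse, the overlap is often zero and the choice becomes arbitrary.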

Students must have native-speaker fluency in English. Thorough knowledge of English grammar and morphology is an advantage, as is knowledge of the rudiments of NLP.


Linguistics as a Forensic Science

Carole E. Chaski – Institute for Linguistic Evidence
Course time: Tuesday/Thursday 3:30-5:20 pm
2336 Mason Hall


Linguistics as a Forensic Science introduces students to the current state of the art in forensic linguistics. Students learn the legal standards that linguistic evidence must meet, how linguistic research has produced methods that meet these standards, as well as examples of methodological failure. Cases and rulings are discussed in the context of methodological issues for linguistics, and to demonstrate the seriousness of legal standards. Examined in detail are linguistic methods for author identification, text classification, intertextuality, and linguistic profiling. Most forensic linguistic methods attempt to identify, individuate, or classify texts, so texts are automatically seen as instances of either individual or group variation (i.e. the method must be able to categorize texts as belonging to different individuals, the method must be able to classify texts as belonging to a particular type of text, the method must be able to identify texts as coming from a person with a certain level of education or dialect, and so forth).

The paradigm which students learn in this course is one in which (1) universal principles provide methodological grounding for the analysis of variation, (2) texts are analyzed for the instantiation of syntactic and semantic properties, (3) the instantiations are quantified, (4) the quantifications are subjected to statistical analysis, and (5) the statistical analysis is subjected to validation testing for error rates. This paradigm (known as computational forensic linguistics) poses several challenges to linguistics as a science, such as the choice of levels and units for linguistic analysis of forensic texts for specific tasks, the predictability of linguistic behavior, tools for analysis of variable linguistic behavior, and the model of language which is circumscribed or determined by universal principles but at the same time instantiated in group and individual behaviors. Thus, computational forensic linguistics provides a proving ground for how universal principles ground analysis and method so that individual and group variability can be accurately captured and then used for prediction, the core of scientific endeavors.
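Steps (3)-(5) of this paradigm can be sketched in miniature in Python: quantify a few marker-word frequencies per text, attribute by nearest neighbour, and validate with leave-one-out error. The marker words and toy texts below are invented for illustration and are far smaller than any forensically valid feature set:

```python
from collections import Counter

MARKERS = ["the", "of", "and", "that", "in"]  # hypothetical function-word features

def featurize(text):
    """Step (3): quantify - relative frequency of each marker word."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[m] / len(words) for m in MARKERS]

def distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def loo_error_rate(texts, labels):
    """Step (5): validation - leave-one-out error of a nearest-neighbour rule."""
    feats = [featurize(t) for t in texts]
    errors = 0
    for i in range(len(texts)):
        rest = [(f, l) for j, (f, l) in enumerate(zip(feats, labels)) if j != i]
        pred = min(rest, key=lambda fl: distance(fl[0], feats[i]))[1]
        errors += pred != labels[i]
    return errors / len(texts)

texts = [
    "the dog saw the cat and the bird",
    "the man took the book and the pen",
    "of course much of this sort of thing",
    "part of one of many kinds of idea",
]
labels = ["A", "A", "B", "B"]
rate = loo_error_rate(texts, labels)
```

The point of the sketch is the pipeline shape, not the features: a real method must report its validated error rate, since that is precisely what legal admissibility standards examine.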

Current forensic linguistics methods exemplify the tension between universality and variability. The ways in which different methods embrace universality or variability have either enabled or prevented linguistic methods from reaching error rates low enough for legal use. Admissible methods that have successfully met the scientific rigor required for legal evidence combine analysis based on universal principles of linguistic structure with statistical analysis of linguistic variability. On the other hand, methods which have focused on variability to the exclusion of universal principles have failed methodologically to produce repeatable results or low error rates, and have thus not met legal standards and are generally ruled inadmissible. The computational forensic linguistic paradigm embraces variability as the core of most forensic linguistic problems, with universal structural principles as the primary analytical approach for solving these problems. Only this synergistic approach (a structural-behaviorist approach) actually works to produce feasible forensic linguistic methods that are theoretically grounded, replicable, and reliable.

Students in this course should have already taken an introductory linguistics course. Students may also find the Institute courses on R and Python good to take at the same time, but they are not required.


Machine Learning

Steve Abney – University of Michigan
Course time: Monday/Wednesday 11:00 am – 12:50 pm
1401 Mason Hall


This course provides a general introduction to machine learning. Unlike results in learnability, which are very abstract and have limited practical consequences, machine learning methods are eminently practical, and provide a detailed understanding of the space of possibilities for human language learning.

Machine learning has come to dominate the field of computational linguistics: virtually every problem of language processing is treated as a learning problem.  Machine learning is also making inroads into mainstream linguistics, particularly in the area of phonology. Stochastic Optimality Theory and the use of maximum entropy models for phonotactics may be cited as two examples.

The course will focus on giving a general understanding of how machine learning methods work, in a way that is accessible to linguistics students. There will be some discussion of software, but the focus will be on understanding what the software is doing, not on the details of using a particular package.

The topics to be touched on include classification methods (Naive Bayes, the perceptron, support vector machines, boosting, decision trees, maximum entropy classifiers); clustering (hierarchical clustering, k-means clustering, the EM algorithm, latent semantic indexing); sequential models (hidden Markov models, conditional random fields); grammatical inference (probabilistic context-free grammars, distributional learning); semi-supervised learning (self-training, co-training, spectral methods); and reinforcement learning.
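As a small, self-contained illustration of one of these classifiers, here is a multinomial Naive Bayes sketch in plain Python (bag-of-words features, add-one smoothing); the toy training data are invented for illustration:

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over bag-of-words."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        def log_prob(c):
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c])  # log prior
            for w in doc.split():
                # add-one smoothed log likelihood of each word given the class
                score += math.log((self.word_counts[c][w] + 1) /
                                  (total + len(self.vocab)))
            return score
        return max(self.classes, key=log_prob)

docs = ["buy cheap pills now", "cheap offer buy now",
        "meeting at noon today", "lunch meeting today"]
labels = ["spam", "spam", "ham", "ham"]
nb = NaiveBayes().fit(docs, labels)
```

The whole classifier fits in a few dozen lines, which is why it is a standard first example of the "treat every language problem as a learning problem" perspective the course describes.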


Mixed Effect Models

T. Florian Jaeger – University of Rochester
Course time: Tuesday/Thursday 3:30-6:00 pm, and all day Friday, July 12; last two weeks of Institute only (July 9, 11, 16, 18)


With increasing use of quantitative behavioral data, statistical data analysis has rapidly become a crucial part of linguistic training. Linguistic data analysis is often particularly challenging because (i) the relevant data are often sparse, (ii) the data sets are often unbalanced with regard to the variables of interest, and (iii) data points are typically not sampled independently of each other, making it necessary to account for—possibly hierarchical—grouping structures (clusters) in the data. This course provides an introduction to several advanced data analysis techniques that help us to address these challenges. We will focus on the Generalized Linear Model (GLM) and Generalized Linear Mixed Model (GLMM) – what they are, how to fit them, what common ‘traps’ to be aware of, how to interpret them, and how to report and visualize results obtained from these models. GLMs and GLMMs are a powerful tool to understand complex data, including not only whether effects are significant but also what direction and shape they have. GLMs have been used in corpus and sociolinguistics since at least the 1960s. GLMMs have recently been introduced to language research through corpus- and psycholinguistics. They are rapidly becoming a popular data analysis technique in these and other fields (e.g. sociolinguistics).
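The fixed-effects core of these models can be illustrated with a one-predictor logistic GLM fit by gradient ascent in plain Python. This is only the GLM part: a full GLMM adds random effects for the grouping structure and is normally fit with dedicated software (e.g. lme4 in R). The data below are made up for illustration:

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=3000):
    """Fit a one-predictor logistic GLM by gradient ascent on the
    Bernoulli log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # predicted probability
            g0 += y - p                              # gradient wrt intercept
            g1 += (y - p) * x                        # gradient wrt slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# made-up data: x = a centered predictor, y = 1 if a reduced variant was used
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

The fitted slope gives both the direction and the shape of the effect, which is exactly the interpretive payoff the course description emphasizes over bare significance testing.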

In this course, I will assume a basic statistical background and a conceptual understanding of at least linear regression.


Modeling and Measuring Inflectional Paradigms

Andrew Hippisley – University of Kentucky
Greg Stump – University of Kentucky
Raphael Finkel – University of Kentucky
Course time: Tuesday/Thursday 9:00-10:50 am
2333 Mason Hall


The emergence of inferential-realizational approaches to inflection has led to a dramatic reversal of a perspective on morphology that dominated twentieth-century grammatical theory, where inflectional paradigms were regarded as an epiphenomenon of the combinatory properties of inflectional morphemes and were accorded no theoretical importance. The new perspective suggests that paradigms are essential to the definition of a language’s inflectional morphology and that they constitute a significant domain of measurable typological variation. The purpose of this course is to investigate both the universal principles of paradigm structure and the dimensions and degrees of cross-linguistic variation in paradigm structure. Central to our method is the use of computational resources for the formal modeling and typological measurement of inflectional paradigms. We begin by examining inferential-realizational theories of inflection and their place in the broader theoretical landscape. Numerous considerations decisively favor the inferential-realizational approach. We exemplify this approach with Paradigm Function Morphology, a precise system of universal principles for the definition of inflectional systems. We then consider two different approaches to modeling paradigm realization in inferential-realizational theories: the exponence-based approach, computationally illustrated through Network Morphology, and the implicative approach, computationally illustrated by the Principal-Parts Analyzer. Both approaches are then contrasted in the way they account for inflectional classes: for the exponence-based account we introduce the concept of the default inheritance hierarchy, and for the implicative account, the notion of principal parts. We move on to look at the diversity of paradigm structures, treating it as various departures from a canonical norm. Two kinds of phenomena responsible for paradigm structure variation are syncretism and deponency, both covered in some detail.
Further variation is identified by considering the predictability of cells, and we consider the implicative structure of paradigms. We go on to relate this concept to the property of inflectional complexity, a point of comparison between languages’ morphological systems that lends itself to a typological treatment. Throughout the course, practical hands-on computational sessions will supplement and illustrate the theoretical points made. An introductory course in linguistics is strongly advised, and knowledge of morphology is desirable.
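The notion of principal parts lends itself to a small computational illustration: given inflection classes represented as tables of exponents, find the cells whose exponent alone identifies the class. The Python sketch below uses an invented, Latin-flavored toy fragment, not data or software from the course:

```python
def principal_cells(classes):
    """Return the cells whose exponent alone distinguishes every
    inflection class (single principal parts, in the implicative sense)."""
    cells = next(iter(classes.values())).keys()
    diagnostic = []
    for cell in cells:
        exponents = [forms[cell] for forms in classes.values()]
        # a cell is diagnostic if every class has a distinct exponent there
        if len(set(exponents)) == len(classes):
            diagnostic.append(cell)
    return diagnostic

# toy, Latin-like conjugation fragment (illustrative exponents only)
classes = {
    "conj1": {"inf": "-are", "1sg": "-o", "3pl": "-ant"},
    "conj2": {"inf": "-ere", "1sg": "-eo", "3pl": "-ent"},
    "conj3": {"inf": "-ere", "1sg": "-o", "3pl": "-unt"},
}
diagnostic = principal_cells(classes)
```

In this fragment only the 3pl cell identifies the class on its own; the infinitive and 1sg cells are each ambiguous between two classes, which is the kind of predictability structure the course proposes to measure.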


Praat Scripting

Kevin McGowan – Rice University
Course time:
Tuesday/Thursday 11:00 am – 12:50 pm, MLB OR
Monday/Wednesday 1:30 pm – 3:20 pm, 2353 Mason Hall


This course introduces basic automation and scripting skills for linguists using Praat. The course will expand upon a basic familiarity with Praat and explore how scripting can help you automate mundane tasks, ensure consistency in your analyses, and provide implicit (and richly-detailed) methodological documentation of your research.  Our main goals will be:

    1.  To expand upon a basic familiarity with Praat by exploring the software’s capabilities and learning the details of its scripting language.

    2.  To practice a set of scripting best practices to help you not only write and maintain your own scripts but evaluate scripts written by others.

The course assumes participants have read and practiced with the Intro from Praat’s help manual. Topics to be covered include:

    o Working with the Objects, Editor, and Picture windows

    o Finding available commands

    o Creating new commands

    o Working with TextGrids

    o Conditionals, flow control, and error handling

    o Using strings, numbers, formulas, arrays, and tables

    o Automating phonetic analysis

    o Testing, adapting, and using scripts from the internet


Python 3 for Linguists

Damir Cavar – Eastern Michigan University
Malgorzata E. Cavar – Eastern Michigan University
Course time: Monday/Wednesday 9:00-10:50 am, MLB OR
Tuesday/Thursday 11:00 am – 12:50 pm, 2347 Mason Hall


This course introduces basic programming and scripting skills to linguists using the Python 3 programming language and common development environments. Our main goals are:

- to offer an entry point to programming and computation for humanities students and anyone else who is interested,

- to do so without requiring any previous programming or IT knowledge, beyond basic lay-person computer experience.

In eight sessions, the course covers interaction with the Python programming environment, an introduction to programming, and an introduction to linguistically relevant text- and data-processing algorithms, including quantitative and statistical analyses as well as qualitative and symbolic methods.

Existing Python code libraries and components will be discussed, and practical usage examples given. The emphasis in this course is on being creative with a programming language; the content is geared towards specific tasks that linguists are confronted with, where computation over large amounts of data or time-consuming annotation and data manipulation tasks are necessary. Among the tasks we consider essential are:

- reading text and language data from, and writing it to, files in various encodings, using different orthographic systems and standards and corpus encoding formats and technologies (e.g. XML),

- generating and processing word lists, linguistic annotation models, N-gram models, and frequency profiles to study quantitative and qualitative aspects of language, for example variation in language, computational dialectology, and similarity or dissimilarity at different linguistic levels,

- symbolic processing with regular grammar rules, used in finite-state automata for processing phonotactic information or morphology, as well as context-free grammars and parsers for syntactic analysis, and higher-level grammar formalisms, together with the use of these grammars in language processing algorithms.
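To give a taste of these tasks, a word n-gram frequency profile takes only a few lines of Python; the example sentence is invented for illustration:

```python
from collections import Counter

def ngram_profile(text, n=2):
    """Frequency profile of word n-grams over a whitespace-tokenized text."""
    words = text.lower().split()
    grams = zip(*(words[i:] for i in range(n)))  # sliding windows of length n
    return Counter(" ".join(g) for g in grams)

profile = ngram_profile("the cat sat on the mat the cat slept")
```

`profile.most_common()` then ranks the bigrams by frequency, which is the basis of the frequency-profile comparisons mentioned above.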


Semantic Fieldwork Methods

Judith Tonhauser – Ohio State University
Course time: Tuesday/Thursday 1:30-3:20 pm
2347 Mason Hall


This course introduces participants to the methodology of collecting semantic/pragmatic data in collaboration with linguistically untrained native speaker consultants.

Data that may inform semantic/pragmatic theorizing are typically quite complex, consisting of 1) one or more grammatical sentences that are 2) uttered in an appropriately designed context, and 3) a native speaker’s judgment about the acceptability or the truth of the sentence(s) uttered in that context.

The goal of the course is to familiarize students with the empirical, theoretical and methodological considerations relevant to obtaining such data. In particular, topics to be discussed include the kinds of judgments obtainable from native speakers, distinguishing syntactically ill-formed from semantically/pragmatically anomalous sentences/utterances, the importance of context and how to appropriately control for it, reporting semantic/pragmatic data, and the generalizability of results.

The course also examines the benefits of and difficulties with exploring semantic/pragmatic research questions through texts. The relative merits of one-on-one elicitation and controlled experiments with linguistically untrained native speakers are also considered.

Although much of the data provided for in-class discussion comes from Paraguayan Guaraní (Tupí-Guaraní), in particular studies of temporal and nominal reference, and of presuppositions and other projective contents, the course aims to prepare participants to conduct semantic/pragmatic fieldwork on any topic in any language. Note that this course does not have a regular practical component during which course participants work with a native speaker consultant; Professor Keren Rice’s field methods course (http://lsa2013.lsa.umich.edu/2012/05/field-methods/) is highly recommended for this purpose.

This course is targeted at students already familiar with formal syntax, semantics and pragmatics who wish to collect data with native speakers, as well as students who already have experience in conducting research with native speakers and want to extend their research to semantic/pragmatic topics. Interested course participants should contact the instructor (judith@ling.osu.edu) with questions about the course content and suitability.


Sociocultural Discourse Analysis

Barb Meek – University of Michigan
Susan Philips – University of Arizona
Course time: Monday/Wednesday 3:30-5:20 pm
2325 Mason Hall


The purpose of this course is to provide training in discourse analysis that focuses on how culture is manifest in discourse practices.  Recordings of socially occurring speech render relatively ephemeral speech in a material and permanent form that gives it cultural reliability and repeatability not available in data collected through other anthropological/ethnographic research methods such as participant observation and note taking.  Topics include: 1) Research design.  When is recording useful, appropriate, and ethical; what kinds of activities will be recorded and how much material in hours will be recorded? 2) Transcription, translation and computer entry of recordings.  How to choose what to transcribe and how much to transcribe; in-field versus after-fieldwork transcription and translation; selection of transcription formats and software for coding data.  3) Analysis based on recordings, transcripts and coding of transcripts. Using the comparative method, identification of relevant units of interaction and their internal sequencing; comparison of multiple instances of the same units of interaction; comparison of multiple kinds of units of interaction and forms of talk; relating discourse analysis to other kinds of data concerning forms of local knowledge in order to make claims for sociocultural processes greater in scale than the discourse data.  4) Analysis of linguistic structures crucial to the interactional constitution of cultural processes, e.g. mood/modality; agency; evidentiality.  This will be a hands-on course involving analysis of data provided by the instructors.  This approach can serve scholars interested in how culture and language are mutually constituted through not only socially occurring speech, but also in interviews, in written records and in the media.  
The planning and implementation of research in linguistic anthropology, cultural anthropology, sociolinguistics, and language change can be strengthened by greater knowledge of the theoretical and methodological underpinnings of discourse analysis.

Some experience with linguistic analysis/description is preferred, but not required.


Statistical Reasoning for Linguistics

Stefan Gries – University of California, Santa Barbara
Course time: Monday/Wednesday 3:30-5:20 pm
2407 Mason Hall


This course is aimed at beginners in statistics and will cover (1) the theoretical foundations of statistical reasoning as well as (2) selected practical applications. As for (1), we will discuss notions such as (different types of) variables, operationalization, (null and alternative) hypotheses, additive and interactive effects, significance testing and p-values, model(ing) and model selection, etc. As for (2), we will be concerned with how to annotate and prepare data for statistical analysis using spreadsheet software, and how to use the open-source language and environment R (www.r-project.org) to

- explore data visually using a multitude of graphs (an important precursor to any kind of statistical analysis) and exploratory statistical tools (e.g., cluster analysis);

- conduct some basic statistical tests;

- briefly explore more advanced statistical regression modeling techniques.
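One of the basic tests mentioned above can be illustrated with a two-sample permutation test, written here in plain Python rather than R to keep the sketch self-contained; the "vowel duration" numbers are made up for illustration:

```python
import random

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sample permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        hits += diff >= observed
    return hits / n_perm  # estimated p-value

# hypothetical vowel durations (ms) in two conditions
p = permutation_test([112, 118, 121, 109, 115], [131, 128, 140, 135, 126])
```

Because the test's null distribution is built by relabeling the observed data, it makes no normality assumption, which is one reason such resampling methods are a useful first statistical tool.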

The course will be based on the second edition of my textbook on statistics for linguists (to be published in 2013 by Mouton de Gruyter). Examples will include observational and experimental data from a variety of linguistic sub-disciplines.


Tools for Language Documentation

Claire Bowern – Yale University
Course time: Tuesday/Thursday 9:00-10:50 am
2330 Mason Hall


This four-week course will cover a selection of the software, hardware, and stimulus kits/surveys which are most useful in documenting languages. The course will begin with an overview of software tools for organizing language data, including Toolbox and Elan, and hardware (e.g. audio and video recorders) for making recordings. Week 2 will focus on tools related to grammatical documentation (e.g. in the writing of reference grammars) and will include the use of structured stimulus kits, questionnaires, and tools for organizing transcripts and analytical data. Week 3 will focus on corpus planning and the collection of narratives and conversational data. Week 4 will concentrate on software and techniques for lexical elicitation, along with collection archiving. Each class will have a practical component and class participants are encouraged to bring their own data sets; however, data samples will also be available for those who need them. Some familiarity with general linguistics is presumed.
