Tag Archives: Computational/Corpus
Corpus-based Linguistic Research: From Phonetics to Pragmatics

Mark Liberman – University of Pennsylvania
Course time: Monday/Wednesday 1:30-3:20 pm
Aud C

See Course Description

Corpus-based Linguistic Research: From Phonetics to Pragmatics

Course website: http://languagelog.ldc.upenn.edu/myl/lsa2013/

Big, fast, cheap, computers; ubiquitous digital networks; huge and
growing archives of text and speech; good and improving algorithms for
automatic analysis of text and speech: all of this creates a
cornucopia of research opportunities, at every level of linguistic
analysis from phonetics to pragmatics. This course will survey the
history and prospects of corpus-based research on speech, language,
and communication, in the context of class participation in a series
of representative projects. Programming ability, though helpful, is
not required.

This course will cover:

* How to find or create resources for empirical research in linguistics
* How to turn abstract issues in linguistic theory into concrete
questions about linguistic data
* Problems of task definition and inter-annotator agreement
* Exploratory data analysis versus hypothesis testing
* Programs and programming: practical methods for searching,
classifying, counting, and measuring
* A survey of relevant machine-learning algorithms and applications

We will explore these topics through a series of empirical research
exercises, some planned in advance and some developed in response to
the interests of participants.

There will be some connections to the ICPSR Summer Program in
Quantitative Methods of Social Research:

, ,

Machine Learning

Steve Abney – University of Michigan
Course time: Monday/Wednesday 11:00 am – 12:50 pm
1401 Mason Hall

See Course Description

This course provides a general introduction to machine learning. Unlike results in learnability, which are very abstract and have limited practical consequences, machine learning methods are eminently practical, and provide detailed understanding of the space of possibilities for human language learning.

Machine learning has come to dominate the field of computational linguistics: virtually every problem of language processing is treated as a learning problem.  Machine learning is also making inroads into mainstream linguistics, particularly in the area of phonology. Stochastic Optimality Theory and the use of maximum entropy models for phonotactics may be cited as two examples.

The course will focus on giving a general understanding of how machine learning methods work, in a way that is accessible to linguistics students. There will be some discussion of software, but the focus will be on understanding what the software is doing, not in the details of using a particular package.

The topics to be touched on include classification methods (Naive Bayes, the perceptron, support vector machines, boosting, decision trees, maximum entropy classifiers) and clustering (hierarchical clustering, k-means clustering, the EM algorithm, latent semantic indexing), sequential models (Hidden Markov Models, conditional random fields) and grammatical inference (probabilistic context-free grammars, distributional learning), semisupervised learning (self-training, co-training, spectral methods) and reinforcement learning.

, , ,

Praat Scripting

Kevin McGowan – Rice University
Course time:
Tuesday/Thursday 11:00 am – 12:50 pm, MLB OR
Monday/Wednesday 1:30 pm – 3:20 pm, 2353 Mason Hall

See Course Description

This course introduces basic automation and scripting skills for linguists using Praat. The course will expand upon a basic familiarity with Praat and explore how scripting can help you automate mundane tasks, ensure consistency in your analyses, and provide implicit (and richly-detailed) methodological documentation of your research.  Our main goals will be:

    1.  To expand upon a basic familiarity with Praat by exploring the software’s capabilities and learning the details of its scripting language.

    2.  To practice a set of scripting best practices to help you not only write and maintain your own scripts but evaluate scripts written by others.

The course assumes participants have read and practiced with the Intro from Praat’s help manual. Topics to be covered include:

    o Working with the Objects, Editor, and Picture windows

    o Finding available commands

    o Creating new commands

    o Working with TextGrids

    o Conditionals, flow control, and error handling

    o Using strings, numbers, formulas, arrays, and tables

    o Automating phonetic analysis

    o Testing, adapting, and using scripts from the internet

, , ,

Python 3 for Linguists

Damir Cavar – Eastern Michigan University
Malgorzata E. Cavar – Eastern Michigan University
Course time: Monday/Wednesday 9:00-10:50 am, MLB OR
Tuesday/Thursday 11:00 am – 12:50 pm, 2347 Mason Hall

See Course Description

This course introduces basic programming and scripting skills to linguists using the Python 3 programming language and common development environments. Our main goals are:

- to offer an entry point to programming and computation for humanities students, and whoever is interested

- to do so without requiring any previous computer or IT knowledge (except basic computer experience and common lay-person computer knowledge).

The course covers in eight sessions the interaction with the Python programming environment, an introduction to programming, and an introduction to linguistically relevant text and data processing algorithms, including quantitative and statistical analyses, as well as qualitative and symbolic methods.

Existing Python code libraries and components will be discussed, and practical usage examples given. The emphasis in this course is on being creative with a programming language, and teaching content that is geared towards specific tasks that linguists are confronted with, where computation of large amounts of data or time consuming annotation and data manipulation tasks are necessary. Among the tasks we consider essential are:

- reading text and language data from- and writing to files in various encodings, using different orthographic systems and standards, corpus encoding formats and technologies (e.g. XML),

- generating and processing of word lists, linguistic annotation models, N-gram models, frequency profiles to study quantitative and qualitative aspects of language, for example, variation in language, computational dialectology, similarity or dissimilarity at different linguistic levels,

- symbolic processing of regular grammar rules to be used in finite state automata for processing of phonotactic information or morphology, but also context free grammars and parsers for syntactic analyses, and higher level grammar formalisms, and the use of these grammars and language processing algorithms.

, ,