After completing this course, the student can:
- Indicate the properties of, and the differences between, the main techniques for modelling and parsing natural languages as well as other natural language processing tasks
- Demonstrate understanding of the main linguistic concepts that play a role in natural language processing
- Apply multiple methods and techniques for natural language processing (such as tokenization, statistical language modelling, part-of-speech tagging, text classification and syntactic parsing) to new problems and data
- Use appropriate tools and methods for linguistic data analysis to investigate aspects of language use in textual data (for example from social media)
- Evaluate the performance of various natural language processing methods on tasks such as spelling normalization, syntactic analysis and text classification
This course looks at a specific area of artificial intelligence (AI): how machines can analyse linguistic data, with a focus on written language. We look at AI models, such as grammatical, statistical and neural models, and at how they are used in natural language processing tools such as part-of-speech taggers, named entity recognizers, sentiment analysers and parsers. We also look at how machines can learn to analyse linguistic data. In the lectures we present computational models of language, as well as tools and techniques that can be used to build “natural language processing machines” and to address interesting questions concerning language use.
Students do practical homework assignments in which they use these models and natural language processing tools, and carry out a small project to answer research questions about linguistic data. As part of the project, students are also expected to find and read some papers on the current state of the art in natural language processing that relate to their project.
Assessment consists of an individual written exam and assignments (four homework assignments and a small project) done in pairs. The final grade is based on the marks for the assignments and the exam. The course includes a mandatory project presentation, which is not graded.
Besides dealing with technology for natural (= human) language processing, the course also deals with understanding humans and context, as students learn about the structure and use of human language.
We expect students to have basic programming skills (for example, writing scripts for data analysis). Because the course uses formal, mathematical and probabilistic methods, as well as algorithmic specifications based on them, it requires some practical experience with, and a feel for, mathematical formulas and formal specifications.
This course is a recommended prerequisite for the courses Speech Processing (201600075) and Conversational Agents (201600077).