Course module: 201200044
Managing Big Data
Course module: 201200044
Credits (ECTS): 5
Course type: Course
Language of instruction: English
Contacts: D. Hiemstra; dr. R.B.N. Aly
Contact person for the course: D. Hiemstra
Lecturer: D. Hiemstra
Academic year: 2014
Starting block:
Application procedure: You apply via OSIRIS Student
Registration using OSIRIS: Yes
After successful completion of the course, the student is able to:
  • Design large-scale, storage- and data-intensive web applications (for instance, GMail and Facebook);
  • Specify complex problems as MapReduce algorithms using a programming language with functional constructs (for instance, Haskell or Python);
  • Write complex analytical queries using query languages such as Pig Latin and Sawzall;
  • Implement and run solutions using the Hadoop framework.
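To illustrate what "specifying a problem as a MapReduce algorithm" can look like in Python, here is a minimal single-process sketch of the map, shuffle and reduce phases applied to word counting. It is not Hadoop code: the helper names `run_mapreduce`, `wc_mapper` and `wc_reducer` are hypothetical, and grouping values in a local dictionary stands in for the distributed shuffle that a real cluster performs.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Iterable


def run_mapreduce(records: Iterable, mapper: Callable, reducer: Callable) -> dict:
    """Hypothetical single-process simulation of map, shuffle and reduce."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share the same key.
            groups[key].append(value)
    # Reduce phase: combine the values of each key independently.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def wc_mapper(line: str):
    # Emit (word, 1) for every whitespace-separated, lower-cased word.
    return ((word.lower(), 1) for word in line.split())


def wc_reducer(word: str, counts: list) -> int:
    # Sum the partial counts with a functional fold.
    return reduce(lambda a, b: a + b, counts, 0)


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_mapreduce(lines, wc_mapper, wc_reducer))
    # prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the mapper and reducer are pure functions with no shared state, each key's reduction is independent of the others; that independence is what allows a framework such as Hadoop to spread both phases over many machines.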
Big data is a term introduced in the early 2000s for data sets whose size grew beyond the ability of the software tools of that time to process them, typically on the order of many terabytes or petabytes for a single data set. Software architects encounter big data sets in, for instance, web search and social media; scientists in meteorology and genomics; and analysts in finance and business informatics. The course closely follows the developments for managing big data on large clusters of commodity machines that were initiated by Google and taken up by many other web companies, such as Yahoo, Amazon, AOL, Facebook, Hyves, Spotify and Twitter. Big data has prompted a redesign of many core computer science concepts: the course discusses file systems (Google FS), programming paradigms (MapReduce), programming and query languages (for instance, Sawzall and Pig Latin), and 'noSQL' database paradigms (for instance, BigTable and Dynamo) for managing big data.

The course consists of lectures and practical assignments. As lab exercises, students solve real-world, large-scale problems, and they get the opportunity to access the University of Twente PRISMA-2 computer, a 32-node data center sponsored by Yahoo Research. Examples of lab exercises are counting words in large web crawls, constructing an inverted index, computing Google's PageRank, analyzing NetFlow logs, and designing a storage layer for the GMail clone HMail.
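Two of the lab exercises named above, inverted index construction and PageRank, can be phrased as MapReduce jobs. The sketch below uses a small in-memory stand-in for the framework; all function names are hypothetical, the damping factor 0.85 is the commonly used value rather than one specified by the course, and the PageRank step assumes every page has at least one out-link (dangling pages would leak rank mass).

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process stand-in for map, shuffle and reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Inverted index: map each term to the sorted list of documents containing it.
def index_mapper(doc):
    doc_id, text = doc
    # Emit each distinct term once per document.
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reducer(term, doc_ids):
    return sorted(doc_ids)


# One PageRank iteration over an adjacency list {page: [linked pages]}.
def pagerank_step(graph, ranks, damping=0.85):
    n = len(graph)

    def mapper(item):
        page, links = item
        # Ensure pages without incoming links still receive a new rank.
        yield page, 0.0
        # Distribute this page's rank evenly over its out-links
        # (assumes links is non-empty).
        for target in links:
            yield target, ranks[page] / len(links)

    def reducer(page, shares):
        return (1 - damping) / n + damping * sum(shares)

    return run_mapreduce(graph.items(), mapper, reducer)


if __name__ == "__main__":
    docs = [(1, "big data on big clusters"), (2, "data centers")]
    index = run_mapreduce(docs, index_mapper, index_reducer)
    print(index["data"])  # prints [1, 2]

    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks = {page: 1 / len(graph) for page in graph}
    for _ in range(20):  # each iteration would be one MapReduce job
        ranks = pagerank_step(graph, ranks)
    print(ranks)
```

On a real cluster, each PageRank iteration would typically run as a separate job whose output feeds the next one, which is why iterative algorithms are comparatively expensive in plain MapReduce.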
Prerequisites: Introductory course in Databases (192110741 or an equivalent).
Instructional modes
Presence duty: Yes


Individual assignments
