Data Science is an emerging interdisciplinary field at the intersection of computer science, statistics, visualization, and the social sciences. Scientific and economic progress is increasingly powered by our ability to explore big data sets. Data scientists dig for value in data by analyzing, for instance, texts, application usage logs, and sensor data. They are the driving force behind the successful innovation of Internet companies like Google, Twitter, and Yahoo. The growing demand for data scientists and big data engineers is visible in job advertisements, and the need for data scientists and big data analysts is apparent in almost every part of society, including computer science, medicine, physics, and the humanities.
The goal of the course Data Science is to teach several data science skills needed in the various phases of data analysis projects. The skills are offered as individual topics of 2.5 EC each, from which the student has to choose at least two. The topics are taught in an assignment- and project-driven manner, providing a self-study environment with ample guidance and supervision. Each topic consists of one lecture, a practicum for learning the basic skills, and a project for deepening those skills and for assessment. The practicum and project are done in pairs. Supervision is provided twice per week during practicum sessions that are shared by all topics. Each topic is graded individually based on the project. The grade for the course is the average of the grades for the topics.
The list of topics will be revised every year. In the academic year 2014/2015, the first time this course is given, the following topics are offered:
XML Databases (Topic teacher: D. Hiemstra)
XML is an important data model for exchanging and transforming data. This topic teaches the most important standards and tools for (a) publishing XML data from relational databases with SQL/XML, (b) querying, searching, transforming, and updating XML data with XPath and XQuery, and (c) storing XML data in relational databases in a scalable way using several techniques.
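To give a flavor of item (b), the following is a minimal sketch of an XPath-style query, written in Python rather than in any tool the course prescribes. The `CATALOG` document and the `titles_with_ec` helper are hypothetical, invented only for illustration. Note that the standard-library `xml.etree.ElementTree` module supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for illustration only.
CATALOG = """
<courses>
  <course ec="2.5"><title>XML Databases</title></course>
  <course ec="2.5"><title>Data Mining</title></course>
  <course ec="5"><title>Other Course</title></course>
</courses>
"""

def titles_with_ec(xml_text, ec):
    """Return the titles of all <course> elements whose ec attribute equals ec."""
    root = ET.fromstring(xml_text)
    # Attribute predicate followed by a child step; both are within
    # the XPath subset that ElementTree understands.
    path = f"./course[@ec='{ec}']/title"
    return [title.text for title in root.findall(path)]

print(titles_with_ec(CATALOG, "2.5"))
```

A full XPath or XQuery processor accepts far richer expressions (axes, functions, FLWOR expressions) than this subset.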
Representing data semantics with the Semantic Web (Topic teacher: M. van Keulen)
The Semantic Web is a knowledge representation and reasoning standard based on several interrelated technologies. Data published using these technologies is often called “Linked Open Data”. The topic teaches the most important standards and skills for (a) representing data with semantics using RDF and RDFS, (b) querying RDF data with SPARQL, (c) modeling an ontology for semantic annotation, and (d) the description logic theory behind it and its limitations.
Data Warehousing, OLAP, and Data Visualization (Topic teacher: C. Amrit)
Data Warehousing, OLAP and Data Visualization are in essence technologies developed for Business Intelligence. They are, however, also effective for data science. The topic will teach (a) data warehousing techniques for extracting and transforming data (ETL), (b) modeling data for analytic purposes using the multidimensional modeling approach of OLAP, and (c) data visualization techniques.
Data Mining (Topic teacher: M. Poel)
Data mining is about discovering patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems. The topic teaches (a) classification, (b) clustering, and (c) association rule mining.
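As an illustration of item (c), the sketch below computes the two standard measures used in association rule mining, support and confidence, and enumerates single-item rules by brute force. The `TRANSACTIONS` data and all function names are hypothetical, and real miners such as Apriori prune the search space rather than trying every pair.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Brute-force all rules lhs -> rhs between single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        # Skip pairs that are not frequent enough (min_support should be > 0).
        if support({a, b}, transactions) < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, c))
    return rules

print(pair_rules(TRANSACTIONS, min_support=0.5, min_confidence=0.9))
```

With these thresholds, the only surviving rule is butter -> bread, because every transaction containing butter also contains bread.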
Information extraction from text using natural language processing (Topic teacher: M.B. Habib)
Most information is available in a form rather unsuitable for processing by computers, namely natural language text. This topic teaches (a) text mining (analyzing text directly), (b) rule-based techniques for information extraction, and (c) statistical techniques for information extraction and natural language processing.
This topic is preferably done in combination with “Data Mining”.
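A rule-based extractor, as in item (b), can be as small as one hand-written pattern. The sketch below is only an assumption about what such a rule might look like, not the course's method. It pulls the topic name and teacher out of headings shaped like the ones in this description, and `RULE` and `extract_teacher` are hypothetical names.

```python
import re

# One hand-written extraction rule: "<topic> (Topic teacher: <name>)".
RULE = re.compile(
    r"^(?P<topic>.+?)\s*\(Topic teacher:\s*(?P<teacher>[^)]+)\)\s*$"
)

def extract_teacher(line):
    """Return (topic, teacher) if the line matches the rule, else None."""
    match = RULE.match(line)
    if match is None:
        return None
    return match.group("topic"), match.group("teacher").strip()

print(extract_teacher("Data Mining (Topic teacher: M. Poel)"))
```

Rules like this are precise but brittle: any heading that deviates from the pattern is missed, which is one motivation for the statistical techniques in item (c).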