Distributed Systems is the study of systems that continue to work even if some of their components fail. This subject is gaining importance as our dependence on large-scale information services grows.
In 1990, an AT&T telephone switch crashed and rebooted, then sent a message to its neighboring switches (intended to inform them of the event) that caused all of them to crash and reboot as well. This cascading failure brought down the entire AT&T long-distance telephone network, over and over, for nine hours. On 7 October 2008, Qantas Flight 72, an Airbus A330 flying on autopilot, made two sudden uncommanded pitch-down maneuvers that seriously injured many of the occupants. The direct cause was a failure in one of three Air Data Inertial Reference Units (instruments that measure the attitude, speed and heading of the aircraft), but that failure should have been detected and ignored, because the other two units were working normally. It was the first serious bug discovered in the Airbus fly-by-wire system after many millions of hours of operation.
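The principle that one faulty unit should be outvoted by two healthy ones can be illustrated with a minimal sketch of a 2-out-of-3 median voter. This is a generic textbook technique, not the algorithm actually used in the Airbus flight-control computers; the function name `vote` and its `tolerance` parameter are hypothetical.

```python
def vote(readings, tolerance):
    """Hypothetical 2-out-of-3 voter over three redundant sensor readings.

    The median of three values always lies between the two healthy readings
    when at most one sensor is faulty, so a single bad unit cannot drag the
    result outside the range spanned by the good ones. Returns None when no
    two readings agree within `tolerance`, i.e. the fault cannot be masked.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    low, median, high = sorted(readings)
    # At least one neighbor must be close to the median: two units agree.
    if median - low <= tolerance or high - median <= tolerance:
        return median
    return None
```

For example, `vote([4.0, 4.1, 50.0], 0.5)` returns `4.1`, ignoring the wild reading of the third unit, while `vote([1.0, 10.0, 20.0], 0.5)` returns `None` because no two units agree.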
Distributed systems are all around us: in aircraft, factories, banks, reservation systems and more. Some of them operate on a global scale (the telephone network, the Internet, the Google search engine, Facebook, Twitter), and as our dependence on information systems grows, so will the importance of distributed systems.
Place in the curriculum
Distributed systems are clearly important. They help ensure that the systems we depend on are reliable and available all the time. As such, distributed systems matter to all computer scientists and are vital to designers of embedded systems. The course is also an introduction to academic writing and giving presentations. Scientific articles are not like university textbooks: textbooks try to give a balanced overview of a usually mature field, whereas scientific articles try to sell an idea. They want to convince the reader that, whatever idea is presented, it is even better than the invention of the wheel.
In this course, we'll examine the claims the authors make and we'll see how different articles can appear to be completely convincing, yet completely contradictory.
We'll ask the students to read the articles and present their views. After that, we'll discuss the articles in detail. The course teaches skills essential for anybody with a Master's degree: the ability to understand complex material and present it to an audience.
To take the course, students must have a basic understanding of computer organization, operating systems and computer networks. It is a Master's course for students of computer science (MSc CSC) and embedded systems (MSc EMSYS).
The course is based on a number of scholarly articles that helped lay the foundation of the discipline of distributed systems, but we will also study articles that describe modern, very large distributed systems (e.g., Google's Bigtable).
Besides the scholarly articles on distributed systems themselves, the course poses the challenge of understanding material, presenting it to an audience and discussing scientific work.
The course is given in an unusual form. All students are required to read and understand the articles that make up the course. Moreover, every student will be asked to introduce one or two of the articles to the group, after which a general discussion will take place.
Students are always encouraged to discuss a draft of their presentation with the lecturer beforehand. The idea is to strengthen the presentation and to keep embarrassing mistakes from being made in public.
Satisfactory presentations and participation in the discussions will automatically result in a passing grade. There will be no exam at the end of the course. Students who have not presented, or whose presentation was below an acceptable level, can pass the course by writing an essay on an agreed-upon distributed-systems subject.