Welcome to the Programming for Big Data module.

This is an optional module on the MSc in Computing, for the tracks on Advanced Software Development and Data Science.

The module is divided into 3 main components:

1. Hadoop and MapReduce

2. Programming with Spark

3. Streaming data with Kafka

Each class will be a mixture of lecture, in-class exercises, research, independent learning, etc.

Students are expected to work independently and to have the necessary programming skills (Java, etc.) and technical skills (working with virtual machines, Linux, etc.) for this module.

Make sure to check out BrightSpace for links to all the materials and assessments for this module.

Module Assessments

The module is 100% continuous assessment, which means there is no exam. However, the class exercises involve a lot of work, and there will be independent assessments for each component of the module, so expect a lot of assignment work.

  • Hadoop & MapReduce assessment = 40%
  • Spark assessment = 40%
  • Quiz covering the 3 topic areas (more details at end of semester) = 20%

Module Overview & Admin Notes

Module Pre-requisites

What to do before the first class

Install VirtualBox software.

Download the pre-built Virtual Machine (VM). I will show you how to install and use this VM during the first week's class.

This is an 8 GB download, plus extra disk space for the VM. You will need a minimum of 2 GB of RAM available to run the VM.

Docker: If you like working with Docker, try out the pre-built Docker images on Docker Hub.