This module is compulsory on the MSc Data Engineering.
This module runs in trimester 1 (September) every year.
This module fee is for UK and ROI fee payers only.
Detailed Description
This module will explore and develop data management and processing solutions that work on dirty, complex, real-world data. It will examine the key concepts of data warehousing, data cleaning, and data processing in the context of business requirements, and focus on how to combine these steps into a coherent data processing pipeline.
First, modern tools and techniques in data management will be examined, with an emphasis on good practice and professional approaches to storing and handling data.
Next, the module will examine ways of cleaning noisy real-world data to make it suitable for data processing. Finally, data processing and collation techniques such as machine learning or deep learning will be applied to extract structure from the data and aid its interpretation. Throughout the module, the advantages and disadvantages of local and cloud approaches will be explored, alongside common parallel approaches that enable faster solutions.
In short, the goal of this module is to allow students to understand a data processing pipeline from raw data to final delivery.
It will cover:
• Data warehousing and storage techniques
• Data cleaning techniques
• A discussion of cloud approaches
• Data processing and collation techniques
• An introduction to parallel data pipeline approaches