This module is only available to students enrolled on MSc Data Science or MSc Data Engineering.
The challenges of contemporary data acquisition and analysis have been characterised as “the four V’s of Big Data” (volume, variety, velocity and validity). Meeting these challenges requires specialised data storage, aggregation and processing techniques. This module introduces a range of tools and techniques for working with data in a variety of formats, with a view to developing data-driven applications. It focuses primarily on developing applications using the Python scripting language and its associated libraries, and also introduces a range of related data storage and processing technologies.
The module covers the following topics:
• Data types and formats: numerical and time series, graph, textual, unstructured
• Data sources and interfaces: open data, APIs, social media, web-based
• NoSQL databases, such as document (e.g. MongoDB), graph and key-value stores
• Techniques for dealing with large data sets, including MapReduce
• Developing data-driven applications in Python