MODULE FOR THE STUDY OF BIG DATA
The ‘Big Data’ module covers the processing and analysis of large volumes of data in the context of Data Science.
Big Data differs from traditional data collections in several respects: its volume; its variety, since the data comes from different sources and in different forms and is therefore often unstructured; and, in the case of real-time streaming, the velocity at which it arrives.
New technologies have been introduced in Data Science to manage and analyse large datasets, overcoming the limitations of traditional data management systems such as relational database management systems (DBMS).
The ‘Big Data’ module uses Apache Spark, an open-source framework that supports in-memory parallel computing to optimize the performance of applications that analyse Big Data.
It is used by many organizations around the world, including IBM, NASA, Samsung and Yahoo!, and its adoption continues to grow.