Big Data Processing

Sub No. : AI60004

LTP: 3-0-0

Credits: 3

Prerequisite : None

Taught by : Jiaul Paik

Teaching Assistants :

Atif Hassan, Animesh Sachan, Anupam Borthakur, Abhishek Kumar, Dependra Tiwari, Gaurav Jha, Piranav, Anmol Kumar

The course 'Big Data Processing' gives a comprehensive introduction to storing and processing 'big data' using modern big data systems, such as MapReduce and Spark, that run on large commodity clusters. The primary focus is on algorithm design and programming at scale, applied to the major kinds of data: text, graph, streaming, and relational. The course also introduces scalable machine learning algorithms using Spark.

Storing big data: Distributed file system, Apache HBase

Programming on Scale-up and Scale-out Architectures

Cluster organization, Cluster managers, Functional programming with Python and Scala, Simplified Data Processing on Large Clusters using MapReduce, MapReduce Algorithm design, Fast Data Processing on Large Clusters with Spark, Resilient Distributed Datasets, Graph data processing with GraphX, Processing Stream data with Spark Streaming, Distributed Machine Learning Algorithms.