
Big Data Processing

Sub No. : AI60004
LTP: 3-0-0
Credits: 3

Prerequisite : None

Taught by : Jiaul Paik

Teaching Assistants :

Atif Hassan, Animesh Sachan, Anupam Borthakur, Abhishek Kumar, Dependra Tiwari, Gaurav Jha, Piranav, Anmol Kumar


The course on 'Big Data Processing' gives a comprehensive introduction to storing and processing 'big data' using modern big data systems, such as MapReduce and Spark, that run on large commodity clusters. The primary focus is on algorithm design and programming at 'scale', applied to all major domains: text, graph, streaming and relational data. The course also introduces scalable machine learning algorithms using Spark.

Storing big data: Distributed file system, Apache HBase
Programming on Scale-up and Scale-out Architectures
Cluster organization, Cluster managers, Functional programming with Python and Scala, Simplified Data Processing on Large Clusters using MapReduce, MapReduce Algorithm design, Fast Data Processing on Large Clusters with Spark, Resilient Distributed Datasets, Graph data processing with GraphX, Processing Stream data with Spark Streaming, Distributed Machine Learning Algorithms.
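
As a small illustration of the map-reduce style of programming named in the topics above, the sketch below counts words on a single machine in plain Python. It is not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example, and a real cluster framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated word in the line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all values by key; a cluster framework does this step
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all counts for one word into a single total.
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical sample input, used only for illustration.
counts = word_count(["big data systems", "big clusters process big data"])
print(counts)
```

The three functions are kept separate to mirror the map, shuffle and reduce stages; on a single machine the same result could be obtained more directly with collections.Counter.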