Big Data Hadoop and Spark Developer

General Information

Fee: 10500/-


Learn the concepts and implementation of Hadoop and Java programming, and take the first step on your journey to becoming a Hadoop Developer!

Expectations and Goals

This comprehensive Hadoop Big Data training course was designed by industry experts around current industry job requirements and provides in-depth learning on big data and Hadoop modules. It is an industry-recognized Big Data certification training course that combines the training courses for Hadoop developer, Hadoop administrator, and analytics.

Course Materials

Required Materials

  • Laptop
  • 6+ GB RAM (8 GB recommended)

Optional Materials

  • Internet Connection

Course Syllabus

Introduction to Big Data & Hadoop and its Ecosystem, MapReduce and HDFS

What is Big Data, Where does Hadoop fit in, Hadoop Distributed File System – Replication, Block Size, Secondary NameNode, High Availability, Understanding YARN – ResourceManager, NodeManager, Differences between Hadoop 1.x and 2.x

Hadoop Installation & Setup

Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Cluster Setup, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single-Node Cluster

Core Java Fundamentals

Basic Overview, Classes & Objects revisited, Inheritance, Interfaces

Deep Dive into MapReduce

How MapReduce Works, How the Reducer Works, How the Driver Works, Combiners, Partitioners, Input Formats, Output Formats, Shuffle and Sort, Map-Side Joins, Reduce-Side Joins, Distributed Cache.
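
To make the map, shuffle-and-sort, and reduce phases listed above concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: the class name `WordCountSketch` and its methods `map`, `shuffle`, `reduce`, and `run` are illustrative assumptions that only imitate, in a single process, what the framework does across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the MapReduce phases (not Hadoop code).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort: group all values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the three phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {and=1, big=2, data=2, hadoop=2, spark=1}
        System.out.println(run(Arrays.asList("big data hadoop", "hadoop and spark", "big data")));
    }
}
```

In a real Hadoop job the same three roles are played by a Mapper class, the framework's shuffle, and a Reducer class, with the driver configuring and submitting the job.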

Linux Fundamentals

Basic Linux commands, understanding the Linux environment, exercises for practice

Lab exercises:

Working with HDFS, Writing a WordCount Program, Writing a Custom Partitioner, MapReduce with a Combiner, Map-Side Join, Reduce-Side Join, Running MapReduce in LocalJobRunner Mode.
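
As a preview of the custom partitioner exercise, the routing logic can be sketched in plain Java. This is not the Hadoop `Partitioner` class itself: `PartitionerSketch`, `hashPartition`, and `firstLetterPartition` are hypothetical names, and the first-letter rule is an assumed example. The hash arithmetic (mask the sign bit, then take the remainder) is the usual approach of a hash-based partitioner.

```java
// Hypothetical stand-alone sketch of partitioning logic (not Hadoop code).
public class PartitionerSketch {

    // Hash-based rule: clear the sign bit so the result is never negative,
    // then map the hash onto the available partitions.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed custom rule for exactly two partitions: keys starting with a-m
    // go to partition 0, all other keys go to partition 1.
    // For any other partition count, or an empty key, fall back to hashing.
    static int firstLetterPartition(String key, int numPartitions) {
        if (numPartitions != 2 || key.isEmpty()) {
            return hashPartition(key, numPartitions);
        }
        char first = Character.toLowerCase(key.charAt(0));
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"hadoop", "spark", "Pig", "hive"}) {
            System.out.println(key + " -> partition " + firstLetterPartition(key, 2));
        }
    }
}
```

In the lab, equivalent logic would live in the `getPartition` method of a class extending Hadoop's `Partitioner`, so that each reducer receives a predictable slice of the keys.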

Understanding Pig

A. Introduction to Pig: Understanding Apache Pig, its features and various uses, and learning to interact with Pig
B. Deploying Pig for data analysis: The syntax of Pig Latin, the various definitions, data sorting and filtering, data types, deploying Pig for ETL, data loading, schema viewing, field definitions, commonly used functions.
C. Pig for complex data processing: Various data types including nested and complex, processing data with Pig, grouped data iteration, practical exercise
D. Performing multi-dataset operations: Data set joining, data set splitting, various methods for data set combining, set operations, hands-on exercise

Understanding Hive

A. Hive Introduction: Understanding Hive, comparison of Hive with traditional databases, comparison of Pig and Hive, storing data in Hive and the Hive schema, interacting with Hive, and various use cases of Hive
B. Hive for relational data analysis: Understanding HiveQL, basic syntax, the various tables and databases, data types, data set joining, various built-in functions, running Hive queries from scripts and the shell.
C. Data management with Hive: The various databases, creating databases, data formats in Hive, data loading, altering databases and tables, storing query results, data access control, managing data with Hive.
D. Hands-on Exercises: working with large data sets and extensive querying

Understanding Sqoop

Sqoop Installation and Basics, Importing Data from MySQL to HDFS, Advanced Imports, Real-Time Use Case, Exporting Data from HDFS to MySQL, Running Sqoop in Cloudera

Understanding Flume

Overview of Apache Flume, Physically distributed data sources, Changing structure of data, Closer look, Anatomy of Flume, Core concepts, Event, Clients, Agents, Source, Channels, Sinks, Interceptors, Channel selector, Sink processor, Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case: Log aggregation, Adding a Flume agent, Handling a server farm, Data volume per agent, Example describing a single-node Flume deployment

Introduction to Impala & (Avro) Data Formats

A. Introduction to Impala
What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell
B. Choosing the Best (Hive, Pig, Impala)
C. Modeling and Managing Data with Impala and Hive
Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching
D. Data Partitioning
Partitioning Overview, Partitioning in Impala and Hive
(Avro) Data Format
Selecting a File Format, Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression

Apache HBase

What is HBase, Where does it fit, What is NoSQL, HBase Basics & Architecture, Creating Tables, Listing Tables, Enabling & Disabling Tables, Describe, Alter, and Drop Tables, Insert, Update, Read, Delete Data, Scan

Apache Spark

A. Why Spark? Working with Spark and Hadoop Distributed File System
What is Spark, Comparison between Spark and Hadoop, Components of Spark
B. Running Spark on a Cluster, Writing Spark Applications using Java/Scala


ZooKeeper Introduction, ZooKeeper Use Cases, ZooKeeper Services, ZooKeeper Data Model


Why Oozie?, Running an example, Oozie workflow engine, Word count example, Oozie job processing, Job submission

Quiz & Awards

