Big Data Hadoop and Spark Developer
Fee: 10500/-
Learn the concepts and implementation of Hadoop and Java programming, and take the first step on your journey to becoming a Hadoop Developer!
Expectations and Goals
This is a comprehensive Hadoop Big Data training course, designed by industry experts with current industry job requirements in mind, that provides in-depth learning of Big Data and the Hadoop modules. It is an industry-recognized Big Data certification training course combining training in Hadoop development, Hadoop administration, and analytics.
System Requirements
- 6+ GB RAM (8 GB recommended)
- Internet connection
Introduction to Big Data & Hadoop and its Ecosystem, Map Reduce and HDFS
What is Big Data, Where Does Hadoop Fit In, Hadoop Distributed File System – Replication, Block Size, Secondary NameNode, High Availability, Understanding YARN – ResourceManager, NodeManager, Differences Between Hadoop 1.x and 2.x
Hadoop Installation & Setup
Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Cluster Setup, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single-Node Cluster
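The common Hadoop shell commands above follow the `hdfs dfs` pattern. As an illustrative sketch (paths and file names are placeholders, and these run only against a live Hadoop cluster):

```shell
# list the contents of an HDFS directory
hdfs dfs -ls /user/hadoop

# copy a local file into HDFS
hdfs dfs -put sales.csv /user/hadoop/input/

# display a file stored in HDFS
hdfs dfs -cat /user/hadoop/input/sales.csv

# check file-system health, blocks, and replication
hdfs fsck /user/hadoop -files -blocks
```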
Core Java Fundamentals
Basic Overview, Classes & Objects Revisited, Inheritance, Interfaces
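A minimal sketch of the Core Java concepts above in one place: a class, inheritance, and an interface. The names (`Vehicle`, `Car`, `Honkable`) are illustrative only, not part of the course material.

```java
// An interface declares behavior without implementing it.
interface Honkable {
    String honk();
}

// A base class with a private field and a constructor.
class Vehicle {
    private final String name;
    Vehicle(String name) { this.name = name; }
    String describe() { return "Vehicle: " + name; }
}

// Car inherits from Vehicle and implements the Honkable interface.
class Car extends Vehicle implements Honkable {
    Car(String name) { super(name); }

    @Override
    public String honk() { return "beep"; }  // interface method

    @Override
    String describe() {                      // overriding an inherited method
        return super.describe() + " (car)";
    }
}

class Demo {
    public static void main(String[] args) {
        Car c = new Car("Civic");
        System.out.println(c.describe()); // Vehicle: Civic (car)
        System.out.println(c.honk());     // beep
    }
}
```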
Deep Dive into MapReduce
How MapReduce Works, How the Reducer Works, How the Driver Works, Combiners, Partitioners, Input Formats, Output Formats, Shuffle and Sort, Map-Side Joins, Reduce-Side Joins, Distributed Cache.
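The map → shuffle/sort → reduce flow above can be sketched in plain Java as an in-memory word count. This illustrates the model only; a real Hadoop job uses `Mapper`/`Reducer` classes and runs across a cluster.

```java
import java.util.*;

class WordCountSketch {

    // Map phase: each input line is split into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String word : line.toLowerCase().split("\\s+"))
                if (!word.isEmpty())
                    pairs.add(Map.entry(word, 1));
        return pairs;
    }

    // Shuffle and sort: values are grouped by key, keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        return grouped;
    }

    // Reduce phase: the grouped counts for each word are summed.
    static Map<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "data flows");
        Map<String, Integer> result = reduce(shuffle(map(input)));
        System.out.println(result); // {big=2, data=2, flows=1, ideas=1}
    }
}
```

A Combiner would run the same summing logic on each mapper's local output before the shuffle, cutting the data moved across the network.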
Basic Linux commands, understanding the Linux environment, exercises for practice
Working with HDFS, Writing a WordCount Program, Writing a Custom Partitioner, MapReduce with a Combiner, Map-Side Joins, Reduce-Side Joins, Running MapReduce in LocalJobRunner Mode.
A. Introduction to Pig: understanding Apache Pig, its features, various use cases, and learning to interact with Pig
B. Deploying Pig for data analysis: the syntax of Pig Latin, the various definitions, data sorting and filtering, data types, deploying Pig for ETL, data loading, schema viewing, field definitions, and commonly used functions.
C. Pig for complex data processing: various data types including nested and complex types, processing data with Pig, grouped data iteration, practical exercise
D. Performing multi-dataset operations: dataset joining, dataset splitting, various methods of combining datasets, set operations, hands-on exercise
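A short Pig Latin sketch tying modules B–D together: loading, filtering, grouping, and joining. File paths and field names are placeholders, and the script assumes a running Pig installation.

```pig
-- Load two comma-separated datasets with declared schemas
orders = LOAD '/data/orders.csv' USING PigStorage(',')
         AS (order_id:int, cust_id:int, amount:double);
custs  = LOAD '/data/customers.csv' USING PigStorage(',')
         AS (cust_id:int, name:chararray);

-- Filter, then group and aggregate per customer
big     = FILTER orders BY amount > 100.0;
by_cust = GROUP big BY cust_id;
totals  = FOREACH by_cust GENERATE group AS cust_id, SUM(big.amount) AS total;

-- Join the aggregated totals back to customer names
joined = JOIN totals BY cust_id, custs BY cust_id;
DUMP joined;
```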
A. Hive Introduction: understanding Hive, comparing Hive with traditional databases, comparing Pig and Hive, storing data in Hive and the Hive schema, interacting with Hive, and various use cases of Hive
B. Hive for relational data analysis: understanding HiveQL, basic syntax, the various tables and databases, data types, dataset joining, various built-in functions, and deploying Hive queries from scripts and the shell.
C. Data management with Hive: the various databases, creating databases, data formats in Hive, data loading, altering databases and tables, storing query results, data access control, and managing data with Hive.
D. Hands-on exercises: working with large datasets and extensive querying
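A HiveQL sketch of the operations in modules B and C: creating a database and table, loading data, and running an aggregate query. Database, table, and path names are placeholders, and the statements assume a running Hive installation.

```sql
-- Create and switch to a database
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Define a table over comma-delimited text data
CREATE TABLE orders (
  order_id INT,
  cust_id  INT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a file already in HDFS into the table
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- Aggregate with built-in functions
SELECT cust_id, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders
GROUP BY cust_id;
```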
Sqoop Installation and Basics, Importing Data from MySQL to HDFS, Advanced Imports, Real-Time Use Case, Exporting Data from HDFS to MySQL, Running Sqoop in Cloudera
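The import and export flows above map to two Sqoop commands. As an illustrative sketch (the JDBC URL, user, table names, and HDFS paths are placeholders, and a running cluster plus MySQL instance are assumed):

```shell
# Import a MySQL table into HDFS with 4 parallel map tasks
sqoop import \
  --connect jdbc:mysql://dbhost/sales_db \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 4

# Export processed results from HDFS back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost/sales_db \
  --username sqoop_user -P \
  --table order_totals \
  --export-dir /user/hadoop/order_totals
```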
Overview of Apache Flume, physically distributed data sources, changing structure of data, a closer look, anatomy of Flume, core concepts: Events, Clients, Agents, Sources, Channels, Sinks, Interceptors, Channel Selectors, Sink Processors, data ingest, agent pipeline, transactional data exchange, routing and replicating, why channels?, use case: log aggregation, adding a Flume agent, handling a server farm, data volume per agent, example describing a single-node Flume deployment
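The anatomy above (source → channel → sink within one agent) is wired up in a Flume properties file. A minimal single-node sketch for the log-aggregation use case, with agent name, log path, and HDFS path as placeholders:

```properties
# One agent: a source tailing a log file, a memory channel, an HDFS sink
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: tail an application log via exec
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: in-memory buffer between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/logs/%Y-%m-%d
agent1.sinks.sink1.channel = ch1
```

The channel is what decouples ingest rate from write rate: the source and sink each run their own transaction against it, which is the "why channels?" point above.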
Introduction to Impala & (Avro) Data Formats
A. Introduction to Impala
What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell
B. Choosing the Best Tool (Hive, Pig, or Impala)
C. Modeling and Managing Data with Impala and Hive
Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching
D. Data Partitioning
Partitioning Overview, Partitioning in Impala and Hive
E. (Avro) Data Format
Selecting a File Format, Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression
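An Avro schema is a JSON document, which is also where schema evolution shows up. A minimal sketch (record and field names are placeholders):

```json
{
  "type": "record",
  "name": "Order",
  "namespace": "example.sales",
  "fields": [
    {"name": "order_id", "type": "int"},
    {"name": "cust_id",  "type": "int"},
    {"name": "amount",   "type": "double"},
    {"name": "coupon",   "type": ["null", "string"], "default": null}
  ]
}
```

The optional `coupon` field illustrates evolution: because it is a nullable union with a default, readers using the new schema can still consume data written before the field existed.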
What is HBase, Where Does It Fit, What is NoSQL, HBase Basics & Architecture, Creating Tables, Listing Tables, Enabling & Disabling Tables, Describing, Altering, and Dropping Tables, Scanning, Inserting, Updating, Reading, and Deleting Data
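The table operations above correspond to HBase shell commands. An illustrative sketch (table, row, and column-family names are placeholders, and a running HBase instance is assumed):

```shell
create 'orders', 'info'                       # create a table with one column family
list                                          # list tables
put 'orders', 'row1', 'info:amount', '250'    # insert or update a cell
get 'orders', 'row1'                          # read one row
scan 'orders'                                 # scan the whole table
delete 'orders', 'row1', 'info:amount'        # delete a cell
describe 'orders'                             # show table schema
disable 'orders'                              # required before alter/drop
alter 'orders', NAME => 'audit'               # add a column family
drop 'orders'                                 # drop the (disabled) table
```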
A. Why Spark? Working with Spark and Hadoop Distributed File System
What is Spark, Comparison between Spark and Hadoop, Components of Spark
B. Running Spark on a Cluster, Writing Spark Applications using Java/Scala
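Spark expresses computation as chained transformations (map, flatMap, filter) followed by actions (count, reduce, collect). As a rough in-memory analogy only, Java streams follow the same shape; this is not the Spark API, and a real Spark application would build the same pipeline on a JavaRDD or Dataset distributed across a cluster.

```java
import java.util.*;

class TransformationSketch {
    static long countLongWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // like rdd.flatMap(...)
                .filter(w -> w.length() > 4)                        // like rdd.filter(...)
                .count();                                           // an action: triggers evaluation
    }

    public static void main(String[] args) {
        List<String> lines = List.of("spark runs fast", "hadoop maps then reduces");
        System.out.println(countLongWords(lines)); // 3
    }
}
```

The key parallel to the Spark comparison above: transformations describe the pipeline lazily, and nothing executes until an action is called.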
ZooKeeper Introduction, ZooKeeper Use Cases, ZooKeeper Services, ZooKeeper Data Model
Why Oozie?, Running an Example, Oozie Workflow Engine, Word Count Example, Oozie Job Processing, Job Submission
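An Oozie workflow is an XML document of actions wired together by transitions. A minimal sketch for the word-count example above, with a single map-reduce action; the input/output paths are placeholders and `${jobTracker}`/`${nameNode}` are supplied in the job properties file:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="wordcount-wf">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/user/hadoop/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/hadoop/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Word count failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Job submission then amounts to placing this file in HDFS and running `oozie job -run` with a properties file pointing at it.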