Big Data Training

COURSE ID GES-BD
DURATION 39 hrs
DELIVERY METHOD Classroom Instructor-led training (CILT), Online Instructor-led training (OILT)

RDBMS vs Hadoop

Ecosystem tour (9 products)

Vendor comparison (Cloudera, Hortonworks, MapR, Amazon EMR)

Hardware Recommendations

NameNode and DataNode architecture

Write pipeline

Read pipeline

Heartbeats

Rack awareness

Block scanner

 

JobTracker/TaskTracker architecture

Shuffle: Sort + Partitioning

Speculative Execution

Input/output formats

Distributed cache

Introduction to the Hadoop FS and processing environment UIs

How to read and write files

Basic Unix commands for Hadoop

Hadoop FS shell

Hadoop releases practical

Hadoop daemons practical

Pig Introduction

Why use Pig when MapReduce exists?

How Pig differs from programming languages

Introduction to Pig data flow

How Schema is optional in Pig

Pig Data types

Pig commands: Load, Store, Describe, Dump

MapReduce jobs started by Pig commands

Execution plan

Pig UDFs

Pig Use cases

Pig Assignment

Complex Use cases on Pig

XML Data Processing in Pig

Structured Data processing in Pig

Semi-structured data processing in Pig

Pig Advanced Assignment

Real time scenarios on Pig

When we should use Pig

When we shouldn’t use Pig

Live examples of Pig Use cases

Hive Introduction

Metadata storage and the metastore

Introduction to Derby Database

Hive Data types

HQL

DDL, DML and sub languages of Hive

Internal, external, and temporary tables in Hive

Differences between a SQL-based data warehouse and Hive

Hive releases

Why Hive is not the best solution for OLTP

OLAP in Hive

Partitioning

Bucketing

Hive Architecture

Thrift Server

Hue Interface for Hive

How to analyze data using Hive script

Differences between Hive and Impala

UDFs in Hive

Complex Use cases in Hive

Hive Advanced Assignment

Real time scenarios of Hive

POC on Pig and Hive, with real-time data sets and problem statements

How MapReduce works as a processing framework

End-to-end execution flow of a MapReduce job

Different tasks in a MapReduce job

Why Reducer is optional while Mapper is mandatory?

Introduction to Combiner

Introduction to Partitioner

Programming languages for MapReduce

Why Java is preferred for MapReduce programming

POC based on Pig, Hive, HDFS, MR

Introduction to NoSQL

Why NoSQL when SQL has been on the market for many years

NoSQL databases on the market

CAP Theorem

ACID vs. CAP

OLTP solutions with different capabilities

Which NoSQL-based solutions can handle specific requirements

Examples of companies such as Google, Facebook, and Amazon, and other clients, that use NoSQL-based databases

HBase Architecture of column families

How to work on MapReduce in real time

Complex MapReduce scenarios

Introduction to HBase

Introduction to other NoSQL-based data models

Drawbacks of Hadoop

Why Hadoop can’t work for real-time processing

How HBase and other NoSQL-based tools make real-time processing possible on top of Hadoop

HBase table and column family structure

HBase versioning concept

HBase flexible schema

HBase Advanced

Introduction to ZooKeeper

How ZooKeeper helps in the Hadoop ecosystem

How to load data from relational storage into Hadoop

Sqoop basics

Sqoop practical implementation

Sqoop alternative

Sqoop connector

Quick revision of previous classes to fill gaps in your understanding and correct misunderstandings

How to load data into Hadoop from a web server or other storage without a fixed schema

How to load unstructured and semi-structured data into Hadoop

Introduction to Flume

Hands-on with Flume

How to load Twitter data into HDFS using Hadoop

Introduction to Oozie

How to schedule jobs using Oozie

What kind of jobs can be scheduled using Oozie

How to schedule time-based jobs

Hadoop releases

Where to get Hadoop and other components to install

Introduction to YARN

Significance of YARN

 

Introduction to Hue

How Hue is used in real time

Hue Use cases

Real time Hadoop usage

Real time cluster introduction

Hadoop Release 1 vs Hadoop Release 2 in real time

Hadoop real time project

Major POC based on combination of several tools of Hadoop Ecosystem

Comparison between Pig and Hive real time scenarios

Real-time problems and frequently encountered errors, with solutions

 

Introduction to Spark

Introduction to Scala

Basic features of Spark and Scala available in Hue

Why demand for Spark is increasing in the market

How to use Spark with the Hadoop ecosystem

Datasets for practice

 

Spark use cases with real time scenarios

Spark Practical with advanced concepts

Scala platform with complex use cases

Real-time project use case examples based on Spark and Scala
How we can reduce
