
BIG DATA HADOOP DEVELOPMENT 

The Big Data Hadoop Development course provides in-depth coverage of MapReduce and YARN development, along with the Spark framework and NoSQL. R for Data Analytics and Solr as an Enterprise Search Application are also introduced in this module.

COURSE OBJECTIVE
With the emergence of a huge market need for skilled Big Data Analytics professionals and developers, we have designed this course to build development skills on Big Data technologies. This course covers the ecosystem products used in ingesting, cleansing and extracting value from Big Data, applying the best practices for handling Big Data.

The course includes real-time use cases and implementations from various domains, including predictive analytics and Data Science algorithms.

LESSON PLANS


SESSION 1: UNDERSTANDING THE BIG DATA PROBLEM
Session Goal:
An introduction to the common limitations encountered when dealing with large amounts of data, along with the common solutions. The goal here is to lay down the foundation of a heterogeneous architecture that will be described in the following sessions.
  • Identifying Big Data Symptoms.
  • Understanding the Big Data Projects Ecosystem.
  • Creating the foundation of a long-term Big Data Architecture.
SESSION 2: UNDERSTANDING DEVELOPMENT CHALLENGES
Session Goal:
IT has become a lot more important for many organisations than it was before. With the appearance of the Internet of Things and the Industrial Internet, previously unconnected devices will become datafied and start generating vast amounts of data. We will now learn about the challenges involved in Big Data development.
  • Big Data and IT Dependency.
  • Determining Big Data Use Case.
  • ROI and Big Data.
  • Data Privacy Challenges in Big Data.

SESSION 3: PROGRAMMING MAPREDUCE JOBS & YARN DEVELOPMENT
Session Goal:
Implementing the components of MapReduce and YARN, and developing an approach to choosing the logic and design patterns behind various data processing methodologies.
  • MapReduce Programming Technique.
  • Writing Applications using MapReduce.
  • Serialization and Data formats.
  • YARN and Data Processing.
  • Using Combiners and Design Patterns.
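
The MapReduce technique covered above can be sketched in a few lines. The following is a conceptual word-count example in the style of Hadoop Streaming, in plain Python rather than the Java MapReduce API; the function names are illustrative, not part of any Hadoop interface.

```python
# Conceptual MapReduce word count (Hadoop Streaming style).
# mapper() emits (key, value) pairs; reducer() receives pairs
# grouped by key, as the Hadoop shuffle/sort phase would deliver them.

from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum the counts per word.
    sorted() stands in for Hadoop's shuffle-and-sort guarantee."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["the quick brown fox", "the lazy dog"]
counts = dict(reducer(mapper(docs)))
```

A combiner, covered in this session, is simply this same reducer logic run locally on each mapper's output to shrink the data shuffled across the network.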

SESSION 4: DATA PROCESSING USING PIG
Session Goal:
Understanding Pig Latin Language and its implementation.

Pig simplifies the complexity of MapReduce by providing a language called Pig Latin and an interpreter, Pig, that translates Pig Latin scripts into MapReduce jobs.
  • Fundamentals of Pig Language, Data Type and Expressions.
  • Loading and Managing Data in Pig.
  • User defined function and extending Pig.
  • Complex Pig use cases and Implementation.

SESSION 5: DEVELOPMENT USING HIVE & IMPALA
Session Goal:
Hive, originally developed at Facebook, provides a simplified, SQL-like data processing technique. Here you'll learn the fundamentals and advanced concepts of Hive along with Impala. You'll also learn about the critical comparison between them.
  • Understanding Hive Architecture.
  • Creating Tables and loading Data in Hive.
  • Querying Data with Hive.
  • Understanding Bucket and Partitions and their Implementation in Hive.
  • Loading Data from RDBMS Tables to Hive tables.
  • Best Practice and Patterns in Hive.
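
Bucketing, one of the topics above, works by hashing the clustering column and taking it modulo the bucket count, so every row with the same value always lands in the same bucket. A conceptual sketch (Hive uses its own hash function internally; Python's built-in `hash()` stands in here):

```python
# Conceptual sketch of Hive-style bucketing: rows are assigned to a
# fixed number of buckets by hashing the clustering column.
# (Illustrative only -- Hive's real hash function differs.)

def bucket_for(value, num_buckets):
    """Return the bucket index a row with this clustering value lands in."""
    return hash(value) % num_buckets

rows = ["alice", "bob", "carol", "dave"]
buckets = {}
for user in rows:
    buckets.setdefault(bucket_for(user, 4), []).append(user)
```

Because the mapping is deterministic within a table, bucketed tables enable efficient sampling and map-side joins, which is why bucketing appears among Hive's best practices.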

SESSION 6: PROCESS CHOREOGRAPHY USING OOZIE
Session Goal:
Understanding the need for combining Data Processing Jobs and the role Oozie plays in choreographing them. You'll also understand how collections of actions are arranged with controlled dependencies as Directed Acyclic Graphs (DAGs).
  • Understanding the Components and Architecture of Oozie.
  • Job types and building an Oozie Workflow.
  • Managing Workflow.
  • End to End Implementation of Complex Workflow using Oozie.
  • Oozie Console & Job Execution Model.
  • Oozie Integration with Hive.
  • Variables in Workflows and capturing the output to Variables.
  • Oozie actions with Hive Thrift Services.
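
The DAG idea behind Oozie workflows can be sketched with a topological sort: an action may run only after every action it depends on has finished. The workflow below is hypothetical, and the sketch only mirrors the scheduling order, not Oozie's XML workflow definitions or execution engine.

```python
# Toy sketch of Oozie-style choreography: a workflow is a DAG of
# actions, executed in an order that respects all dependencies.
# Action names here are hypothetical examples of a typical ingest flow.

from graphlib import TopologicalSorter  # stdlib, Python 3.9+

workflow = {
    "sqoop-import": set(),              # no prerequisites
    "pig-cleanse":  {"sqoop-import"},   # runs after the import
    "hive-load":    {"pig-cleanse"},
    "email-notify": {"hive-load"},
}

execution_order = list(TopologicalSorter(workflow).static_order())
```

Oozie performs essentially this scheduling, with the added ability to run independent branches of the DAG in parallel via fork/join nodes.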

SESSION 7: DEVELOPMENT USING SPARK FRAMEWORK
Session Goal:
Understanding the Architecture of Spark and its relevance in processing Big Data. You'll learn about the core of Spark along with its components — Spark Streaming, Spark SQL, Spark MLlib and GraphX — with in-depth coverage of their implementation using Scala.
  • Understanding Scala and its features.
  • Understanding Spark Approach.
  • In-depth RDD.
  • Implementing real-time analytics using Spark Streaming.
  • Spark Machine Learning Implementation
  • In-depth GraphX.
  • Benefits of using Spark and Hadoop Architectures.
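
The RDD programming style covered above chains functional transformations (flatMap, filter, map, reduceByKey). A plain-Python sketch of that chain for word counting, with no Spark dependency; `reduce_by_key` is an illustrative stand-in for the RDD method, not a PySpark API:

```python
# Plain-Python sketch of an RDD-style transformation chain.
# Generator expressions mimic Spark's lazy evaluation: nothing runs
# until the final "action" (building the counts dict) forces it.

def reduce_by_key(pairs, fn):
    """Mimic RDD.reduceByKey: merge the values of each key with fn."""
    out = {}
    for k, v in pairs:
        out[k] = v if k not in out else fn(out[k], v)
    return out

lines = ["spark makes big data simple", "big data at scale"]
words = (w for line in lines for w in line.split())   # flatMap
pairs = ((w, 1) for w in words if len(w) > 2)         # filter + map
counts = reduce_by_key(pairs, lambda a, b: a + b)     # reduceByKey (action)
```

In real Spark the same chain runs partitioned across a cluster, with the lineage of transformations enabling fault recovery.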

SESSION 8: DATA MODELLING USING NOSQL
Session Goal:
NoSQL is an important approach to overcoming the scalability restrictions of an RDBMS. This session will help you properly understand storage architecture, CRUD operations and querying NoSQL stores.
  • Modelling Techniques: Key-value, Column Oriented and Graph based.
  • NoSQL products and selecting the flavor with the right fit.
  • Performing CRUD Operations.
  • Querying NoSQL Stores.
  • Indexing and Ordering DataSets.
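
The CRUD operations above map directly onto a key-value interface. A minimal in-memory sketch (the class name is illustrative; real NoSQL stores add persistence, sharding and replication behind this same surface):

```python
# Minimal in-memory key-value store illustrating the four CRUD
# operations. Illustrative only -- not modeled on any specific product.

class KeyValueStore:
    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"{key} already exists")
        self._data[key] = value

    def read(self, key):
        """Return the value, or None if the key is absent."""
        return self._data.get(key)

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"{key} not found")
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.create("user:1", {"name": "Asha", "plan": "gold"})
store.update("user:1", {"name": "Asha", "plan": "platinum"})
```

Composite keys such as `"user:1"` are a common NoSQL modelling convention, standing in for the row keys a column-oriented store would use.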

SESSION 9: UNDERSTANDING DATA SCIENCE & R LANGUAGE
Session Goal:
This special session will help developers understand analytics and the various approaches to analytics on Big Data.
  • Understanding Data Science.
  • Data Cleaning, Imputation and Outliers.
  • Understanding R and working with R.
  • Basics & Advanced R.

SESSION 10: BIG DATA ANALYTICS & TECHNIQUES OF IMPLEMENTATION USING R 
Session Goal:
Analytics requires proper understanding of Statistics and Maths. Here you'll learn to implement various algorithms using R and other Tools.
  • Understanding Descriptive Statistics: Case Study.
  • Understanding Regression Analysis: Case Study.
  • Understanding Correlation Analysis: Case Study.
  • Web Analytics and Reporting: Case Study.
  • Time Series Analytics: Case Study.
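
The regression case study fits a line y = a + b·x by ordinary least squares, which for a single predictor has a closed form: the slope is the covariance of x and y divided by the variance of x. The course implements this in R via `lm()`; the sketch below shows the same arithmetic in plain Python for illustration.

```python
# Simple (one-predictor) ordinary least squares, computed from the
# closed-form slope and intercept formulas.

def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])  # points lie on y = 1 + 2x
```

Descriptive statistics (means, variances) are the building blocks here, which is why the syllabus covers them before regression.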

SESSION 11: IMPLEMENTING SOLR IN BIG DATA 
Session Goal:
Data is growing exponentially in this age of advancement and innovation, and handling such massive data demands great focus on the development of scalable search engines.
Here you'll learn and understand Solr, an open-source Enterprise Search Application that provides the capability of implementing and executing search functionality on structured and unstructured data.
  • Understanding the problems in Traditional Search Implementation.
  • Install and Configure Solr Instance and Define Schema.
  • Data Loading, Query Parsers, Response Writer & Index Handler: Case Study.
  • Integrating Solr with other technologies.
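
At its core, Solr (via Lucene) answers queries from an inverted index: a map from each term to the documents containing it, which is what keeps lookups fast as collections grow. A toy sketch of the idea, with hypothetical documents:

```python
# Toy inverted index: maps each term to the set of document IDs
# containing it. Conceptual only -- Lucene's on-disk index adds
# analyzers, scoring, and compressed posting lists.

from collections import defaultdict

docs = {
    1: "hadoop stores big data",
    2: "solr searches big data fast",
    3: "spark processes data in memory",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Return IDs of documents containing every query term (AND query)."""
    results = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*results) if results else set()
```

A traditional sequential scan would re-read every document per query; the index makes each term lookup a single map access, which is the problem with traditional search implementations this session addresses.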

CASE STUDY AND PROJECTS:
Case studies are an integral part of the training. As part of this course you will implement real-time case studies in various domains, including:
  1. Banking
  2. Telecom
  3. E-commerce
  4. Healthcare
These case studies will be evaluated by domain experts, and you will get an opportunity to receive feedback on your work.
TRAINING FEATURES:
1) Extensive real-time live examples, projects & POCs for improved practical competency, ensuring deployment readiness.
2) Custom lab, software and environment provided with real-time project simulation.
3) Recorded videos complemented with corresponding lecture PPTs, materials & lab guides (provided as MP4, PDF and PPT files for offline access).
4) Certification and job-interview counselling & coaching after every training.
ALCHEMY LEARNSOFT
CONTACT US
support@alchemyls.com
1800-929-7190
ADDRESS
2711 Centerville Road
Suite 400
Wilmington, DE 19808
© 2016 Alchemy LearnSoft. All Rights Reserved.