
BIG DATA USING HADOOP & NOSQL

Big Data Training for Developers and Admins: a detailed insight into Big Data and the essential Big Data frameworks, including the Cloud and NoSQL databases such as MongoDB and Cassandra for Data Modelling and Real-time Analytics.

COURSE OBJECTIVE

- Introduction to Big Data and Cloud.
- Hadoop Development and Administration.
- NoSQL Databases and using MongoDB and Cassandra for Data Modelling and Real-time Analytics.
- Setting up Big Data Architecture in the Cloud.
- Administering Hadoop Architectures and their integration with NoSQL.
- Performance Tuning and Troubleshooting with proper monitoring tools.

LESSON PLANS


SESSION 1: INTRODUCTION TO BIG DATA
Session Goal:
  • Defining Big Data.
  • Characteristics of Big Data.
  • Volume of Data.
  • Variety of Data.
  • Velocity of Data.
  • Understanding Big Data Analytics.
  • Understanding Shared Nothing Architecture for Big Data.
  • Evolution of Big Data.
  • What Drives Big Data.
  • Industry Specific Big Data Use Cases.
  • Understanding Analytics Process Maturity.
  • Big Data Implementation: A Case Study.

SESSION 2: INTRODUCTION TO HADOOP
Session Goal:
  • A technical overview of Hadoop.
  • Understanding Configuration Files.
  • Planning Hadoop Cluster Installation.
  • Introduction to MapReduce and HDFS.
  • Setting up Multi Node Hadoop Cluster.
  • Working with HDFS Command Shell.
  • Using Administrative HDFS commands.
  • Understanding logs and directory structures in Hadoop.
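
The MapReduce model introduced in this session can be sketched in pure Python, with no Hadoop cluster required; the map, shuffle, and reduce phases below mirror what the framework distributes across nodes (the sample input lines are invented for illustration):

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Big Data with Hadoop", "Hadoop stores Big Data in HDFS"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["hadoop"])  # 2
print(counts["big"])     # 2
```

On a real cluster the shuffle is performed by the framework across the network; only the map and reduce logic is written by the developer.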

SESSION 3: HADOOP ADMINISTRATION
Session Goal:
  • Multi-Node, Multi-Rack Hadoop Configuration.
  • Configuring High Availability for Failover.
  • Authentication using Kerberos.
  • Implementing Service Level Authorization.
  • Using Hadoop Auth to enable Kerberos SPNEGO Authentication for HTTP.
  • Hadoop Cluster Monitoring and Optimization.
  • Data Collection, Monitoring, and Analysis for Large Clusters using Chukwa.
  • Installing and Setting up Ganglia to monitor a Multi-Node Hadoop Cluster.
  • Monitoring Tool Analysis and Using Nagios to Configure Alerts.
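
As one illustration of the failover configuration covered in this session, HDFS High Availability is enabled through properties in hdfs-site.xml. The nameservice and host names below (mycluster, nn1.example.com, nn2.example.com) are placeholders for illustration, not values from the course:

```xml
<!-- hdfs-site.xml (sketch): two NameNodes under one logical nameservice -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <!-- lets clients fail over automatically between the two NameNodes -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Clients then address the cluster by the logical name (hdfs://mycluster) rather than a single NameNode host.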

SESSION 4: HADOOP DEVELOPMENT
Session Goal:
  • Parallel Processing with MapReduce.
  • Automating Data Transfer.
  • Moving Relational Data with Sqoop.
  • Executing Data Flows with Pig: Characteristics of Apache Pig.
  • Structuring Unstructured Data.
  • Performing ETL with Pig: Transforming Data with Relational Operators.
  • Filtering Data with Pig.
  • Manipulating Data with Hive: Leveraging the Business Advantages of Hive.
  • Organizing Data in the Hive Data Warehouse.
  • Maximizing Data Layout for Performance.
  • Extracting Business Value with HiveQL: Performing Joins on Unstructured Data.
  • Pushing HiveQL to the Limits of the Dataset.
  • Deploying Hive in Production.
  • Streamlining Storage Management with HCatalog.
  • Interacting with Hadoop Data in Real Time: Parallel Processing with Impala.
  • Introducing the Spark Framework.
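
The Pig relational operators covered in this session (FILTER, GROUP, aggregation) can be sketched in plain Python over tuples; the field names, records, and threshold below are made up for illustration, not taken from the course material:

```python
from collections import defaultdict

# Records shaped like a Pig relation: (user, category, amount)
records = [
    ("alice", "books", 40.0),
    ("bob",   "books", 15.0),
    ("alice", "music", 25.0),
    ("carol", "books", 60.0),
]

# FILTER: keep only transactions over an (assumed) threshold, like
#   big = FILTER records BY amount > 20.0;
big = [r for r in records if r[2] > 20.0]

# GROUP ... BY, then SUM per group, like
#   grouped = GROUP big BY category;
#   totals  = FOREACH grouped GENERATE group, SUM(big.amount);
totals = defaultdict(float)
for user, category, amount in big:
    totals[category] += amount

print(dict(totals))  # {'books': 100.0, 'music': 25.0}
```

In Pig these two steps run as distributed MapReduce jobs; the in-memory version just shows the semantics of the operators.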

SESSION 5: INTRODUCTION TO NOSQL
Session Goal:
  • Defining NoSQL Databases.
  • Comparing NoSQL with Traditional RDBMS.
  • Types of NoSQL Data Stores.
  • NoSQL Drivers.
  • NoSQL Adoption Case Studies: Google BigTable.
  • NoSQL Adoption Case Studies: Amazon Dynamo.
  • Working with Cassandra: Data Modelling & Querying.
  • Working with MongoDB: Data Modelling & Querying.
  • Identifying the right NoSQL Database.
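
To make the document-model contrast with an RDBMS concrete, here is a sketch using plain Python dicts to stand in for MongoDB documents; the collection, field names, and `find` helper are invented for illustration, and no MongoDB server is involved:

```python
# A "customers" collection: each document embeds its orders,
# where an RDBMS would normalize them into a separate table.
customers = [
    {"name": "Asha", "city": "Delhi",
     "orders": [{"item": "router", "qty": 2}, {"item": "switch", "qty": 1}]},
    {"name": "Ravi", "city": "Mumbai",
     "orders": [{"item": "router", "qty": 1}]},
]

def find(collection, query):
    """Minimal stand-in for a document query: match on top-level equality."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

delhi = find(customers, {"city": "Delhi"})
print(delhi[0]["name"])          # Asha
print(len(delhi[0]["orders"]))   # 2
```

Embedding related data in one document is the core modelling decision the MongoDB and Cassandra sessions examine: one read fetches a customer and all of their orders.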

SESSION 6: CLOUD AND BIG DATA
Session Goal:
  • Leveraging Global Infrastructure.
  • Extending On-Premises into the Cloud.
  • Computing in the Cloud.
  • Designing Storage Subsystems.
  • Distributed Environments.
  • Choosing a Datastore.
  • Designing Web-Scale Media Hosting.
  • Event Driven Scaling.
  • Infrastructure as Code.
  • Orchestrating Batch Processing.
  • Setting up Big Data Architecture (Hadoop+NoSQL) on Cloud.
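
Event-driven scaling, listed above, reduces to a threshold rule: when a load metric crosses a bound, adjust the instance count. The thresholds and step size in this sketch are arbitrary assumptions, not cloud-provider defaults:

```python
def scale_decision(current_instances, avg_cpu, high=70.0, low=30.0,
                   min_instances=1, max_instances=10):
    """Return the new instance count for one scaling event.

    Adds an instance when average CPU is above `high`, removes one
    when below `low`; otherwise leaves the fleet unchanged.
    (All thresholds here are illustrative assumptions.)
    """
    if avg_cpu > high:
        return min(current_instances + 1, max_instances)
    if avg_cpu < low:
        return max(current_instances - 1, min_instances)
    return current_instances

print(scale_decision(3, 85.0))  # 4  (scale out)
print(scale_decision(3, 10.0))  # 2  (scale in)
print(scale_decision(1, 10.0))  # 1  (floor respected)
```

Real cloud auto-scaling adds cooldown periods and averaging windows on top of this rule so that brief spikes do not cause the fleet to oscillate.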

SESSION 7: BIG DATA ANALYTICS USING SOLR AND LUCENE
Session Goal:
  • Solr Introduction and Architectures.
  • Solr Indexing.
  • Solr Querying.
  • Solr Performance Optimization.
  • Introducing Lucene: Core components, core indexing and searching classes.
  • Building a search index and understanding indexing process.
  • Implementing search in applications.
  • Analysis process and advanced search techniques.
  • Extending Search.
  • Basic index operations.
  • Concurrency, thread safety and locking issues.
  • Debugging indexing.
  • Integration of Hadoop and Solr.
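
The Lucene indexing and searching classes covered in this session boil down to an inverted index: a map from each term to the documents that contain it. A minimal pure-Python sketch of that core idea (no Lucene or Solr involved, and with a toy analysis step of lowercasing and whitespace splitting):

```python
from collections import defaultdict

def build_index(docs):
    """Indexing: map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-search: return the doc ids containing every query term."""
    results = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*results) if results else set()

docs = {1: "Solr indexes documents",
        2: "Lucene powers Solr",
        3: "Hadoop stores documents"}
index = build_index(docs)
print(sorted(search(index, "solr")))            # [1, 2]
print(sorted(search(index, "solr documents")))  # [1]
```

Lucene's analyzers, scoring, and segment files are far richer, but every query ultimately consults this same term-to-documents structure.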

CASE STUDY AND PROJECTS:
Case studies are an integral part of the training. As part of this course, you will implement real-time case studies in various domains, including:
  1. Banking
  2. Telecom
  3. Ecommerce
  4. Healthcare
These case studies will be evaluated by domain experts, and you will get an opportunity to receive feedback on your work.
TRAINING FEATURES:
1) Extensive real-time live examples, projects, and POCs for improved practical competency, deployment readiness, and implementation skills.
2) Custom lab, software, and environment provided, with real-time project simulation.
3) Recorded videos complemented with the corresponding lecture PPTs, materials, and lab guides (provided as MP4 videos, PDFs, and PPTs for offline access).
4) Certification and job-interview counselling and coaching after every training.
ALCHEMY LEARNSOFT
CONTACT US
support@alchemyls.com
1800-929-7190
ADDRESS
2711 Centerville Road
Suite 400
Wilmington, DE 19808
© 2016 Alchemy LearnSoft. All Rights Reserved.