
BIG DATA HADOOP ADMINISTRATION

A Big Data Hadoop Administration training course for Administrators and aspiring Administrators. Coverage extends to Big Data on Cloud and DevOps for Hadoop and Cloud.

COURSE OBJECTIVE:
Provide an in-depth understanding of Hadoop Architecture and of implementing a Hadoop cluster using CDH and HDP. This course covers Installation, Configuration, Securing, and Monitoring of a Hadoop Cluster with all ecosystem components to set up production-level Big Data management. We also discuss the Data Lake concept with real-time Analytics use cases.

This course is oriented towards Certification and also provides real-time implementation experience to prepare participants to work in production.


LESSON PLANS


SESSION 1: UNDERSTANDING BIG DATA FROM ADMINISTRATION PERSPECTIVE
Session Goal:
Introduction to the common limitations encountered when dealing with a large amount of data, along with the common solutions. The goal here is to lay down the foundation of a heterogeneous architecture that will be described in the following Sessions.
  • Identifying Big Data Symptoms.
  • Understanding the Big Data Projects Ecosystem.
  • Creating the foundation of a long-term Big Data Architecture.
  • Roles and responsibilities of an Administrator.

SESSION 2: HADOOP CORE COMPONENTS: PLANNING, INSTALLATION & CONFIGURATION
Session Goal:
Introduction to the core components of Hadoop like HDFS, MapReduce and YARN. Participants will also learn Planning, Installation and Configuration in various scenarios.
  • Planning Hardware, OS, Software and Network to achieve Architectural requirements.
  • Types of Installation, and Installing Scalable Hadoop Clusters.
  • In-depth understanding of the Configuration Files and the impact of various properties on Hadoop Cluster.
  • Reviews and tests post successful installation.

SESSION 3: MANAGING & TROUBLESHOOTING HDFS
Session Goal:
HDFS is a major component of Hadoop that needs to be understood properly. You will learn various HDFS features like Replication, NameNode and DataNode, etc. Participants will also learn the various ways of managing HDFS when it is in bad health and of ensuring that a balanced HDFS is retained in production.
  • Configuring Nodes (NameNode & DataNodes).
  • Important configuration features like Replication.
  • Working with the HDFS client to interact with the filesystem and perform important tasks.
  • Important HDFS Administration Commands and Utilities along with HDFS Federation.
  • Filesystem Internals and understanding Metadata.
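
To make the Replication topic above concrete, here is a minimal Python sketch of how a single file translates into HDFS blocks and raw disk usage. It assumes a 128 MB block size and a replication factor of 3 (the usual Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`); the function name `hdfs_footprint` is purely illustrative and is not part of any Hadoop API.

```python
import math

# Assumptions for illustration only: the usual Hadoop 2.x defaults.
BLOCK_SIZE_MB = 128   # dfs.blocksize
REPLICATION = 3       # dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_mb) for one file stored in HDFS."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the DataNodes.
    block_replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw usage is simply the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, block_replicas, raw_mb


if __name__ == "__main__":
    blocks, block_replicas, raw_mb = hdfs_footprint(1024)
    print(f"1 GB file: {blocks} blocks, {block_replicas} block replicas, {raw_mb} MB raw")
```

Estimates like this are a rough starting point for capacity planning; real clusters also need headroom for temporary data and non-HDFS usage.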

SESSION 4: UNDERSTANDING LOAD BALANCING & FAILOVER
Session Goal:
Understand the limitations of Hadoop version 1.0 and explore the High Availability (HA) capability that was added in version 2.0. Participants will learn to plan and configure NameNode HA using various methods, along with Manual and Automated Failover Configuration using ZooKeeper.
  • Considerations and planning for Highly Available Hadoop.
  • Setting up High Availability.
  • Management and Administration of Hadoop Clusters in HA Mode.

SESSION 5: MANAGING JOBS (MAPREDUCE & YARN)
Session Goal:
Leveraging the Data Processing capabilities of MapReduce and YARN. Learn to use MapReduce components like JobTracker and TaskTracker, and YARN components like ResourceManager, NodeManager, Containers and ApplicationMaster.
  • Configure JobTracker and TaskTracker.
  • Monitoring Jobs and Tuning.
  • Understanding YARN Architecture.
  • Configuring YARN and monitoring YARN Tasks.
  • Capacity Schedulers and Open Source products that run under YARN.

SESSION 6: SECURITY APPLICATION: AUTHORISATION & AUTHENTICATION
Session Goal:
Securing HDFS and Jobs is one of the important tasks of an Administrator. This session will teach Configuration and Application of Authentication and Authorisation using various Frameworks like Kerberos and Ranger.
  • Understanding Authentication and Authorisation in Hadoop.
  • Concepts of Kerberos and Ranger including Installation and Configuration.
  • Configuring Hadoop Security using Kerberos and Ranger.
  • Auditing.

SESSION 7: HADOOP ECOSYSTEM CONFIGURATION
Session Goal:
You may already know by now that Hadoop provides the Infrastructure to manage Big Data. Numerous products are designed to utilise the Hadoop Infrastructure and provide simplified and enhanced capabilities for Data Processing and Management. Here you will learn to set up and understand the use of ecosystem products like Sqoop, Hive, Pig, ZooKeeper, HBase, Flume, Mahout and Spark as Administrators.
  • Setting up Ecosystem products: Sqoop, Hive, Pig, HBase and Spark.
  • Data processing using Pig and Hive.
  • Import/Export of EIS Data using Sqoop.
  • Effective usage of Spark.
  • Implementation scenarios of NoSQL and HBase.
  • Basics of Flume and its usage.

SESSION 8: NETWORK CONSIDERATIONS & TOPOLOGIES: RACK CONFIGURATION
Session Goal:
The most important aspect of a Distributed System is the Network. Hadoop, being a Distributed Architecture, relies heavily on the Network and its reliability. Here you'll learn about various Network Architectures and Considerations that you as an Administrator have to focus on to get a reliable Hadoop Cluster.
  • Basics of Network for Hadoop Administrators.
  • Topologies.
  • Rack Configuration and its impact on configurations like Replication.
  • Monitoring and Tuning Network.
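
As an illustration of the Rack Configuration topic above, here is a minimal sketch of a rack topology script in Python. Hadoop can call an external script (configured through the `net.topology.script.file.name` property) with one or more IP addresses or hostnames as arguments and expects one rack path per argument on standard output. The subnet-to-rack mapping and the `resolve` helper below are hypothetical; a real cluster would use its own addressing scheme.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from subnet prefix to rack path (illustration only).
RACKS = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}

# Rack path returned for any node that matches no known subnet.
DEFAULT_RACK = "/default-rack"


def resolve(address):
    """Return the rack path for a single IP address or hostname."""
    for prefix, rack in RACKS.items():
        if address.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per argument, in the same order as the arguments.
    for address in sys.argv[1:]:
        print(resolve(address))
```

With rack information available, HDFS can place the replicas of a block on more than one rack, which is why rack configuration directly affects Replication behaviour.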

SESSION 9: SETTING UP MONITORING SYSTEMS
Session Goal:
Monitoring is a continuous task. You'll learn and understand Monitoring Parameters and set up Monitoring Agents like Ganglia and Nagios.
  • Understanding and Planning Monitoring.
  • Installing and Configuring Nagios with Hadoop elements.
  • Ganglia and Hadoop Configuration.
  • Evaluating the output of Monitoring and Health Diagnosis.

SESSION 10: TUNING HADOOP CLUSTER & MAINTAINING QOS
Session Goal:
You'll learn to fine-tune a Hadoop Cluster to meet the QOS specifications and requirements. You'll also learn about the common issues and the necessary Action Plans to resolve those problems.
  • Tuning common issues in HDFS.
  • Tuning MapReduce and YARN.
  • Tuning JVM.
  • Heap Dump and Thread Dump Analysis.

SESSION 11: EVALUATING HADOOP DISTRIBUTION (CDH, HDP & PIVOTAL)
Session Goal:
As you are already aware, Hadoop is distributed in various flavours. Participants will learn to evaluate the capabilities of popular distributions like CDH, HDP and Pivotal HD.
  • Installation of CDH and HDP.
  • Evaluating the Dashboard.
  • Configuring and Monitoring using CM and Ambari.
  • Cost factors and derived value from distribution.
  • Partnerships and product comparison.

SESSION 12: CLOUD & BIG DATA INTEGRATION
Session Goal:
Today, Cloud is considered a great alternative for designing Architectures because of the elasticity and resource management capabilities that it offers. Here you'll learn to set up and evaluate Big Data on Cloud Infrastructure.
  • Cloud Basics (AWS, OpenStack and Azure).
  • Setting up Machines and Infrastructure for Hadoop on Cloud.
  • Using Cloud Storage capabilities.
  • Benefits of Cloud Infrastructure vs On-premise Setup.

SESSION 13: HADOOP & DEVOPS
Session Goal:
Here you'll learn the use of DevOps to manage Cloud.
  • Understanding DevOps.
  • Hadoop Ecosystem and DevOps.
  • Working with DevOps Frameworks.
  • Hadoop and DevOps together.

CASE STUDY AND PROJECTS
Case studies are an integral part of the training. As part of this course we will ensure you implement Real-time Case studies in various domains, which include:
  • Banking.
  • Telecom.
  • Ecommerce.
  • HealthCare.
These case studies will be evaluated by domain experts and you will get an opportunity to receive Feedback on your work.

TRAINING FEATURES
1) Extensive Real Time Live Examples, Projects & POCs for improved practical competency, deployment readiness and implementation.
2) Custom Lab, Software and Environment provided with Real-time Project Simulation.
3) Recorded Videos complemented with corresponding lecture PPTs, materials & lab guides (provided as MP4 videos, PDF and PPT files for offline access as well).
4) Certification and Job-Interview Counselling & Coaching after every training.
ALCHEMY LEARNSOFT
CONTACT US
support@alchemyls.com
1800-929-7190
ADDRESS
2711, Centerville Road
Suite 400
Wilmington, DE 19808
© 2016 Alchemy LearnSoft. All Rights Reserved.