Big Data


Training Overview:

Forscher Solutions provides a detailed Big Data Hadoop training program that delivers the focused product knowledge, as well as the technical and management skills, required to perform day-to-day management operations in enterprise environments, whether within your own organization or your customer's.

The training is delivered by Subject Matter Experts who currently manage landscapes in live production environments. The course and its contents are developed by subject matter experts and solution architects who deliver solutions to Fortune 500 companies and top-tier MNCs around the globe.

Target Audience and Prerequisites:

The training course is designed for candidates with minimal knowledge of operating systems, relational databases and storage management. However, the basics of operating system management, SQL and storage are also covered in this training program. All trainings are designed and developed from the ground up by training experts, specifically to cater to candidates of all technical backgrounds.

Course Duration:

Fast Track - 5 days for MCSA-certified candidates and 10 days for aspiring engineers with no prior system administration experience

Normal Track - 12 weekends, or 2 hours/day for 30 days

Corporate - Custom-configured per client requirements and the modules required

Module 1: Understanding Operating systems, Storage and Networks:

  • Operating Systems and their supported Processors

  • Understanding the processor architectures from Intel, Oracle Sun and IBM perspectives

  • Unix, Linux and Windows: Understanding the Architecture

  • Server components and resource requirements

  • Overview of Virtualization in Servers

  • Virtualizing hardware in a software-defined data center

  • Concepts of High Availability and clustering using Hardware and Software methodology

  • Networking from a Cisco perspective

  • IPv4 and IPv6 explained

  • Overview of Virtualization in Networking

  • Networking Protocols: NFS, CIFS and iSCSI explained and demonstrated

  • Kerberos and security in Hadoop

  • Understanding the need for storage sub-systems

  • SAN, NAS, CAS and DAS explained

  • EMC, NetApp and Fibre Channel switching technologies in brief

  • Understanding Block and File storage

  • Database management and Querying using SQL
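To illustrate the SQL querying covered at the end of this module, here is a minimal sketch using Python's built-in sqlite3 module; the table, column names and data are illustrative examples only, not part of the course material.

```python
import sqlite3

# In-memory database; the employees table and its rows are invented examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Storage", 70000), ("Ravi", "Networks", 65000), ("Mei", "Storage", 72000)],
)
conn.commit()

# A basic aggregate query: average salary per department.
cur.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
)
rows = cur.fetchall()
print(rows)  # [('Networks', 65000.0), ('Storage', 71000.0)]
conn.close()
```

The same SELECT/GROUP BY pattern carries over almost unchanged to HiveQL queries later in the course.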

Module 2: The Need for Analytics, Big Data, Hadoop and Open Source Software

  • The Need for Analytics by Businesses and Enterprises

  • Common Analytics software used widely in the industry

  • Big Data and Hadoop: Common Components and Concepts

  • The Business case for Big Data Analytics

Module 3: Understanding the HDFS, YARN and MapReduce Framework:

  • HDFS Features and Architecture

  • Understanding Data ingestion from External resources and Relational Databases

  • NameNode, NameNode Web UI and DataNode

  • Securing the HDFS

  • Understanding the File Shell in Hadoop

  • Understanding YARN

  • Configuring YARN in Hadoop

  • Concepts, Features and Architecture of MapReduce

  • YARN and MapReduce

  • Understanding the JobTracker Web UI

  • HDFS on the Web: WebHDFS, Configuration, data copy

  • Administration of WebHCat/WebHDFS

  • Concepts of Rack-Awareness and its associated topology
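The MapReduce model covered in this module can be sketched in plain Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is only a single-process illustration of the concept, not the Hadoop framework itself; the input lines are invented.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit (word, 1) for every word in the input split.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
print(counts["the"], counts["fox"])  # 3 2
```

In a real cluster, the map and reduce functions run in parallel across DataNodes, with YARN scheduling the containers.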

Module 4: Design, Plan, Install and Configure Hadoop

  • Hadoop Deployment environment considerations - Hardware

  • Network planning for Hadoop installation

  • NFS setup in a Hadoop landscape

  • High Availability and Clustering in Hadoop

  • HDP services management

  • Designing a Hadoop landscape

  • Hadoop Installation and Initial Configuration

  • Installing and configuring Hadoop Clients

  • Understanding and deploying Hadoop via Ambari

  • YARN, Hive, Sqoop, Flume, Oozie, Impala, and Pig: Concepts and usage

  • Cloudera Manager: Architecture, Topology and its integration into a Hadoop Landscape

  • Administration of Cloudera

  • Installation, Configuration and management of Hue
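Much of the initial configuration in this module is done through XML property files such as core-site.xml. As a sketch, the helper below generates one with Python's standard library; `fs.defaultFS` is the standard key for the default filesystem URI, while the hostname and port are placeholders for a real NameNode.

```python
import xml.etree.ElementTree as ET

def make_core_site(properties):
    # Build a Hadoop-style configuration document: a <configuration> root
    # containing <property><name>...</name><value>...</value></property>
    # entries, one per configuration key.
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# The NameNode address below is an illustrative placeholder.
xml_text = make_core_site({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
print(xml_text)
```

In practice, tools like Ambari and Cloudera Manager generate and distribute these files for you, which is one reason the course covers them.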

Module 5: High Availability and Clustering in the Hadoop Environment

  • Understanding HA and clustering in a Hadoop Environment

  • Configuration parameters for HA and Ports in HDP

  • Including and Excluding nodes in a Hadoop Cluster

  • Data transfer between clusters

  • Load balancing and Upgrading Clusters

  • Monitoring the Nodes in HDP Cluster

  • HA and Ambari
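For the node include/exclude topic above: HDFS decommissioning is driven by a plain-text file (referenced by the `dfs.hosts.exclude` property) with one hostname per line, which the admin refreshes with `hdfs dfsadmin -refreshNodes`. A minimal sketch of producing that file's contents, with invented hostnames:

```python
def exclude_file_contents(hostnames):
    # The exclude file read by the NameNode is plain text, one hostname
    # per line. Sorting and de-duplicating keeps the file stable across
    # regenerations. Hostnames here are illustrative.
    return "\n".join(sorted(set(hostnames))) + "\n"

content = exclude_file_contents(["worker03.example.com", "worker01.example.com"])
print(content)
```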

Module 6: Common and Advanced Tools in Integrating a Big Data Hadoop Solution

  • HIVE: Components, Tables, Servers, Queries, Optimization and Security of HIVE

  • PIG: Data Transformation with PIG in Hadoop

  • SQOOP: Concepts, Tools, Tables and Querying using SQOOP

  • FLUME: Overview, Installation, configuration, events, channels, Sinks and Interceptors

  • OOZIE: Concepts, Components, jobs, workflows and CLI
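As a taste of the Sqoop material, a typical import from a relational database into HDFS is a single command line. The sketch below assembles one programmatically; the JDBC URL, table and target directory are illustrative placeholders, while the flags shown (`--connect`, `--table`, `--target-dir`, `--num-mappers`) are standard `sqoop import` options.

```python
def sqoop_import_command(jdbc_url, table, target_dir, num_mappers=4):
    # Assemble a `sqoop import` invocation as an argument list, suitable
    # for subprocess.run(). All values below are example placeholders.
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

cmd = sqoop_import_command(
    "jdbc:mysql://dbhost.example.com/sales", "orders", "/user/hadoop/orders"
)
print(" ".join(cmd))
```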

Module 7: Scheduling Jobs in Hadoop

  • Understanding job scheduling in a Hadoop environment, based on the business case

  • Configuring the Fair Scheduler

  • Configuring the Capacity Scheduler

  • Managing scheduled jobs
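The Capacity Scheduler configured in this module is itself driven by properties in capacity-scheduler.xml. The sketch below generates the property names for a flat set of queues under root; the queue names and percentages are illustrative, but the `yarn.scheduler.capacity.root.*` property keys are the ones the scheduler actually reads.

```python
def capacity_scheduler_properties(queues):
    # `queues` maps queue name -> capacity percentage; sibling queue
    # capacities under the same parent must sum to 100.
    total = sum(queues.values())
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, capacity in queues.items():
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(capacity)
    return props

# Example: a 60/40 split between an ETL queue and an ad-hoc query queue.
props = capacity_scheduler_properties({"etl": 60, "adhoc": 40})
print(props["yarn.scheduler.capacity.root.queues"])  # etl,adhoc
```

The Fair Scheduler takes a different approach (weights in an allocation file rather than fixed percentages), which is why the course treats them as separate topics.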

Module 8: Backups and Recovery in a Hadoop Environment

  • Setting up a backup solution for the HDFS

  • Understanding snapshots on an active file system
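For the snapshot topic: an HDFS snapshot of a snapshottable directory (enabled with `hdfs dfsadmin -allowSnapshot`, created with `hdfs dfs -createSnapshot`) is exposed read-only under the reserved `.snapshot` subdirectory. A small helper sketching that path convention, with illustrative names:

```python
def snapshot_path(directory, snapshot_name):
    # HDFS exposes each snapshot of a snapshottable directory under the
    # reserved ".snapshot" subdirectory of that directory.
    return f"{directory.rstrip('/')}/.snapshot/{snapshot_name}"

print(snapshot_path("/user/hadoop/data", "daily-2024-01-01"))
# /user/hadoop/data/.snapshot/daily-2024-01-01
```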

Module 9: Monitoring a Hadoop Landscape

  • Ambari in Monitoring a Hadoop Landscape

  • Ambari: Configuration of a monitoring solution for a Hadoop clustered landscape

  • Tools for Monitoring: Nagios, Ganglia, Eclipse – Configuration and Monitoring

Module 10: Security in Hadoop

  • Kerberos and Security in Hadoop

  • Log files for security in Hadoop

  • Authentication and Authorization in a Hadoop environment
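As a small illustration of the Kerberos material: Hadoop service principals follow the standard Kerberos `service/hostname@REALM` form (the NameNode, for instance, uses the `nn` service name). The hostname and realm below are illustrative placeholders.

```python
def service_principal(service, hostname, realm):
    # Kerberos convention: the hostname portion is lower-case and the
    # realm is upper-case; the service name identifies the daemon.
    return f"{service}/{hostname.lower()}@{realm.upper()}"

print(service_principal("nn", "NameNode01.example.com", "example.com"))
# nn/namenode01.example.com@EXAMPLE.COM
```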