Big Data Track

In this track, you will gain expertise in Data Science and learn mathematical and scientific computing, data manipulation and data visualization. This track covers the key components of the Big Data ecosystem, such as YARN, MapReduce, HDFS, Pig, Hive, HBase and Apache Spark, and provides plenty of hands-on experience with the essentials of the field.

Includes:

  • 10 hours of lecture videos
  • 153 hands-on practice exercises
  • 22 assessment exercises
  • 305 knowledge-based questions
  • 10 live Connect sessions (master classes)
  • Lifetime access
Contact Us
+91 95669 33778

Big Data and Hadoop for Absolute Beginners

ABOUT THE COURSE

This course provides a detailed introduction to the tools used to process Big Data: storage in HDFS, retrieval using HBase, resource allocation by YARN, and customizing, testing and debugging MapReduce.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Understand Big Data and how it is stored in and retrieved from HDFS and HBase.
  • Process, test and debug Big Data applications using MapReduce.
  • Allocate cluster resources using YARN.

Course Content

Introduction to Big Data

In this module, you will learn about the elements of Big Data, its various types and the importance of structuring data. You will also learn how Big Data is used across industries: career opportunities in the field, the significance of social media data in a business context, fraud management in the financial and insurance sectors, applications in the retail industry, and the concept of distributed computing in relation to Big Data.

  • 1 Video
  • 2 Hours

Understanding the Hadoop 2 Ecosystem

In this module, you will learn about the various components of the Hadoop 2 ecosystem: the process of storing files in the Hadoop Distributed File System (HDFS), the role of Hadoop MapReduce, and the process of storing data with HBase. You will also learn how Hive aids in mining Big Data, the roles of ecosystem components such as ZooKeeper, Sqoop, Oozie and Flume, the roles of map and reduce in MapReduce, techniques to optimize MapReduce tasks, the parts HBase and Hive play in processing Big Data, and some applications of MapReduce.

  • 1 Video
  • 1 Hour

Storing data in Hadoop 2 - HDFS and HBase

In this module, you will learn about the Hadoop Distributed File System (HDFS), how to work with HDFS files, the role of HDFS Federation, and the architecture and role of HBase. You will also learn the characteristics of HBase schema design, how to implement basic programming for HBase, and how to combine the capabilities of HBase and HDFS for effective data storage.

  • 1 Video
  • 8 Hours
  • 10 Problems

Working with MapReduce on YARN

In this module, you will learn about the MapReduce 2 framework, the steps to build and execute a basic MapReduce program on YARN, techniques for designing MapReduce implementations, the process of building joins with MapReduce, and techniques for building iterative MapReduce applications.

  • 1 Video
  • 14 Hours
  • 26 Problems
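The map and reduce phases covered in this module can be previewed with a minimal word-count sketch in plain Python; no Hadoop cluster is needed to follow the data flow, although the course exercises themselves run on YARN:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "spark and hadoop process data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
print(counts["data"])    # 2
```

In a real MapReduce job the mapper and reducer run on different cluster nodes and the shuffle moves data over the network, but the logical pipeline is exactly this.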

Customizing MapReduce fundamentals

In this module, you will learn how to control MapReduce execution with InputFormat, read data with a custom RecordReader, organize output data with custom OutputFormats, write data with a custom RecordWriter, optimize MapReduce execution with a combiner, and control reducer execution with partitioners.

  • 1 Video
  • 8 Hours
  • 9 Problems
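To make the combiner and partitioner roles concrete, here is a small sketch in plain Python (not Hadoop's actual Java API): a combiner pre-aggregates map output locally before the shuffle, and a partitioner decides which reducer receives each key, conventionally by hashing:

```python
def combine(local_pairs):
    """Combiner: pre-aggregate (word, count) pairs on the map side
    to cut the volume of data shuffled across the network."""
    totals = {}
    for word, count in local_pairs:
        totals[word] = totals.get(word, 0) + count
    return list(totals.items())

def partition(key, num_reducers):
    """Partitioner: route a key to a reducer, Hadoop-style
    (hash of the key modulo the number of reducers)."""
    return hash(key) % num_reducers

map_output = [("spark", 1), ("hive", 1), ("spark", 1), ("spark", 1)]
combined = dict(combine(map_output))
print(combined["spark"])              # 3
print(0 <= partition("hive", 4) < 4)  # True
```

Because every occurrence of a key hashes to the same reducer, all of its values meet in one place, which is what makes the reduce phase correct.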

Working with YARN

In this module, you will learn about the advantages of YARN over MapReduce in Hadoop 1.0, the YARN ecosystem and architecture, key concepts of the YARN API, and how to schedule jobs with YARN.

  • 1 Video
  • 2 Hours

Hadoop Hive and Pig Online Course

ABOUT THE COURSE

In this course, we will discuss the Hive data storage principle, performing operations on data in Hive, implementing the advanced query features of Hive, and the file formats and record formats supported by the Hive environment. We will also use Pig to automate the design and implementation of MapReduce applications.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Create data processing pipelines with Pig
  • Create and query a Big Data warehouse with Hive

Course Content

Exploring Hive

In this module, you will be exposed to an overview of the role of Hive, how to install and configure Hive, the supported schema types, data types, metadata and partitions, various Hive built-in functions, Hive commands for data definition language (DDL), and HQL commands for data manipulation language (DML) and queries.

  • 1 Video
  • 1 Hour
  • 17 Problems

Advanced querying with Hive

In this module, you will use HQL commands to perform DML queries, implement joins in Hive, apply Hive best practices, perform performance tuning and query optimization in Hive, and explore the execution types, file formats and security features of Hive.

  • 1 Video
  • 1 Hour
  • 20 Problems
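One join strategy covered here, Hive's map-side join, keeps the smaller table in memory as a hash table and streams the larger one past it. The core idea can be sketched in plain Python (the table contents below are invented for illustration):

```python
def map_side_join(small_table, big_table):
    """Hash-join: build an in-memory lookup from the small table,
    then probe it while streaming the big table, as Hive's map join does."""
    lookup = {key: value for key, value in small_table}
    return [(key, value, lookup[key]) for key, value in big_table if key in lookup]

depts = [(1, "engineering"), (2, "sales")]                    # small dimension table
emps = [(1, "alice"), (2, "bob"), (3, "carol"), (1, "dave")]  # large fact table
print(map_side_join(depts, emps))
# [(1, 'alice', 'engineering'), (2, 'bob', 'sales'), (1, 'dave', 'engineering')]
```

Avoiding the shuffle that a reduce-side join requires is why the map join is a common Hive performance-tuning technique when one table is small.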

Analyzing Data with Pig

In this module, you will be exposed to the features and advantages of Pig, how to install and run Pig, the properties of Pig, how to use Pig Latin statements and functions, and how to use relational operators in Pig.

  • 1 Video
  • 1 Hour
  • 14 Problems

Spark with Scala - Hands On with Big Data!

ABOUT THE COURSE

This course helps you understand the Spark program flow, basic Scala constructs, RDD operations, querying data using Spark SQL, and using Spark Streaming to initialize, transform, deploy and monitor applications.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Understand Spark and Scala, and the program flow of Spark.
  • Load and store data in various formats using RDD operations.
  • Query data using Spark SQL.
  • Use Spark Streaming to initialize, transform, deploy and monitor applications.

Course Content

Spark and Scala Fundamentals

In this module, you will be able to understand the difference between the Spark and Hadoop frameworks, the key components of the Spark ecosystem, the Spark program flow, how to work with basic Scala constructs, and how to build programs in Spark.

  • 1 Video
  • 7 Hours
  • 20 Problems

Spark Programming

In this module, you will be able to understand how to create RDDs and perform RDD operations, pass functions to Spark, perform transformations and actions on RDDs, work with key/value pairs, and load and save data in various formats.

  • 1 Video
  • 8 Hours
  • 20 Problems
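The key/value transformations in this module can be previewed in plain Python; the course itself uses Scala RDDs, and `reduce_by_key` below is only a stand-in for Spark's `reduceByKey`:

```python
def reduce_by_key(pairs, func):
    """Mimic Spark's reduceByKey: merge all values of each key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return sorted(result.items())

# flatMap -> map -> reduceByKey: the classic Spark word count pipeline
lines = ["hive on hadoop", "spark on yarn"]
words = [w for line in lines for w in line.split()]  # flatMap
pairs = [(w, 1) for w in words]                      # map
counts = reduce_by_key(pairs, lambda a, b: a + b)    # reduceByKey
print(counts)
# [('hadoop', 1), ('hive', 1), ('on', 2), ('spark', 1), ('yarn', 1)]
```

In real Spark these transformations are lazy and run distributed across partitions, but each one's per-element behaviour matches this sketch.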

Spark SQL

In this module, you will be able to understand the use of SchemaRDD in Spark programs, how to load and query data with Apache Hive and JSON support, how to run Spark SQL with the Spark SQL JDBC server, how to use Spark SQL UDFs and Hive UDFs, and how to fine-tune Spark SQL performance.

  • 1 Video
  • 2 Hours
  • 10 Problems
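The pattern this module teaches, registering a user-defined function and then using it inside a SQL query over structured data, can be previewed with Python's built-in sqlite3 module. Spark SQL's API differs (you would use `spark.udf.register` there), so treat this only as an analogy for the UDF workflow:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("alice", 3), ("bob", 5), ("alice", 2)])

# Register a UDF, analogous in spirit to registering a Spark SQL UDF
conn.create_function("double_it", 1, lambda x: x * 2)

rows = conn.execute(
    "SELECT user, double_it(SUM(clicks)) FROM events "
    "GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 10), ('bob', 10)]
```

The value of the pattern is the same in both systems: the engine plans and executes the query, while your function plugs in custom per-row (or per-aggregate) logic.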

Spark Streaming

In this module, you will be able to understand the Spark Streaming architecture and the concept of linking, how to initialize a StreamingContext, input DStreams and receivers, various transformations on DStreams, how to deploy Spark Streaming applications, and how to monitor streaming applications.

  • 1 Video
  • 2 Hours
  • 10 Problems
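Spark Streaming processes data as a sequence of micro-batches, and one of the DStream transformations covered here operates over a sliding window of recent batches. The idea can be sketched in plain Python (batch contents and window size below are made up for illustration):

```python
from collections import deque

def windowed_counts(batches, window_size):
    """Count events over a sliding window of the last `window_size`
    micro-batches, emitting one running total per arriving batch."""
    window = deque(maxlen=window_size)  # old batch sizes fall off automatically
    totals = []
    for batch in batches:
        window.append(len(batch))
        totals.append(sum(window))
    return totals

# Each inner list is one micro-batch of events
batches = [["a", "b"], ["c"], ["d", "e", "f"], []]
print(windowed_counts(batches, window_size=2))  # [2, 3, 4, 3]
```

Spark's `countByWindow` follows the same logic, with the window length and slide interval expressed in time rather than in batch counts.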

Comprehensive Course on Hadoop Analytic Tool

ABOUT THE COURSE

In this course, we will learn about additional Hadoop tools such as Oozie, ZooKeeper, Sqoop, Flume, YARN and Storm. We will also look at automated data processing with Oozie, distributed process coordination with ZooKeeper, and efficient bulk data transfer using Sqoop and Flume.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Work with additional Hadoop tools such as Oozie, ZooKeeper, Sqoop, Flume, YARN and Storm.
  • Automate data processing with Oozie.
  • Coordinate distributed processing with ZooKeeper.
  • Efficiently transfer bulk data using Sqoop and Flume.

Course Content

Automated Data Processing with Oozie

In this module, you will be able to understand the fundamentals and components of Oozie (Workflow, Coordinator and Bundle), the overall Oozie execution model, how to access the Oozie server, and Oozie's support for Service Level Agreements.

  • 1 Video
  • 2 Hours

Using Oozie

In this module, you will design an Oozie application and learn how to implement Oozie Workflows, Oozie Coordinator applications and an Oozie Bundle, and how to deploy, test and execute Oozie applications.

  • 1 Video
  • 2 Hours

Distributed process coordination with ZooKeeper

In this module, you will be able to understand the role and benefits of Apache ZooKeeper, some terms related to ZooKeeper, use of the ZooKeeper command line interface, how to install and run ZooKeeper, popular ZooKeeper applications and how to build applications using ZooKeeper.

  • 1 Video
  • 2 Hours

Efficiently transferring Bulk Data using Sqoop

In this module, you will be able to understand the basics of using Sqoop and Sqoop 2, the steps to import data into Hive and HBase, the steps to export data from HDFS, the use of drivers and connectors in Sqoop, the Sqoop architecture, the challenges of Sqoop, and the advantages of Sqoop 2 over Sqoop 1.

  • 1 Video
  • 2 Hours

Flume

In this module, you will be able to understand the architecture of Flume, use of Flume configuration file and how to install, configure, and build Flume for data aggregation.

  • 1 Video
  • 2 Hours

About E-Box

E-Box is a Technology Enabled Active Learning and Assessment platform for technology and engineering domains that goes beyond basic LMS components such as quizzes, assignments and lessons.

Connect with us

E-Box on the Google Play Store