
Big Data Track

In this track, you will gain expertise in Data Science and learn mathematical and scientific computing, data manipulation and data visualization. The track covers the key components of the Big Data ecosystem, such as YARN, MapReduce, HDFS, Pig, Hive, HBase and Apache Spark. It covers the essentials of the field and provides plenty of hands-on experience.

Includes:

  • 10 hours of lecture videos
  • 153 hands-on practice exercises
  • 22 assessment exercises
  • 305 knowledge-based questions
  • 10 Live Connect sessions (master classes)
  • Lifetime access
Contact Us
+91 95669 33778

Big Data and Hadoop for Absolute Beginners

ABOUT THE COURSE

This course gives a detailed introduction to the tools used to process Big Data: storage in HDFS and retrieval using HBase, resource allocation by YARN, and customizing, testing and debugging MapReduce.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Describe Big Data and store and retrieve it using HDFS and HBase.
  • Process, test and debug Big Data applications using MapReduce.
  • Allocate resources using YARN.

Course Content

Introduction to Big Data

In this module, you will learn about the elements of Big Data, the various types of Big Data and the importance of structuring data. You will also learn how Big Data is used across industries, career opportunities in Big Data, the significance of social media data in a business context, applications of Big Data for fraud management in the financial and insurance sectors, applications of Big Data in the retail industry, and the concept of distributed computing as it relates to Big Data.

  • 1 Video
  • 2 Hours
  • 40 Problems

Understanding the Hadoop 2 Ecosystem

In this module, you will learn about the various components of the Hadoop 2 ecosystem, the process of storing files in the Hadoop Distributed File System (HDFS), the role of Hadoop MapReduce and the process of storing data with HBase. You will also learn how Hive aids in mining Big Data, the roles of other ecosystem components such as ZooKeeper, Sqoop, Oozie and Flume, the roles of the map and reduce phases in MapReduce, techniques to optimize MapReduce tasks, the roles HBase and Hive play in processing Big Data, and some applications of MapReduce.

  • 1 Video
  • 2 Hours
  • 40 Problems

Storing data in Hadoop 2 - HDFS and HBase

In this module, you will learn about the Hadoop Distributed File System (HDFS), how to work with HDFS files, the role of HDFS Federation, and the architecture and role of HBase. You will also learn the characteristics of HBase schema design, how to write basic HBase programs, and how to make the best use of HBase and HDFS capabilities for effective data storage.

  • 1 Video
  • 8 Hours
  • 75 Problems

Working with MapReduce on YARN

In this module, you will learn about the MapReduce 2 framework, the steps to build and execute a basic MapReduce program on YARN, techniques for designing MapReduce implementations, how to build joins with MapReduce, and techniques for building iterative MapReduce applications.

  • 1 Video
  • 7 Hours
  • 51 Problems

Customizing MapReduce fundamentals

In this module, you will learn how to control MapReduce execution with InputFormat, read data with a custom RecordReader, organize output data with custom OutputFormats, write data with a custom RecordWriter, optimize MapReduce execution with a combiner, and control reducer execution with partitioners.

  • 1 Video
  • 3 Hours
  • 34 Problems

Testing and Debugging MapReduce Applications

In this module, you will perform unit testing of MapReduce applications using MRUnit, perform local testing of MapReduce applications, and use logging for Hadoop testing.

  • 1 Video
  • 2 Hours
  • 30 Problems

Working with YARN

In this module, you will learn about the advantages of YARN over MapReduce in Hadoop 1.0, the YARN ecosystem and architecture, key concepts of the YARN API, and how to schedule jobs with YARN.

  • 1 Video
  • 2 Hours
  • 25 Problems

Hadoop Hive and Pig Online Course

ABOUT THE COURSE

In this course, we will discuss the principles of Hive data storage, performing operations on data in Hive, implementing Hive's advanced query features, and the file formats and record formats supported by the Hive environment. You will also use Pig to automate the design and implementation of MapReduce applications.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Create data processing pipelines with Pig
  • Create and query a Big Data warehouse with Hive

Course Content

What is Hive? Architecture & Modes / Hive installation

In this module, you will get an overview of the role of Hive, the Hive architecture, and how to install and configure Hive.

  • 1 Video
  • 1 Hour
  • 21 Problems

Hive Data Types / Create, Alter & Drop Table

In this module, you will be introduced to the various data types in Hive, how to create and drop a database, data modelling in Hive, and various DDL commands, with examples.

  • 1 Video
  • 3 Hours
  • 40 Problems

Hive Partitions & Buckets

In this module, you will be introduced to partitioning and bucketing data in Hive.

  • 1 Video
  • 1 Hour
  • 23 Problems

Hive Indexes and Views

In this module, you will learn how to create and manage Hive indexes and views, the types of Hive indexes, and the performance characteristics of Hive indexes and views.

  • 1 Video
  • 1 Hour
  • 20 Problems

Hive Queries: Order By, Group By, Distribute By, Cluster By / Join & Subquery

In this module, you will be introduced to the ORDER BY, GROUP BY, SORT BY, CLUSTER BY and DISTRIBUTE BY clauses, the different types of joins in Hive, and how to write subqueries, with examples.

  • 1 Video
  • 5 Hours
  • 38 Problems

HiveQL (Hive Query Language) Tutorial: Built-in Operators

In this module, you will be introduced to the various built-in operators in Hive.

  • 1 Video
  • 5 Hours
  • 39 Problems

Hive Functions: Built-in & UDFs (User-Defined Functions)

In this module, you will be introduced to the various built-in functions in Hive and to UDFs (user-defined functions).

  • 1 Video
  • 4 Hours
  • 33 Problems

Hive ETL: Loading JSON, XML, Text Data Examples

In this module, you will be introduced to Hive as an ETL tool, working with structured and semi-structured data in Hive, and the use of Hive in real-world projects.

  • 1 Video
  • 1 Hour
  • 13 Problems

What is Pig? Pig Architecture / How to download and install Pig

In this module, you will be introduced to Apache Pig and its features, the components of the Pig architecture, and how to install and configure Pig.

  • 1 Video
  • 1 Hour
  • 20 Problems

Pig Script

In this module, you will learn Pig scripting through examples.

  • 1 Video
  • 6 Hours
  • 27 Problems

Spark with Scala - Hands On with Big Data!

ABOUT THE COURSE

This course helps you understand the Spark program flow, basic Scala constructs, RDD operations, querying data with Spark SQL, and using Spark Streaming to initialize, transform, deploy and monitor applications.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Describe Spark and Scala and the program flow of a Spark application.
  • Load and store data in various formats using RDD operations.
  • Query data using Spark SQL.
  • Initialize, transform, deploy and monitor streaming applications with Spark Streaming.

Course Content

Spark and Scala Fundamentals

In this module, you will learn the differences between the Spark and Hadoop frameworks, the key components of the Spark ecosystem, the Spark program flow, how to work with basic Scala constructs, and how to build programs in Spark.

  • 1 Video
  • 5 Hours
  • 45 Problems

Spark Programming

In this module, you will learn how to create RDDs and perform RDD operations, how to pass functions to Spark, how to perform transformations and actions on RDDs, how to work with key/value pairs, and how to load and save data in various formats.

  • 1 Video
  • 5 Hours
  • 45 Problems

Spark SQL

In this module, you will learn how to use SchemaRDD in Spark programs, how to load and query data using Spark SQL's Apache Hive and JSON support, how to run Spark SQL through the Spark SQL JDBC server, how to use Spark SQL UDFs and Hive UDFs, and how to fine-tune Spark SQL performance.

  • 1 Video
  • 2 Hours
  • 25 Problems

Spark Streaming

In this module, you will learn the Spark Streaming architecture and the concept of linking, how to initialize a StreamingContext, input DStreams and receivers, the various transformations on DStreams, and how to deploy and monitor Spark Streaming applications.

  • 1 Video
  • 2 Hours
  • 25 Problems

Spark MLlib and GraphX

In this module, you will learn about graphs and their computational features, GraphX and its use cases, and machine learning tools and their algorithms.

  • 1 Video
  • 2 Hours
  • 25 Problems

Comprehensive Course on Hadoop Analytic Tool

ABOUT THE COURSE

In this course, we will learn about additional Hadoop tools such as Oozie, ZooKeeper, Sqoop, Flume, YARN and Storm. We will also look at automated data processing with Oozie, distributed process coordination with ZooKeeper, and efficient bulk data transfer using Sqoop and Flume.

COURSE OBJECTIVES

Upon successful completion of the course, the learner will be able to:
  • Describe additional Hadoop tools such as Oozie, ZooKeeper, Sqoop, Flume, YARN and Storm.
  • Automate data processing with Oozie.
  • Coordinate distributed processing with ZooKeeper.
  • Transfer data efficiently using Sqoop and Flume.

Course Content

Automated Data Processing with Oozie

In this module, you will learn the fundamentals, workflow and components of Oozie (Oozie Workflow, Oozie Coordinator and Oozie Bundle), the overall Oozie execution model, how to access the Oozie server, and Oozie's support for Service Level Agreements (SLAs).

  • 1 Video
  • 2 Hours

Using Oozie

In this module, you will learn how to design an Oozie application; implement Oozie workflows, Oozie coordinator applications and an Oozie bundle; and deploy, test and execute Oozie applications.

  • 1 Video
  • 2 Hours

Distributed process coordination with Zookeeper

In this module, you will learn the role and benefits of Apache ZooKeeper, key ZooKeeper terminology, how to use the ZooKeeper command-line interface, how to install and run ZooKeeper, popular ZooKeeper applications, and how to build applications using ZooKeeper.

  • 1 Video
  • 2 Hours

Efficiently transferring Bulk Data using Sqoop

In this module, you will learn the basics of using Sqoop and Sqoop 2, the steps to import data into Hive and HBase and to export data from HDFS, the use of drivers and connectors in Sqoop, the Sqoop architecture, the challenges of Sqoop, and the advantages of Sqoop 2 over Sqoop 1.

  • 1 Video
  • 2 Hours

Flume

In this module, you will learn the Flume architecture, how to use the Flume configuration file, and how to install, configure and build Flume for data aggregation.

  • 1 Video
  • 2 Hours

About E-Box

E-Box is a Technology Enabled Active Learning and Assessment platform for technology and engineering domains that goes beyond basic LMS components such as quizzes, assignments and lessons.
