Sqoop, Hive and Impala for Data Analysts (Formerly CCA 159)


As part of Sqoop, Hive and Impala for Data Analysts (Formerly CCA 159), you will learn key skills such as Sqoop, Hive, and Impala.

This comprehensive course covers all aspects of the certification with real-world examples and data sets.

Overview of Big Data ecosystem

  • Overview of Distributions and Management Tools

  • Properties and Properties Files – General Guidelines

  • Hadoop Distributed File System

  • YARN and MapReduce2

  • Submitting Map Reduce Job

  • Determining Number of Mappers and Reducers

  • Understanding YARN and Map Reduce Configuration Properties

  • Review and Override Job Properties

  • Reviewing Map Reduce Job Logs

  • Map Reduce Job Counters

  • Overview of Hive

  • Databases and Query Engines

  • Overview of Data Ingestion in Big Data

  • Data Processing using Spark

HDFS Commands to manage files

  • Introduction to HDFS for Certification Exams

  • Overview of HDFS and Properties Files

  • Overview of Hadoop CLI

  • Listing Files in HDFS

  • User Spaces or Home Directories in HDFS

  • Creating Directories in HDFS

  • Copying Files and Directories into HDFS

  • File and Directory Permissions Overview

  • Getting Files and Directories from HDFS

  • Previewing Text Files in HDFS

  • Copying or Moving Files and Directories within HDFS

  • Understanding Size of File System and Files

  • Overview of Block Size and Replication Factor

  • Getting File Metadata using hdfs fsck

  • Resources and Exercises

Getting Started with Hive

  • Overview of Hive Language Manual

  • Launching and using Hive CLI

  • Overview of Hive Properties

  • Hive CLI History and hiverc

  • Running HDFS Commands in Hive CLI

  • Understanding Warehouse Directory

  • Creating and Using Hive Databases

  • Creating and Describing Hive Tables

  • Retrieve Metadata of Tables using DESCRIBE

  • Role of Hive Metastore Database

  • Overview of beeline

  • Running Hive Commands and Queries using beeline

Creating Tables in Hive using Hive QL

  • Creating Tables in Hive – orders

  • Overview of Basic Data Types in Hive

  • Adding Comments to Columns and Tables

  • Loading Data into Hive Tables from Local File System

  • Loading Data into Hive Tables from HDFS

  • Loading Data – Overwrite vs Append

  • Creating External tables in Hive

  • Specifying Location for Hive Tables

  • Difference between Managed Table and External Table

  • Default Delimiters in Hive Tables using Text File

  • Overview of File Formats in Hive

  • Differences between Hive and RDBMS

  • Truncate and Drop tables in Hive

  • Resources and Exercises

Loading/Inserting data into Hive tables using Hive QL

  • Introduction to Partitioning and Bucketing

  • Creating Tables using Orc Format – order_items

  • Inserting Data into Tables using Stage Tables

  • Load vs. Insert in Hive

  • Creating Partitioned Tables in Hive

  • Adding Partitions to Tables in Hive

  • Loading into Partitions in Hive Tables

  • Inserting Data Into Partitions in Hive Tables

  • Insert Using Dynamic Partition Mode

  • Creating Bucketed Tables in Hive

  • Inserting Data into Bucketed Tables

  • Bucketing with Sorting

  • Overview of ACID Transactions

  • Create Tables for Transactions

  • Inserting Individual Records into Hive Tables

  • Update and Delete Data in Hive Tables

Overview of functions in Hive

  • Overview of Functions

  • Validating Functions

  • String Manipulation – Case Conversion and Length

  • String Manipulation – substr and split

  • String Manipulation – Trimming and Padding Functions

  • String Manipulation – Reverse and Concatenating Multiple Strings

  • Date Manipulation – Current Date and Timestamp

  • Date Manipulation – Date Arithmetic

  • Date Manipulation – trunc

  • Date Manipulation – Using date format

  • Date Manipulation – Extract Functions

  • Date Manipulation – Dealing with Unix Timestamp

  • Overview of Numeric Functions

  • Data Type Conversion Using Cast

  • Handling Null Values

  • Query Example – Get Word Count

Writing Basic Queries in Hive

  • Overview of SQL or Hive QL

  • Execution Life Cycle of Hive Query

  • Reviewing Logs of Hive Queries

  • Projecting Data using Select and Overview of From

  • Derive Conditional Values using CASE and WHEN

  • Projecting Distinct Values

  • Filtering Data using Where Clause

  • Boolean Operations in Where Clause

  • Boolean OR vs IN Operator

  • Filtering Data using LIKE Operator

  • Performing Basic Aggregations using Aggregate Functions

  • Performing Aggregations using GROUP BY

  • Filtering Aggregated Data Using HAVING

  • Global Sorting using ORDER BY

  • Overview of DISTRIBUTE BY

  • Sorting Data within Groups using SORT BY

  • Using CLUSTERED BY

Joining Data Sets and Set Operations in Hive

  • Overview of Nested Sub Queries

  • Nested Sub Queries – Using IN Operator

  • Nested Sub Queries – Using EXISTS Operator

  • Overview of Joins in Hive

  • Performing Inner Joins using Hive

  • Performing Outer Joins using Hive

  • Performing Full Outer Joins using Hive

  • Map Side Join and Reduce Side Join in Hive

  • Joining in Hive using Legacy Syntax

  • Cross Joins in Hive

  • Overview of Set Operations in Hive

  • Perform Set Union between two Hive Query Results

  • Set Operations – Intersect and Minus Not Supported

Windowing or Analytics Functions in Hive

  • Prepare HR Database in Hive with Employees Table

  • Overview of Analytics or Windowing Functions in Hive

  • Performing Aggregations using Hive Queries

  • Create Tables to Get Daily Revenue using CTAS in Hive

  • Getting Lead and Lag using Windowing Functions in Hive

  • Getting First and Last Values using Windowing Functions in Hive

  • Applying Rank using Windowing Functions in Hive

  • Applying Dense Rank using Windowing Functions in Hive

  • Applying Row Number using Windowing Functions in Hive

  • Difference Between rank, dense_rank, and row_number in Hive

  • Understanding the order of execution of Hive Queries

  • Overview of Nested Sub Queries in Hive

  • Filtering Data on Top of Window Functions in Hive

  • Getting Top 5 Products by Revenue for Each Day using Windowing Functions in Hive – Recap
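
As a rough illustration of the rank, dense_rank, and row_number topic listed above, here is a minimal Python sketch of how the three numbering schemes differ when values tie. It is not Hive code and not taken from the course; the function name `window_ranks` and the sample revenue figures are assumptions made up for this example.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, highest value first."""
    ordered = sorted(values, reverse=True)
    rows = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank skips ahead to the current position after ties
            dense += 1         # dense_rank advances by exactly one, leaving no gaps
            previous = value
        rows.append((value, row_number, rank, dense))
    return rows


# Hypothetical daily revenue figures containing one tie.
for row in window_ranks([500, 300, 500, 100]):
    print(row)
```

Tied values share the same rank and dense_rank but still get distinct row numbers; after the tie, rank leaves a gap while dense_rank does not.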

Running Queries using Impala

  • Introduction to Impala

  • Role of Impala Daemons

  • Impala State Store and Catalog Server

  • Overview of Impala Shell

  • Relationship between Hive and Impala

  • Overview of Creating Databases and Tables using Impala

  • Loading and Inserting Data into Tables using Impala

  • Running Queries using Impala Shell

  • Reviewing Logs of Impala Queries

  • Syncing Hive and Impala – Using Invalidate Metadata

  • Running Scripts using Impala Shell

  • Assignment – Using NYSE Data

  • Assignment – Solution

Getting Started with Sqoop

  • Introduction to Sqoop

  • Validate Source Database – MySQL

  • Review JDBC Jar to Connect to MySQL

  • Getting Help using Sqoop CLI

  • Overview of Sqoop User Guide

  • Validate Sqoop and MySQL Integration using Sqoop List Databases

  • Listing Tables in Database using Sqoop

  • Run Queries in MySQL using Sqoop Eval

  • Understanding Logs in Sqoop

  • Redirecting Sqoop Job Logs into Log Files

Importing data from MySQL to HDFS using Sqoop Import

  • Overview of Sqoop Import Command

  • Import Orders using target-dir

  • Import Order Items using warehouse-dir

  • Managing HDFS Directories

  • Sqoop Import Execution Flow

  • Reviewing Logs of Sqoop Import

  • Sqoop Import Specifying Number of Mappers

  • Review the Output Files generated by Sqoop Import

  • Sqoop Import Supported File Formats

  • Validating avro files using Avro Tools

  • Sqoop Import Using Compression

Apache Sqoop – Importing Data into HDFS – Customizing

  • Introduction to customizing Sqoop Import

  • Sqoop Import by Specifying Columns

  • Sqoop import Using Boundary Query

  • Sqoop import while filtering Unnecessary Data

  • Sqoop Import Using Split By to distribute import using non default column

  • Getting Query Results using Sqoop eval

  • Dealing with tables with Composite Keys while using Sqoop Import

  • Dealing with tables with Non Numeric Key Fields while using Sqoop Import

  • Dealing with tables with No Key Fields while using Sqoop Import

  • Using autoreset-to-one-mapper to use only one mapper while importing data using Sqoop from tables with no key fields

  • Default Delimiters used by Sqoop Import for Text File Format

  • Specifying Delimiters for Sqoop Import using Text File Format

  • Dealing with Null Values using Sqoop Import

  • Import Multiple Tables from source database using Sqoop Import

Importing data from MySQL to Hive Tables using Sqoop Import

  • Quick Overview of Hive

  • Create Hive Database for Sqoop Import

  • Create Empty Hive Table for Sqoop Import

  • Import Data into Hive Table from source database table using Sqoop Import

  • Managing Hive Tables while importing data using Sqoop Import using Overwrite

  • Managing Hive Tables while importing data using Sqoop Import – Errors Out If Table Already Exists

  • Understanding Execution Flow of Sqoop Import into Hive tables

  • Review Files generated by Sqoop Import in Hive Tables

  • Sqoop Delimiters vs Hive Delimiters

  • Different File Formats supported by Sqoop Import while importing into Hive Tables

  • Sqoop Import all Tables into Hive from source database

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

  • Introduction to Sqoop Export

  • Prepare Data for Sqoop Export

  • Create Table in MySQL for Sqoop Export

  • Perform Simple Sqoop Export from HDFS to MySQL table

  • Understanding Execution Flow of Sqoop Export

  • Specifying Number of Mappers for Sqoop Export

  • Troubleshooting the Issues related to Sqoop Export

  • Merging or Upserting Data using Sqoop Export – Overview

  • Quick Overview of MySQL – Upsert using Sqoop Export

  • Update Data using Update Key using Sqoop Export

  • Merging Data using allowInsert in Sqoop Export

  • Specifying Columns using Sqoop Export

  • Specifying Delimiters using Sqoop Export

  • Using Stage Table for Sqoop Export

Submitting Sqoop Jobs and Incremental Sqoop Imports

  • Introduction to Sqoop Jobs

  • Adding Password File for Sqoop Jobs

  • Creating Sqoop Job

  • Run Sqoop Job

  • Overview of Incremental Loads using Sqoop

  • Incremental Sqoop Import – Using Where

  • Incremental Sqoop Import – Using Append Mode

  • Incremental Sqoop Import – Create Table

  • Incremental Sqoop Import – Create Sqoop Job

  • Incremental Sqoop Import – Execute Job

  • Incremental Sqoop Import – Add Additional Data

  • Incremental Sqoop Import – Rerun Job

  • Incremental Sqoop Import – Using Last Modified

Here are the objectives for this course.

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify the existing schema

  • Create views in order to simplify queries

Data Analysis

Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

  • Prepare reports using SELECT commands including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions
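
To give a feel for the data analysis objectives above, here is a small sketch that joins two tables, computes sums and averages per group, and formats the output with a built-in function. It uses Python's built-in sqlite3 module as a stand-in, since a Hive or Impala cluster is not assumed to be available; the query is plain SQL of the kind Hive and Impala generally accept, but it has not been run against either. The table names `orders` and `order_items` echo the course outline, while the column names, the sample rows, and the `daily_revenue` helper are assumptions invented for this example.

```python
import sqlite3


def daily_revenue(conn):
    """Join orders to order_items and aggregate revenue per order date."""
    query = """
        SELECT o.order_date,
               round(sum(oi.subtotal), 2) AS revenue,
               round(avg(oi.subtotal), 2) AS avg_item_value
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_id
        GROUP BY o.order_date
        ORDER BY o.order_date
    """
    return conn.execute(query).fetchall()


# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT);
    CREATE TABLE order_items (order_id INTEGER, subtotal REAL);
    INSERT INTO orders VALUES (1, '2014-01-01'), (2, '2014-01-01'), (3, '2014-01-02');
    INSERT INTO order_items VALUES (1, 100.0), (1, 50.0), (2, 25.5), (3, 10.0);
""")

for row in daily_revenue(conn):
    print(row)
```

Each output row holds the order date, the total revenue for that date, and the average item value for that date.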

Exercises will be provided so that you get enough practice to improve at Sqoop as well as at writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you don’t have a multi-node cluster, you can sign up for our labs and practice on our multi-node cluster. You will be able to practice Sqoop and Hive on the cluster.
