Learn By Example: Hadoop, MapReduce for Big Data problems

Pub Date: 5th February 2018 | Pages/Duration: 13h 44m | Language: English | Format: MP4 AVC 1280×720 AAC 48KHz 2ch | Size: 17.6 GB


Download from Turbobit
Download from DepositFiles
Download from Rapidgator
A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel".

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel. It is both broad and deep: it covers the individual components of Hadoop in great detail, and it also gives you a higher-level picture of how they interact with each other.

You'll get hands-on with Hadoop very early on, learning how to set up your own cluster using both VMs and the cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.

MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art, and the examples in this course will train you to think in parallel.

What You Will Learn

- Develop advanced MapReduce applications to process Big Data
- Master the art of thinking parallel and how to break up a task into Map/Reduce transformations
- Self-sufficiently set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster or in the cloud
- Use Hadoop + MapReduce to solve a wide variety of problems: from NLP to Inverted Indices to Recommendations
- Understand HDFS, MapReduce and YARN and how they interact with each other
- Understand the basics of performance tuning and managing your own cluster

Table of Contents

Introduction
1 You, this course and Us

Input and Output Formats and Customized Partitioning
2 Introducing the File Input Format
3 Text And Sequence File Formats
4 Data partitioning using a custom partitioner
5 Make the custom partitioner real in code
6 Total Order Partitioning
7 Input Sampling, Distribution, Partitioning and configuring these
8 Secondary Sort

Recommendation Systems using Collaborative Filtering
9 Introduction to Collaborative Filtering
10 Friend recommendations using chained MR jobs
11 Get common friends for every pair of users – the first MapReduce
12 Top 10 friend recommendation for every user – the second MapReduce

Hadoop as a Database
13 Structured data in Hadoop
14 Running an SQL Select with MapReduce
15 Running an SQL Group By with MapReduce
16 A MapReduce Join – The Map Side
17 A MapReduce Join – The Reduce Side
18 A MapReduce Join – Sorting and Partitioning
19 A MapReduce Join – Putting it all together

K-Means Clustering
20 What is K-Means Clustering
21 A MapReduce job for K-Means Clustering
22 K-Means Clustering – Measuring the distance between points
23 K-Means Clustering – Custom Writables for Input/Output
24 K-Means Clustering – Configuring the Job
25 K-Means Clustering – The Mapper and Reducer
26 K-Means Clustering – The Iterative MapReduce Job

Setting up a Hadoop Cluster
27 Manually configuring a Hadoop cluster (Linux VMs)
28 Getting started with Amazon Web Services
29 Start a Hadoop Cluster with Cloudera Manager on AWS

Appendix
30 Setup a Virtual Linux Instance (For Windows users)
31 [For Linux/Mac OS Shell Newbies] Path and other Environment Variables

Why is Big Data a Big Deal
32 The Big Data Paradigm
33 Serial vs Distributed Computing
34 What is Hadoop
35 HDFS or the Hadoop Distributed File System
36 MapReduce Introduced
37 YARN or Yet Another Resource Negotiator

Installing Hadoop in a Local Environment
38 Hadoop Install Modes
39 Hadoop Standalone mode Install
40 Hadoop Pseudo-Distributed mode Install

The MapReduce 'Hello World'
41 The basic philosophy underlying MapReduce
42 MapReduce – Visualized And Explained
43 MapReduce – Digging a little deeper at every step
44 Hello World in MapReduce
45 The Mapper
46 The Reducer
47 The Job

Run a MapReduce Job
48 Get comfortable with HDFS
49 Run your first MapReduce Job

Juicing your MapReduce – Combiners, Shuffle and Sort and The Streaming API
50 Parallelize the reduce phase – use the Combiner
51 Not all Reducers are Combiners
52 How many mappers and reducers does your MapReduce have
53 Parallelizing reduce using Shuffle And Sort
54 MapReduce is not limited to the Java language – Introducing the Streaming API
55 Python for MapReduce

HDFS and Yarn
56 HDFS – Protecting against data loss using replication
57 HDFS – Name nodes and why they're critical
58 HDFS – Checkpointing to backup name node information
59 Yarn – Basic components
60 Yarn – Submitting a job to Yarn
61 Yarn – Plug in scheduling policies
62 Yarn – Configure the scheduler

MapReduce Customizations For Finer Grained Control
63 Setting up your MapReduce to accept command line arguments
64 The Tool, ToolRunner and GenericOptionsParser
65 Configuring properties of the Job object
66 Customizing the Partitioner, Sort Comparator, and Group Comparator

The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
67 The heart of search engines – The Inverted Index
68 Generating the inverted index using MapReduce
69 Custom data types for keys – The Writable Interface
70 Represent a Bigram using a WritableComparable
71 MapReduce to count the Bigrams in input text
72 Test your MapReduce job using MRUnit
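To give a taste of the "thinking parallel" idea the course teaches, here is a minimal sketch of the classic word-count job in the style of the Hadoop Streaming API (covered in the "Python for MapReduce" lecture). The function names and the local shuffle-and-sort simulation are illustrative, not taken from the course: a real Streaming job would read lines from stdin in the mapper process and write tab-separated key/value pairs to stdout.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input.
    In a real Hadoop Streaming job this would read from sys.stdin."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key, which Hadoop's
    shuffle-and-sort guarantees; sum the counts for each word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Simulate the shuffle-and-sort locally: map, sort by key, reduce.
    lines = ["hello hadoop", "hello mapreduce"]
    mapped = sorted(mapper(lines), key=lambda kv: kv[0])
    for word, count in reducer(mapped):
        print(f"{word}\t{count}")
```

On a cluster, the same mapper and reducer scripts would typically be launched with the Hadoop Streaming jar (e.g. `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input ... -output ...`), with Hadoop handling the partitioning and sorting between the two phases.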