Big Data for Chimps
A Guide to Massive-Scale Data Processing in Practice
Paperback, English, 2015, 1st edition, ISBN 9781491923948
Summary
Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems.
Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.
- Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
- Dive into map/reduce mechanics and build your first map/reduce job in Python
- Understand how to run chains of map/reduce jobs in the form of Pig scripts
- Use a real-world dataset—baseball performance statistics—throughout the book
- Work with examples of several analytic patterns, and learn when and where you might use them
Table of Contents
1. Hadoop Basics
- Chimpanzee and Elephant Start a Business
- Map-Only Jobs: Process Records Individually
- Pig Latin Map-Only Job
- Setting Up a Docker Hadoop Cluster
- Wrapping Up
2. MapReduce
- Chimpanzee and Elephant Save Christmas
- Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench
- Example: Reindeer Games
- Hadoop Versus Traditional Databases
- The MapReduce Haiku
- Wrapping Up
3. A Quick Look into Baseball
- The Data
- Acronyms and Terminology
- The Rules and Goals
- Performance Metrics
- Wrapping Up
4. Introduction to Pig
- Pig Helps Hadoop Work with Tables, Not Records
- Fundamental Data Operations
- LOAD Locates and Describes Your Data
- STORE Writes Data to Disk
- Development Aid Commands
- Pig Functions
- Piggybank
- Apache DataFu
- Wrapping Up
Part II: Tactics: Analytic Patterns
5. Map-Only Operations
- Pattern in Use
- Eliminating Data
- Selecting Records That Satisfy a Condition: FILTER and Friends
- Project Only Chosen Columns by Name
- Transforming Records
- Operations That Break One Table into Many
- Operations That Treat the Union of Several Tables as One
- Wrapping Up
6. Grouping Operations
- Grouping Records into a Bag by Key
- Group and Aggregate
- Calculating the Distribution of Numeric Values with a Histogram
- The Summing Trick
- Wrapping Up
- References
7. Joining Tables
- Matching Records Between Tables (Inner Join)
- How a Join Works
- Enumerating a Many-to-Many Relationship
- Joining a Table with Itself (Self-Join)
- Joining Records Without Discarding Nonmatches (Outer Join)
- Selecting Only Records That Lack a Match in Another Table (Anti-Join)
- Selecting Only Records That Possess a Match in Another Table (Semi-Join)
- Wrapping Up
8. Ordering Operations
- Preparing Career Epochs
- Sorting All Records in Total Order
- Sorting Records Within a Group
- Numbering Records in Rank Order
- Wrapping Up
9. Duplicate and Unique Records
- Handling Duplicates
- Set Operations
- Wrapping Up