

Alan is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project.
Programming Pig
Dataflow Scripting with Hadoop
Paperback, English, 2016, 2nd edition, ISBN 9781491937099
Summary
For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets.
Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You’ll find comprehensive coverage of key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.
- Delve into Pig’s data model, including scalar and complex data types
- Write Pig Latin scripts to sort, group, join, project, and filter your data
- Use Grunt to work with the Hadoop Distributed File System (HDFS)
- Build complex data processing pipelines with Pig’s macros and modularity features
- Embed Pig Latin in Python for iterative processing and other advanced tasks
- Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
- Create your own load and store functions to handle data formats and storage mechanisms
Table of Contents
1. What Is Pig?
Pig Latin, a Parallel Data Flow Language
Pig on Hadoop
What Is Pig Useful For?
The Pig Philosophy
Pig’s History
2. Installing and Running Pig
Downloading and Installing Pig
Running Pig
Grunt
3. Pig’s Data Model
Types
Schemas
4. Introduction to Pig Latin
Preliminary Matters
Input and Output
Relational Operations
User-Defined Functions
5. Advanced Pig Latin
Advanced Relational Operations
Integrating Pig with Executables and Native Jobs
split and Nonlinear Data Flows
Controlling Execution
Pig Latin Preprocessor
6. Developing and Testing Pig Latin Scripts
Development Tools
Testing Your Scripts with PigUnit
7. Making Pig Fly
Writing Your Scripts to Perform Well
Writing Your UDFs to Perform
Tuning Pig and Hadoop for Your Job
Using Compression in Intermediate Results
Data Layout Optimization
Map-Side Aggregation
The JAR Cache
Processing Small Jobs Locally
Bloom Filters
Schema Tuple Optimization
Dealing with Failures
8. Embedding Pig
Embedding Pig Latin in Scripting Languages
Using the Pig Java APIs
9. Writing Evaluation and Filter Functions
Writing an Evaluation Function in Java
The Algebraic Interface
The Accumulator Interface
Writing Filter Functions
Writing Evaluation Functions in Scripting Languages
10. Writing Load and Store Functions
Load Functions
Store Functions
Shipping JARs Automatically
Handling Bad Records
11. Pig on Tez
What Is Tez?
Running Pig on Tez
Potential Differences When Running on Tez
Pig on Tez Internals
12. Pig and Other Members of the Hadoop Community
Pig and Hive
Cascading
Spark
NoSQL Databases
DataFu
Oozie
13. Use Cases and Programming Examples
Sparse Tuples
k-Means
intersect and except
Pig at Yahoo!
Pig at Particle News
Appendix A: Built-in User Defined Functions and PiggyBank
Index