

Chuck Lam is a Senior Engineer at RockYou! He has a PhD in pattern recognition from Stanford University.
Hadoop in Action
Paperback, English, 2010, 1st edition, ISBN 9781935182191
Summary
Big data can be difficult to handle using traditional databases. Apache Hadoop is a NoSQL applications framework that runs on distributed clusters. This lets it scale to huge datasets. If you need analytic information from your data, Hadoop's the way to go.
'Hadoop in Action' introduces the subject and teaches you how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming.
This book requires basic Java skills. Knowing basic statistical concepts can help with the more advanced examples.
What's inside:
- Introduction to MapReduce
- Examples illustrating ideas in practice
- Hadoop's Streaming API
- Other related tools, like Pig and Hive
Table of contents
Acknowledgments
About this book
Author Online
About the author
About the cover illustration
Part 1: Hadoop - A Distributed Programming Framework
1. Introducing Hadoop
1.1 Why "Hadoop in Action"?
1.2 What is Hadoop?
1.3 Understanding distributed systems and Hadoop
1.4 Comparing SQL databases and Hadoop
1.5 Understanding MapReduce
1.6 Counting words with Hadoop: running your first program
1.7 History of Hadoop
1.8 Summary
1.9 Resources
2. Starting Hadoop
2.1 The building blocks of Hadoop
2.2 Setting up SSH for a Hadoop cluster
2.3 Running Hadoop
2.4 Web-based cluster UI
2.5 Summary
3. Components of Hadoop
3.1 Working with files in HDFS
3.2 Anatomy of a MapReduce program
3.3 Reading and writing
3.4 Summary
Part 2: Hadoop In Action
4. Writing basic MapReduce programs
4.1 Getting the patent data set
4.2 Constructing the basic template of a MapReduce program
4.3 Counting things
4.4 Adapting for Hadoop's API changes
4.5 Streaming in Hadoop
4.6 Improving performance with combiners
4.7 Exercising what you've learned
4.8 Summary
4.9 Further resources
5. Advanced MapReduce
5.1 Chaining MapReduce jobs
5.2 Joining data from different sources
5.3 Creating a Bloom filter
5.4 Exercising what you've learned
5.5 Summary
5.6 Further resources
6. Programming Practices
6.1 Developing MapReduce programs
6.2 Monitoring and debugging on a production cluster
6.3 Tuning for performance
6.4 Summary
7. Cookbook
7.1 Passing job-specific parameters to your tasks
7.2 Probing for task-specific information
7.3 Partitioning into multiple output files
7.4 Inputting from and outputting to a database
7.5 Keeping all output in sorted order
7.6 Summary
8. Managing Hadoop
8.1 Setting up parameter values for practical use
8.2 Checking system's health
8.3 Setting permissions
8.4 Managing quotas
8.5 Enabling trash
8.6 Removing DataNodes
8.7 Adding DataNodes
8.8 Managing NameNode and Secondary NameNode
8.9 Recovering from a failed NameNode
8.10 Designing network layout and rack awareness
8.11 Scheduling jobs from multiple users
8.12 Summary
Part 3: Hadoop Gone Wild
9. Running Hadoop in the cloud
9.1 Introducing Amazon Web Services
9.2 Setting up AWS
9.3 Setting up Hadoop on EC2
9.4 Running MapReduce programs on EC2
9.5 Cleaning up and shutting down your EC2 instances
9.6 Amazon Elastic MapReduce and other AWS services
9.7 Summary
10. Programming with Pig
10.1 Thinking like a Pig
10.2 Installing Pig
10.3 Running Pig
10.4 Learning Pig Latin through Grunt
10.5 Speaking Pig Latin
10.6 Working with user-defined functions
10.7 Working with scripts
10.8 Seeing Pig in action: example of computing similar patents
10.9 Summary
11. Hive and the Hadoop herd
11.1 Hive
11.2 Other Hadoop-related stuff
11.3 Summary
12. Case studies
12.1 Converting 11 million image documents from the New York Times archive
12.2 Mining data at China Mobile
12.3 Recommending the best websites at StumbleUpon
12.4 Building analytics for enterprise search: IBM's Project ES2
Appendix: HDFS file commands
Index