Getting Started with Kudu
Perform Fast Analytics on Fast Data
Paperback, English, 2018, 1st edition, ISBN 9781491980255
Summary
Get up to speed with Apache Kudu, the column-oriented data store for Hadoop that not only simplifies the architecture of several existing use cases but also enables use cases that were not possible before.
With this practical guide, enterprise architects working on big data implementations will learn how Kudu's architecture and features solve a unique problem in the Hadoop ecosystem. For example, Kudu makes Hadoop viable for real-time IoT use cases in addition to making a transition from a massively parallel processing (MPP) SQL database engine plausible.
If you're familiar with other storage layer projects such as HDFS, HBase, Spanner, and Cassandra, you'll quickly learn, and appreciate, the unique contribution Kudu makes to this ecosystem.
- Explore how Kudu is compatible with data processing frameworks in the Hadoop environment
- Understand Kudu's architecture, internals, installation, and deployment
- Learn how to fully administer a Kudu cluster
- Become acquainted with low-level client APIs, how to integrate with SQL engines like Impala, and frameworks for integration
- Learn about table and schema design
- Get use cases, examples, best practices, and sample code
Table of Contents
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments
1. Why Kudu?
Why Does Kudu Matter?
Simplicity Drives Adoption
New Use Cases
IoT
Current Approaches to Real-Time Analytics
Real-Time Processing
Hardware Landscape
Kudu’s Unique Place in the Big Data Ecosystem
Comparing Kudu with Other Ecosystem Components
Big Data—HDFS, HBase, Cassandra
Conclusion
2. About Kudu
Kudu High-Level Design
Kudu Roles
Master Server
Tablet Server
Kudu Concepts and Mechanisms
Hotspotting
Partitioning
3. Getting Up and Running
Installation
Apache Kudu Quickstart VM
Using Cloudera Manager
Building from Source
Packages
Cloudera Quickstart VM
Quick Install: Three Minutes or Less
Conclusion
4. Kudu Administration
Planning for Kudu
Master and Tablet Servers
Write-Ahead Log
Data Servers and Storage
Replication Strategies
Deployment Considerations: New or Existing Clusters?
New Kudu-Only Cluster
New Hadoop Cluster with Kudu
Add Kudu to Existing Hadoop Cluster
Web UI of Tablet and Master Servers
Master Server UI and Tablet Server UI
Master Server UI
Tablet Server UI
The Kudu Command-Line Interface
Cluster
Filesystem
Tablet Replica
Consensus Metadata
Adding and Removing Tablet Servers
Adding Tablet Servers
Removing a Tablet Server
Security
A Simple Analogy
Kudu Security Features
Basic Performance Tuning
Kudu Memory Limits
Maintenance Manager Threads
Monitoring Performance
Getting Ahead and Staying Out of Trouble
Avoid Running Out of Disk Space
Disk Failures Tolerance
Backup
Conclusion
5. Common Developer Tasks for Kudu
Client API
Kudu Client
Kudu Table
Kudu DDL
Kudu Scanner Read Modes
C++ API
Python API
Preparing the Python Development Environment
Python Kudu Application
Java
Java Application
Spark
Impala with Kudu
6. Table and Schema Design
Schema Design Basics
Schema for Hybrid Transactional/Analytical Processing
Lambda Architecture
OLTP/OLAP Split
Primary Key and Column Design
Other Column Schema Considerations
Partitioning Basics
Range Partitioning
Hash Partitioning
Schema Alteration
Best Practices and Tips
Partitioning
Large Objects
decimal
Unique Strings
Compression
Object Names
Number of Columns
Binary Types
Network Packet Example
Conclusion
7. Kudu Use Cases
Real-Time Internet of Things Analytics
Predictive Modeling
Mixed Platforms Solution
Index