
Kafka - The Definitive Guide

Real-Time Data and Stream Processing at Scale

Paperback English 2017 9781491936160
Expected delivery time: about 8 working days

Summary

Every enterprise application creates data, whether it's log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you're an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.

Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

- Understand publish-subscribe messaging and how it fits in the big data ecosystem
- Explore Kafka producers and consumers for writing and reading messages
- Understand Kafka patterns and use-case requirements to ensure reliable data delivery
- Get best practices for building data pipelines and applications with Kafka
- Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks
- Learn the most critical metrics among Kafka's operational measurements
- Explore how Kafka's stream delivery capabilities make it a perfect source for stream processing systems

Specifications

ISBN-13: 9781491936160
Language: English
Binding: Paperback
Number of pages: 322
Publisher: O'Reilly
Edition: 1
Publication date: 29 September 2017
Main category: IT management / ICT

Table of Contents

Foreword
Preface
Who Should Read This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments

1. Meet Kafka
Publish/Subscribe Messaging
How It Starts
Individual Queue Systems
Enter Kafka
Messages and Batches
Schemas
Topics and Partitions
Producers and Consumers
Brokers and Clusters
Multiple Clusters
Why Kafka?
Multiple Producers
Multiple Consumers
Disk-Based Retention
Scalable
High Performance
The Data Ecosystem
Use Cases
Kafka’s Origin
LinkedIn’s Problem
The Birth of Kafka
Open Source
The Name
Getting Started with Kafka

2. Installing Kafka
First Things First
Choosing an Operating System
Installing Java
Installing Zookeeper
Installing a Kafka Broker
Broker Configuration
General Broker
Topic Defaults
Hardware Selection
Disk Throughput
Disk Capacity
Memory
Networking
CPU
Kafka in the Cloud
Kafka Clusters
How Many Brokers?
Broker Configuration
OS Tuning
Production Concerns
Garbage Collector Options
Datacenter Layout
Colocating Applications on Zookeeper
Summary

3. Kafka Producers: Writing Messages to Kafka
Producer Overview
Constructing a Kafka Producer
Sending a Message to Kafka
Sending a Message Synchronously
Sending a Message Asynchronously
Configuring Producers
Serializers
Custom Serializers
Serializing Using Apache Avro
Using Avro Records with Kafka
Partitions
Old Producer APIs
Summary

4. Kafka Consumers: Reading Data from Kafka
Kafka Consumer Concepts
Consumers and Consumer Groups
Consumer Groups and Partition Rebalance
Creating a Kafka Consumer
Subscribing to Topics
The Poll Loop
Configuring Consumers
Commits and Offsets
Automatic Commit
Commit Current Offset
Asynchronous Commit
Combining Synchronous and Asynchronous Commits
Commit Specified Offset
Rebalance Listeners
Consuming Records with Specific Offsets
But How Do We Exit?
Deserializers
Standalone Consumer: Why and How to Use a Consumer Without a Group
Older Consumer APIs
Summary

5. Kafka Internals
Cluster Membership
The Controller
Replication
Request Processing
Produce Requests
Fetch Requests
Other Requests
Physical Storage
Partition Allocation
File Management
File Format
Indexes
Compaction
How Compaction Works
Deleted Events
When Are Topics Compacted?
Summary

6. Reliable Data Delivery
Reliability Guarantees
Replication
Broker Configuration
Replication Factor
Unclean Leader Election
Minimum In-Sync Replicas
Using Producers in a Reliable System
Send Acknowledgments
Configuring Producer Retries
Additional Error Handling
Using Consumers in a Reliable System
Important Consumer Configuration Properties for Reliable Processing
Explicitly Committing Offsets in Consumers
Validating System Reliability
Validating Configuration
Validating Applications
Monitoring Reliability in Production
Summary

7. Building Data Pipelines
Considerations When Building Data Pipelines
Timeliness
Reliability
High and Varying Throughput
Data Formats
Transformations
Security
Failure Handling
Coupling and Agility
When to Use Kafka Connect Versus Producer and Consumer
Kafka Connect
Running Connect
Connector Example: File Source and File Sink
Connector Example: MySQL to Elasticsearch
A Deeper Look at Connect
Alternatives to Kafka Connect
Ingest Frameworks for Other Datastores
GUI-Based ETL Tools
Stream-Processing Frameworks
Summary

8. Cross-Cluster Data Mirroring
Use Cases of Cross-Cluster Mirroring
Multicluster Architectures
Some Realities of Cross-Datacenter Communication
Hub-and-Spokes Architecture
Active-Active Architecture
Active-Standby Architecture
Stretch Clusters
Apache Kafka’s MirrorMaker
How to Configure
Deploying MirrorMaker in Production
Tuning MirrorMaker
Other Cross-Cluster Mirroring Solutions
Uber uReplicator
Confluent’s Replicator
Summary

9. Administering Kafka
Topic Operations
Creating a New Topic
Adding Partitions
Deleting a Topic
Listing All Topics in a Cluster
Describing Topic Details
Consumer Groups
List and Describe Groups
Delete Group
Offset Management
Dynamic Configuration Changes
Overriding Topic Configuration Defaults
Overriding Client Configuration Defaults
Describing Configuration Overrides
Removing Configuration Overrides
Partition Management
Preferred Replica Election
Changing a Partition’s Replicas
Changing Replication Factor
Dumping Log Segments
Replica Verification
Consuming and Producing
Console Consumer
Console Producer
Client ACLs
Unsafe Operations
Moving the Cluster Controller
Killing a Partition Move
Removing Topics to Be Deleted
Deleting Topics Manually
Summary

10. Monitoring Kafka
Metric Basics
Where Are the Metrics?
Internal or External Measurements
Application Health Checks
Metric Coverage
Kafka Broker Metrics
Under-Replicated Partitions
Broker Metrics
Topic and Partition Metrics
JVM Monitoring
OS Monitoring
Logging
Client Monitoring
Producer Metrics
Consumer Metrics
Quotas
Lag Monitoring
End-to-End Monitoring
Summary

11. Stream Processing
What Is Stream Processing?
Stream-Processing Concepts
Time
State
Stream-Table Duality
Time Windows
Stream-Processing Design Patterns
Single-Event Processing
Processing with Local State
Multiphase Processing/Repartitioning
Processing with External Lookup: Stream-Table Join
Streaming Join
Out-of-Sequence Events
Reprocessing
Kafka Streams by Example
Word Count
Stock Market Statistics
Click Stream Enrichment
Kafka Streams: Architecture Overview
Building a Topology
Scaling the Topology
Surviving Failures
Stream Processing Use Cases
How to Choose a Stream-Processing Framework
Summary

A. Installing Kafka on Other Operating Systems
Installing on Windows
Using Windows Subsystem for Linux
Using Native Java
Installing on MacOS
Using Homebrew
Installing Manually

Index