Op werkdagen voor 23:00 besteld, morgen in huis Gratis verzending vanaf €20
, ,

Streaming Systems

The What, Where, When, and How of Large-Scale Data Processing

Paperback Engels 2018 1e druk 9781491983874
Verwachte levertijd ongeveer 16 werkdagen

Samenvatting

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.

Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You’ll explore:
- How streaming and batch data processing patterns compare
- The core principles and concepts behind robust out-of-order data processing
- How watermarks track progress and completeness in infinite datasets
- How exactly-once data processing techniques ensure correctness
- How the concepts of streams and tables form the foundations of both batch and streaming data processing
- The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
- How time-varying relations provide a link between stream processing and the world of SQL and relational algebra

Specificaties

ISBN13:9781491983874
Taal:Engels
Bindwijze:paperback
Aantal pagina's:330
Uitgever:O'Reilly
Druk:1
Verschijningsdatum:25-8-2018
Hoofdrubriek:IT-management / ICT

Lezersrecensies

Wees de eerste die een lezersrecensie schrijft!

Inhoudsopgave

Preface Or: What Are You Getting Yourself Into Here?
Navigating This Book
Takeaways
Conventions Used in This Book
Online Resources
Figures
Code Snippets
O’Reilly Safari
How to Contact Us
Acknowledgments

I. The Beam Model
1. Streaming 101
Terminology: What Is Streaming?
On the Greatly Exaggerated Limitations of Streaming
Event Time Versus Processing Time
Data Processing Patterns
Bounded Data
Unbounded Data: Batch
Unbounded Data: Streaming
Summary

2. The What, Where, When, and How of Data Processing
Roadmap
Batch Foundations: What and Where
What: Transformations
Where: Windowing
Going Streaming: When and How
When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
When: Watermarks
When: Early/On-Time/Late Triggers FTW!
When: Allowed Lateness (i.e., Garbage Collection)
How: Accumulation
Summary

3. Watermarks
Definition
Source Watermark Creation
Perfect Watermark Creation
Heuristic Watermark Creation
Watermark Propagation
Understanding Watermark Propagation
Watermark Propagation and Output Timestamps
The Tricky Case of Overlapping Windows
Percentile Watermarks
Processing-Time Watermarks
Case Studies
Case Study: Watermarks in Google Cloud Dataflow
Case Study: Watermarks in Apache Flink
Case Study: Source Watermarks for Google Cloud Pub/Sub
Summary

4. Advanced Windowing
When/Where: Processing-Time Windows
Event-Time Windowing
Processing-Time Windowing via Triggers
Processing-Time Windowing via Ingress Time
Where: Session Windows
Where: Custom Windowing
Variations on Fixed Windows
Variations on Session Windows
One Size Does Not Fit All
Summary

5. Exactly-Once and Side Effects
Why Exactly Once Matters
Accuracy Versus Completeness
Side Effects
Problem Definition
Ensuring Exactly Once in Shuffle
Addressing Determinism
Performance
Graph Optimization
Bloom Filters
Garbage Collection
Exactly Once in Sources
Exactly Once in Sinks
Use Cases
Example Source: Cloud Pub/Sub
Example Sink: Files
Example Sink: Google BigQuery
Other Systems
Apache Spark Streaming
Apache Flink
Summary

II. Streams and Tables
6. Streams and Tables
Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
Toward a General Theory of Stream and Table Relativity
Batch Processing Versus Streams and Tables
A Streams and Tables Analysis of MapReduce
Reconciling with Batch Processing
What, Where, When, and How in a Streams and Tables World
What: Transformations
Where: Windowing
When: Triggers
How: Accumulation
A Holistic View of Streams and Tables in the Beam Model
A General Theory of Stream and Table Relativity
Summary

7. The Practicalities of Persistent State
Motivation
The Inevitability of Failure
Correctness and Efficiency
Implicit State
Raw Grouping
Incremental Combining
Generalized State
Case Study: Conversion Attribution
Conversion Attribution with Apache Beam
Summary

8. Streaming SQL
What Is Streaming SQL?
Relational Algebra
Time-Varying Relations
Streams and Tables
Looking Backward: Stream and Table Biases
The Beam Model: A Stream-Biased Approach
The SQL Model: A Table-Biased Approach
Looking Forward: Toward Robust Streaming SQL
Stream and Table Selection
Temporal Operators
Summary

9. Streaming Joins
All Your Joins Are Belong to Streaming
Unwindowed Joins
FULL OUTER
LEFT OUTER
RIGHT OUTER
INNER
ANTI
SEMI
Windowed Joins
Fixed Windows
Temporal Validity
Summary

10. The Evolution of Large-Scale Data Processing
MapReduce
Hadoop
Flume
Storm
Spark
MillWheel
Kafka
Cloud Dataflow
Flink
Beam
Summary

Index

Managementboek Top 100

Rubrieken

Populaire producten

    Personen

      Trefwoorden

        Streaming Systems