Building Real-Time Analytics Systems

From Events to Insights with Apache Kafka and Apache Pinot

Paperback, English, 2023, 1st edition, ISBN 9781098138790

Summary

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly.

Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service.

You will:
- Learn common architectures for real-time analytics
- Discover how event processing differs from real-time analytics
- Ingest event data from Apache Kafka into Apache Pinot
- Combine event streams with OLTP data using Debezium and Kafka Streams
- Write real-time queries against event data stored in Apache Pinot
- Build a real-time dashboard and order tracking app
- Learn how Uber, Stripe, and Just Eat use real-time analytics

Specifications

ISBN-13: 9781098138790
Language: English
Binding: Paperback
Number of pages: 250
Publisher: O'Reilly
Edition: 1
Publication date: 29-9-2023
Main category: IT management / ICT

Table of Contents

Foreword
Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments

1. Introduction to Real-Time Analytics
What Is an Event Stream?
Making Sense of Streaming Data
What Is Real-Time Analytics?
Benefits of Real-Time Analytics
New Revenue Streams
Timely Access to Insights
Reduced Infrastructure Cost
Improved Overall Customer Experience
Real-Time Analytics Use Cases
User-Facing Analytics
Personalization
Metrics
Anomaly Detection and Root Cause Analysis
Visualization
Ad Hoc Analytics
Log Analytics/Text Search
Classifying Real-Time Analytics Applications
Internal Versus External Facing
Machine Versus Human Facing
Summary

2. The Real-Time Analytics Ecosystem
Defining the Real-Time Analytics Ecosystem
The Classic Streaming Stack
Complex Event Processing
The Big Data Era
The Modern Streaming Stack
Event Producers
Streaming Data Platform
Stream Processing Layer
Serving Layer
Frontend
Summary

3. Introducing All About That Dough: Real-Time Analytics on Pizza
Existing Architecture
Setup
MySQL
Apache Kafka
ZooKeeper
Orders Service
Spinning Up the Components
Inspecting the Data
Applications of Real-Time Analytics
Summary

4. Querying Kafka with Kafka Streams
What Is Kafka Streams?
What Is Quarkus?
Quarkus Application
Installing the Quarkus CLI
Creating a Quarkus Application
Creating a Topology
Querying the Key-Value Store
Creating an HTTP Endpoint
Running the Application
Querying the HTTP Endpoint
Limitations of Kafka Streams
Summary

5. The Serving Layer: Apache Pinot
Why Can’t We Use Another Stream Processor?
Why Can’t We Use a Data Warehouse?
What Is Apache Pinot?
How Does Pinot Model and Store Data?
Schema
Table
Setup
Data Ingestion
Pinot Data Explorer
Indexes
Updating the Web App
Summary

6. Building a Real-Time Analytics Dashboard
Dashboard Architecture
What Is Streamlit?
Setup
Building the Dashboard
Summary

7. Product Changes Captured with Change Data Capture
Capturing Changes from Operational Databases
Change Data Capture
Why Do We Need CDC?
What Is CDC?
What Are the Strategies for Implementing CDC?
Log-Based Data Capture
Requirements for a CDC System
Debezium
Applying CDC to AATD
Setup
Connecting Debezium to MySQL
Querying the Products Stream
Updating Products
Summary

8. Joining Streams with Kafka Streams
Enriching Orders with Kafka Streams
Adding Order Items to Pinot
Updating the Orders Service
Refreshing the Streamlit Dashboard
Summary

9. Upserts in the Serving Layer
Order Statuses
Enriched Orders Stream
Upserts in Apache Pinot
Updating the Orders Service
Creating UsersResource
Adding an allUsers Endpoint
Adding an Orders for User Endpoint
Adding an Individual Order Endpoint
Configuring Cross-Origin Resource Sharing
Frontend App
Order Statuses on the Dashboard
Time Spent in Each Order Status
Orders That Might Be Stuck
Summary

10. Geospatial Querying
Delivery Statuses
Updating Apache Pinot
Orders
Delivery Statuses
Updating the Orders Service
Individual Orders
Delayed Orders by Area
Consuming the New API Endpoints
Summary

11. Production Considerations
Preproduction
Capacity Planning
Data Partitioning
Throughput
Data Retention
Data Granularity
Total Data Size
Replication Factor
Deployment Platform
In-House Skills
Data Privacy and Security
Cost
Control
Postproduction
Monitoring and Alerting
Data Governance
Summary

12. Real-Time Analytics in the Real World
Content Recommendation (Professional Social Network)
The Problem
The Solution
Benefits
Operational Analytics (Streaming Service)
The Problem
The Solution
Benefits
Real-Time Ad Analytics (Online Marketplace)
The Problem
The Solution
Benefits
User-Facing Analytics (Collaboration Platform)
The Problem
The Solution
Benefits
Summary

13. The Future of Real-Time Analytics
Edge Analytics
Compute-Storage Separation
Data Lakehouses
Real-Time Data Visualization
Streaming Databases
Streaming Data Platform as a Service
Reverse ETL
Summary

Index
About the Author
