Deciphering Data Architectures

Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh

Paperback, English, 2024, 1st edition, ISBN 9781098150761

Summary

Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of each architecture to help data professionals understand its pros and cons.

In the process, James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, and how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs.

By reading this book, you'll:
- Gain a working understanding of several data architectures
- Know the pros and cons of each approach
- Distinguish data architecture theory from the reality
- Learn to pick the best architecture for your use case
- Understand the differences between data warehouses and data lakes
- Learn common data architecture concepts to help you build better solutions
- Alleviate confusion by clearly defining each data architecture
- Know what architectures to use for each cloud provider

Specifications

ISBN-13: 9781098150761
Language: English
Binding: Paperback
Number of pages: 225
Publisher: O'Reilly
Edition: 1
Publication date: 26-2-2024
Main category: IT management / ICT

Table of Contents

Foreword
Preface
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments

I. Foundation
1. Big Data
What Is Big Data, and How Can It Help You?
Data Maturity
Stage 1: Reactive
Stage 2: Informative
Stage 3: Predictive
Stage 4: Transformative
Self-Service Business Intelligence
Summary

2. Types of Data Architectures
Evolution of Data Architectures
Relational Data Warehouse
Data Lake
Modern Data Warehouse
Data Fabric
Data Lakehouse
Data Mesh
Summary

3. The Architecture Design Session
What Is an ADS?
Why Hold an ADS?
Before the ADS
Preparing
Inviting Participants
Conducting the ADS
Introductions
Discovery
Whiteboarding
After the ADS
Tips for Conducting an ADS
Summary

II. Common Data Architecture Concepts
4. The Relational Data Warehouse
What Is a Relational Data Warehouse?
What a Data Warehouse Is Not
The Top-Down Approach
Why Use a Relational Data Warehouse?
Drawbacks to Using a Relational Data Warehouse
Populating a Data Warehouse
How Often to Extract the Data
Extraction Methods
How to Determine What Data Has Changed Since the Last Extraction
The Death of the Relational Data Warehouse Has Been Greatly Exaggerated
Summary

5. Data Lake
What Is a Data Lake?
Why Use a Data Lake?
Bottom-Up Approach
Best Practices for Data Lake Design
Multiple Data Lakes
Advantages
Disadvantages
Summary

6. Data Storage Solutions and Processes
Data Storage Solutions
Data Marts
Operational Data Stores
Data Hubs
Data Processes
Master Data Management
Data Virtualization and Data Federation
Data Catalogs
Data Marketplaces
Summary

7. Approaches to Design
Online Transaction Processing Versus Online Analytical Processing
Operational and Analytical Data
Symmetric Multiprocessing and Massively Parallel Processing
Lambda Architecture
Kappa Architecture
Polyglot Persistence and Polyglot Data Stores
Summary

8. Approaches to Data Modeling
Relational Modeling
Keys
Entity–Relationship Diagrams
Normalization Rules and Forms
Tracking Changes
Dimensional Modeling
Facts, Dimensions, and Keys
Tracking Changes
Denormalization
Common Data Model
Data Vault
The Kimball and Inmon Data Warehousing Methodologies
Inmon’s Top-Down Methodology
Kimball’s Bottom-Up Methodology
Choosing a Methodology
Hybrid Models
Methodology Myths
Summary

9. Approaches to Data Ingestion
ETL Versus ELT
Reverse ETL
Batch Processing Versus Real-Time Processing
Batch Processing Pros and Cons
Real-Time Processing Pros and Cons
Data Governance
Summary

III. Data Architectures
10. The Modern Data Warehouse
The MDW Architecture
Pros and Cons of the MDW Architecture
Combining the RDW and Data Lake
Data Lake
Relational Data Warehouse
Stepping Stones to the MDW
EDW Augmentation
Temporary Data Lake Plus EDW
All-in-One
Case Study: Wilson & Gunkerk’s Strategic Shift to an MDW
Challenge
Solution
Outcome
Summary

11. Data Fabric
The Data Fabric Architecture
Data Access Policies
Metadata Catalog
Master Data Management
Data Virtualization
Real-Time Processing
APIs
Services
Products
Why Transition from an MDW to a Data Fabric Architecture?
Potential Drawbacks
Summary

12. Data Lakehouse
Delta Lake Features
Performance Improvements
The Data Lakehouse Architecture
What If You Skip the Relational Data Warehouse?
Relational Serving Layer
Summary

13. Data Mesh Foundation
A Decentralized Data Architecture
Data Mesh Hype
Dehghani’s Four Principles of Data Mesh
Principle #1: Domain Ownership
Principle #2: Data as a Product
Principle #3: Self-Serve Data Infrastructure as a Platform
Principle #4: Federated Computational Governance
The “Pure” Data Mesh
Data Domains
Data Mesh Logical Architecture
Different Topologies
Data Mesh Versus Data Fabric
Use Cases
Summary

14. Should You Adopt Data Mesh? Myths, Concerns, and the Future
Myths
Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly
Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse
Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem
Myth: Building a Data Mesh Means Decentralizing Absolutely Everything
Myth: You Can Use Data Virtualization to Create a Data Mesh
Concerns
Philosophical and Conceptual Matters
Combining Data in a Decentralized Environment
Other Issues of Decentralization
Complexity
Duplication
Feasibility
People
Domain-Level Barriers
Organizational Assessment: Should You Adopt a Data Mesh?
Recommendations for Implementing a Successful Data Mesh
The Future of Data Mesh
Zooming Out: Understanding Data Architectures and Their Applications
Summary

IV. People, Processes, and Technology
15. People and Processes
Team Organization: Roles and Responsibilities
Roles for MDW, Data Fabric, or Data Lakehouse
Roles for Data Mesh
Why Projects Fail: Pitfalls and Prevention
Pitfall: Allowing Executives to Think That BI Is “Easy”
Pitfall: Using the Wrong Technologies
Pitfall: Gathering Too Many Business Requirements
Pitfall: Gathering Too Few Business Requirements
Pitfall: Presenting Reports Without Validating Their Contents First
Pitfall: Hiring an Inexperienced Consulting Company
Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers
Pitfall: Passing Project Ownership Off to Consultants
Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization
Pitfall: Slashing the Budget Midway Through the Project
Pitfall: Starting with an End Date and Working Backward
Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business’s Needs
Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues
Pitfall: Overdesigning (or Underdesigning) Your Data Architecture
Pitfall: Poor Communication Between IT and the Business Domains
Tips for Success
Don’t Skimp on Your Investment
Involve Users, Show Them Results, and Get Them Excited
Add Value to New Reports and Dashboards
Ask End Users to Build a Prototype
Find a Project Champion/Sponsor
Make a Project Plan That Aims for 80% Efficiency
Summary

16. Technologies
Choosing a Platform
Open Source Solutions
On-Premises Solutions
Cloud Provider Solutions
Cloud Service Models
Major Cloud Providers
Multi-Cloud Solutions
Software Frameworks
Hadoop
Databricks
Snowflake
Summary

Index
About the Author
