Building Medallion Architectures

Designing with Delta Lake and Spark

Paperback, English, 2025, 1st edition, ISBN 9781098178833

Summary

To deliver the insights that give them a competitive advantage, organizations increasingly turn to the proven Medallion architecture. Yet implementing a robust data architecture can be difficult, particularly when it comes to using the Medallion architecture's Bronze, Silver, and Gold layers—done wrong, it can hamper your ability to make data-driven decisions. This practical guide helps you build a Medallion architecture the right way with Azure Databricks and Microsoft Fabric.

Drawing on hands-on experience from the field, Piethein Strengholt demystifies common assumptions and complex problems you'll face when embarking on a new data architecture. Architects and engineers of all stripes will find answers to the most typical questions along with insights from real organizations about what's worked, what hasn't, and why. With this book, you will:
Learn how to build a Medallion architecture with Azure Databricks and Microsoft Fabric
Gain insights from three real case studies that illustrate practical field experience and lessons learned
Explore scaling considerations, including governance, security, generative AI, and more
Make informed decisions when designing or implementing new data architectures
Get proven patterns for success that align with broader organizational objectives

Specifications

ISBN13: 9781098178833
Language: English
Binding: paperback
Number of pages: 200
Publisher: O'Reilly
Edition: 1
Publication date: 11 April 2025
Main category: IT management / ICT

About Piethein Strengholt

Piethein Strengholt likes to find practical and lasting solutions to complex problems. After working for more than a decade as a strategy consultant and freelance application developer, he joined ABN AMRO as a principal architect to accelerate subjects like data management, cloud, and integration. In this exciting role, he oversees the company’s data strategy and its impact on the organization. He lives in the Netherlands with his family.

Table of Contents

Foreword
Preface
Who Should Read This Book
Navigating This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments

I. Understanding the Medallion Framework
1. The Evolution of Data Architecture
What Is a Medallion Architecture?
A Brief History of Data Warehouse Architecture
OLTP Systems
Data Warehouses
The Staging Area
Inmon Methodology
Kimball Methodology
Key Takeaways from Traditional Data Warehouses
A Brief History of Data Lakes
Hadoop’s Distributed File System
MapReduce
Apache Hive
Spark Project
Moving Forward with Data Lakes
A Brief History of Lakehouse Architecture
Founders of Spark
Emergence of Open Table Formats
The Rise of Lakehouse Architectures
Medallion Architecture and Its Practical Challenges
Conclusion

2. Laying the Groundwork
Foundational Preconditions
Extra Landing Zones
Raw Data
Batch Processing
Real-Time Data Processing
Spark Structured Streaming
Change Data Feed
Change Data Capture
Considerations and Learning Resources
ETL and Orchestration Tools
Managing Delta Tables
Z-Ordering
V-Ordering
Table Partitioning
Liquid Clustering
Compaction and Optimized Writes
DeltaLog
Conclusion

3. Demystifying the Medallion Architecture
The Three-Layered Design
Bronze Layer
Processing Hierarchy
Processing Full Data Loads
Processing Incremental Data Loads
Data Historization Within the Bronze Layer
Schema Evolution and Management
MergeSchema and Schema Enforcement
Technical Validation Checks
Usage and Governance
The Bronze Layer in Practice
Silver Layer
Cleaning Data Activities
Designing the Silver Layer’s Data Model
Harmonization with Other Sources
3NF and Data Vault
Operational Querying and Machine Learning
Managing Overlapping Requirements
Automation Tasks
The Silver Layer in Practice
Gold Layer
Star Schema
Star Schema Design Nuances
Curated, Semantic, and Platinum Layers
One-Big-Table Design
Serving Layer
The Gold Layer in Practice
Conclusion

II. Crafting the Medallion Layers
4. Building a Medallion Foundation with Microsoft Fabric
Our Case Study: Oceanic Airlines
Introducing Microsoft Fabric
Domains
Workspaces and Capacities
OneLake
Data Engineering with Spark
Data Warehousing with T-SQL
Other Fabric Workload Types
Setting Up the Foundation
Setting Up Capacities
Setting Up Domains
Setting Up Workspaces
Creating Lakehouses
Capacity Considerations
Domain Considerations
Workspace Considerations
Lakehouse Entities Considerations
Storage Account Considerations
Conclusion

5. Construct the Bronze Layer
Building the Data Pipeline
Deploying the AdventureWorks Sample Database
Set Up an Azure SQL Database Connection
Creating a New Data Pipeline
Additional Considerations
Implementation of Lakehouse Tables
Traverse Parquet Files to Managed Delta Tables
Using External Tables
Updating Tables with MERGE Operations
Spark Structured Streaming
Using Change Data Capture
Navigating Data Handling Techniques
Schema Management
Create Tables Without Defining Schemas
Define Schemas with the DataFrame API
SQL DDL Statements
YAML or JSON Configurations
Metadata-Driven Approach
Databricks Auto Loader
Third-Party Tools
Handling Schema Evolution
Conclusion

6. Build the Silver Layer
Quick Recap
Implementation of a Metadata-Driven Approach
Implementation of the Metadata Store
Implementation of Dynamic Data Validations
Improvement Areas
Data Cleansing
Implementation of Data Cleansing Tasks
Data Cleansing Considerations
Data Transformation Frameworks and Data Quality Tools
Optimization of Query Performance with Denormalization
Lightweight Enrichments
Data Historization
Optimization Jobs
Orchestration with Apache Airflow
Final Recommendations
Silver-Layer Data as a Product
Conclusion

7. Streamline the Gold Layer
Design of the Gold Layer
Transform Data Using a Star Schema
Creation of the Semantic Model
Creation of the First Power BI Report
Creation of Task Flows
Enhancements for Gold-Layer Design
Microsoft Fabric in Practice
Data Products
Data Governance with Microsoft Purview
Microsoft Purview Design Considerations
Guidance for Medallion Architectures
Conclusion

III. Real-World Case Studies
8. Case Study: Data, Analytics and Business Strategy at AP Pension
Medallion Architecture
Other Considerations
Final Recommendations

9. Case Study: Amadeus, a Tech Leader in the Travel Industry
Medallion Architecture
FinOps
Data Models
Data Contracts
Data Governance

10. Case Study: Strategic Data Transformation at ZEISS
Data Platform Evolution
Medallion Architecture
Data Products and Sharing
Recommendations and Best Practices

IV. Scaling, Governance, and the Future of Medallion Architectures
11. Scaling the Medallion Architecture
Decentralization of Data Management
Flexibility in Federation
Medallion Mesh
Number of Medallion Architectures
Medallion Inner Architecture Variations
Separate Data Product Layers
Tailored Medallion Architectures
Adaptability of the Bronze Layer
Silver Layer Variations
Gold Layer Variations
Enterprise Data Models
Master Data Management
Reference Data Management
Conclusion

12. Medallion Governance and Security
Data Governance
Governance Within a Medallion Architecture
Unity Catalog
Medallion Architecture with Unity Catalog
Data Contracts
Contracts Within a Catalog
Contracts Within a Metastore
Data Contracts Using YAML Files and GitOps
Other Data Contract Specifications
Data Security and Access Management
Conclusion

13. Future Medallion Architectures with Generative AI
Unstructured Data Processing
Retrieval-Augmented Generation
Bronze Layer
Silver Layer
Gold Layer
Integration of LLMs and Medallion Architectures
Role of Agents
Training and Fine-Tuning LLMs
Future of Medallion Architectures
Conclusion

Index
About the Author