Observability Engineering

Name: Observability Engineering
Authors: Charity Majors, Liz Fong-Jones, George Miranda

Achieving Production Excellence

Paperback, English, 2022, 1st edition, ISBN 9781492076445

Summary

Observability is critical for building, changing, and understanding the software that powers complex modern systems. Teams that adopt observability are much better equipped to ship code swiftly and confidently, identify outliers and aberrant behaviors, and understand the experience of each and every user. This practical book explains the value of observable systems and shows you how to practice observability-driven development.

Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve upon what you’re doing today, and provide practical dos and don'ts for migrating from legacy tooling, such as metrics, monitoring, and log management. You’ll also learn the impact observability has on organizational culture (and vice versa).

You'll explore:
- How the concept of observability applies to managing software at scale
- The value of practicing observability when delivering complex cloud native applications and systems
- The impact observability has across the entire software development lifecycle
- How and why different functional teams use observability with service-level objectives
- How to instrument your code to help future engineers understand the code you wrote today
- How to produce quality code for context-aware system debugging and maintenance
- How data-rich analytics can help you debug elusive issues

Specifications

ISBN-13: 9781492076445

Keywords: data warehouse, machine learning

Language: English

Binding: paperback

Number of pages: 275

Publisher: O'Reilly

Edition: 1

Publication date: 23 May 2022

Main category: IT management / ICT

Table of Contents

Foreword
Preface
Who This Book Is For
Why We Wrote This Book
What You Will Learn
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments

I. The Path to Observability
1. What Is Observability?
The Mathematical Definition of Observability
Applying Observability to Software Systems
Mischaracterizations About Observability for Software
Why Observability Matters Now
Is This Really the Best Way?
Why Are Metrics and Monitoring Not Enough?
Debugging with Metrics Versus Observability
The Role of Cardinality
The Role of Dimensionality
Debugging with Observability
Observability Is for Modern Systems
Conclusion

2. How Debugging Practices Differ Between Observability and Monitoring
How Monitoring Data Is Used for Debugging
Troubleshooting Behaviors When Using Dashboards
The Limitations of Troubleshooting by Intuition
Traditional Monitoring Is Fundamentally Reactive
How Observability Enables Better Debugging
Conclusion

3. Lessons from Scaling Without Observability
An Introduction to Parse
Scaling at Parse
The Evolution Toward Modern Systems
The Evolution Toward Modern Practices
Shifting Practices at Parse
Conclusion

4. How Observability Relates to DevOps, SRE, and Cloud Native
Cloud Native, DevOps, and SRE in a Nutshell
Observability: Debugging Then Versus Now
Observability Empowers DevOps and SRE Practices
Conclusion

II. Fundamentals of Observability
5. Structured Events Are the Building Blocks of Observability
Debugging with Structured Events
The Limitations of Metrics as a Building Block
The Limitations of Traditional Logs as a Building Block
Unstructured Logs
Structured Logs
Properties of Events That Are Useful in Debugging
Conclusion

6. Stitching Events into Traces
Distributed Tracing and Why It Matters Now
The Components of Tracing
Instrumenting a Trace the Hard Way
Adding Custom Fields into Trace Spans
Stitching Events into Traces
Conclusion

7. Instrumentation with OpenTelemetry
A Brief Introduction to Instrumentation
Open Instrumentation Standards
Instrumentation Using Code-Based Examples
Start with Automatic Instrumentation
Add Custom Instrumentation
Send Instrumentation Data to a Backend System
Conclusion

8. Analyzing Events to Achieve Observability
Debugging from Known Conditions
Debugging from First Principles
Using the Core Analysis Loop
Automating the Brute-Force Portion of the Core Analysis Loop
This Misleading Promise of AIOps
Conclusion

9. How Observability and Monitoring Come Together
Where Monitoring Fits
Where Observability Fits
System Versus Software Considerations
Assessing Your Organizational Needs
Exceptions: Infrastructure Monitoring That Can’t Be Ignored
Real-World Examples
Conclusion

III. Observability for Teams
10. Applying Observability Practices in Your Team
Join a Community Group
Start with the Biggest Pain Points
Buy Instead of Build
Flesh Out Your Instrumentation Iteratively
Look for Opportunities to Leverage Existing Efforts
Prepare for the Hardest Last Push
Conclusion

11. Observability-Driven Development
Test-Driven Development
Observability in the Development Cycle
Determining Where to Debug
Debugging in the Time of Microservices
How Instrumentation Drives Observability
Shifting Observability Left
Using Observability to Speed Up Software Delivery
Conclusion

12. Using Service-Level Objectives for Reliability
Traditional Monitoring Approaches Create Dangerous Alert Fatigue
Threshold Alerting Is for Known-Unknowns Only
User Experience Is a North Star
What Is a Service-Level Objective?
Reliable Alerting with SLOs
Changing Culture Toward SLO-Based Alerts: A Case Study
Conclusion

13. Acting on and Debugging SLO-Based Alerts
Alerting Before Your Error Budget Is Empty
Framing Time as a Sliding Window
Forecasting to Create a Predictive Burn Alert
The Lookahead Window
The Baseline Window
Acting on SLO Burn Alerts
Using Observability Data for SLOs Versus Time-Series Data
Conclusion

14. Observability and the Software Supply Chain
Why Slack Needed Observability
Instrumentation: Shared Client Libraries and Dimensions
Case Studies: Operationalizing the Supply Chain
Understanding Context Through Tooling
Embedding Actionable Alerting
Understanding What Changed
Conclusion

IV. Observability at Scale
15. Build Versus Buy and Return on Investment
How to Analyze the ROI of Observability
The Real Costs of Building Your Own
The Hidden Costs of Using “Free” Software
The Benefits of Building Your Own
The Risks of Building Your Own
The Real Costs of Buying Software
The Hidden Financial Costs of Commercial Software
The Hidden Nonfinancial Costs of Commercial Software
The Benefits of Buying Commercial Software
The Risks of Buying Commercial Software
Buy Versus Build Is Not a Binary Choice
Conclusion

16. Efficient Data Storage
The Functional Requirements for Observability
Time-Series Databases Are Inadequate for Observability
Other Possible Data Stores
Data Storage Strategies
Case Study: The Implementation of Honeycomb’s Retriever
Partitioning Data by Time
Storing Data by Column Within Segments
Performing Query Workloads
Querying for Traces
Querying Data in Real Time
Making It Affordable with Tiering
Making It Fast with Parallelism
Dealing with High Cardinality
Scaling and Durability Strategies
Notes on Building Your Own Efficient Data Store
Conclusion

17. Cheap and Accurate Enough: Sampling
Sampling to Refine Your Data Collection
Using Different Approaches to Sampling
Constant-Probability Sampling
Sampling on Recent Traffic Volume
Sampling Based on Event Content (Keys)
Combining per Key and Historical Methods
Choosing Dynamic Sampling Options
When to Make a Sampling Decision for Traces
Translating Sampling Strategies into Code
The Base Case
Fixed-Rate Sampling
Recording the Sample Rate
Consistent Sampling
Target Rate Sampling
Having More Than One Static Sample Rate
Sampling by Key and Target Rate
Sampling with Dynamic Rates on Arbitrarily Many Keys
Putting It All Together: Head and Tail per Key Target Rate Sampling
Conclusion

18. Telemetry Management with Pipelines
Attributes of Telemetry Pipelines
Routing
Security and Compliance
Workload Isolation
Data Buffering
Capacity Management
Data Filtering and Augmentation
Data Transformation
Ensuring Data Quality and Consistency
Managing a Telemetry Pipeline: Anatomy
Challenges When Managing a Telemetry Pipeline
Performance
Correctness
Availability
Reliability
Isolation
Data Freshness
Use Case: Telemetry Management at Slack
Metrics Aggregation
Logs and Trace Events
Open Source Alternatives
Managing a Telemetry Pipeline: Build Versus Buy
Conclusion

V. Spreading Observability Culture
19. The Business Case for Observability
The Reactive Approach to Introducing Change
The Return on Investment of Observability
The Proactive Approach to Introducing Change
Introducing Observability as a Practice
Using the Appropriate Tools
Instrumentation
Data Storage and Analytics
Rolling Out Tools to Your Teams
Knowing When You Have Enough Observability
Conclusion

20. Observability’s Stakeholders and Allies
Recognizing Nonengineering Observability Needs
Creating Observability Allies in Practice
Customer Support Teams
Customer Success and Product Teams
Sales and Executive Teams
Using Observability Versus Business Intelligence Tools
Query Execution Time
Accuracy
Recency
Structure
Time Windows
Ephemerality
Using Observability and BI Tools Together in Practice
Conclusion

21. An Observability Maturity Model
A Note About Maturity Models
Why Observability Needs a Maturity Model
About the Observability Maturity Model
Capabilities Referenced in the OMM
Respond to System Failure with Resilience
Deliver High-Quality Code
Manage Complexity and Technical Debt
Release on a Predictable Cadence
Understand User Behavior
Using the OMM for Your Organization
Conclusion

22. Where to Go from Here
Observability, Then Versus Now
Additional Resources
Predictions for Where Observability Is Going

Index
About the Authors