Piotr Lewandowski, Adam Stubblefield, Paul Blankinship, Heather Adkins, et al.

Building Secure and Reliable Systems

Best Practices for Designing, Implementing, and Maintaining Systems

Paperback, English, 2020, ISBN 9781492083122


Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure.

Two previous O’Reilly books from Google, Site Reliability Engineering and The Site Reliability Workbook, demonstrated how and why a commitment to the entire service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. In this latest guide, practitioners who specialize in security and reliability offer insights into system design, implementation, and maintenance. They also discuss how building and adopting their recommended best practices requires a culture that’s supportive of such change.

You’ll learn about secure and reliable systems through:
- Design strategies
- Recommendations for coding, testing, and debugging practices
- Strategies to prepare for, respond to, and recover from incidents
- Cultural best practices that help teams across your organization collaborate effectively


Number of pages: 600
Main category: IT management / ICT



About Piotr Lewandowski

Piotr Lewandowski is a Senior Staff Site Reliability Engineer, and has spent the past nine years improving the security posture of Google’s infrastructure. As the Production Tech Lead for Security, he is responsible for harmonious collaboration between the SRE and security organizations. In his previous role, he led a team responsible for the reliability of Google’s critical security infrastructure. Before joining Google, he built a startup, worked at CERT Polska, and got a degree in computer science from Warsaw University of Technology.


About Adam Stubblefield

Adam Stubblefield is a Distinguished Engineer and the Horizontal Lead for Security at Google. Over the past 8 years, he’s led teams that have built much of Google’s core security infrastructure. Adam has a PhD in Computer Science from Johns Hopkins.


About Paul Blankinship

Paul Blankinship manages the Technical Writing team for Google’s Security and Privacy Engineering group. He has previously written documentation for Google Web Designer and helped develop Google’s internal security and privacy policies.


About Heather Adkins

Heather Adkins is a 17-year Google veteran and founding member of the Google Security Team. As Senior Director of Information Security, she has built a global team responsible for maintaining the safety and security of Google’s networks, systems, and applications. She has an extensive background in systems and network administration with an emphasis on practical security, and has worked to build and secure some of the world’s largest infrastructure. She now focuses primarily on defending Google’s computing infrastructure and on working with industry to tackle some of the greatest security challenges.



Foreword by Royal Hansen
Foreword by Michael Wildpaner
Why We Wrote This Book
Who This Book Is For
A Note About Culture
How to Read This Book
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us

I: Introductory Material
1. The Intersection of Security and Reliability
On Passwords and Power Drills
Reliability Versus Security: Design Considerations
Confidentiality, Integrity, Availability
Reliability and Security: Commonalities
From Design to Production
Investigating Systems and Logging
Crisis Response

2. Understanding Adversaries
Attacker Motivations
Attacker Profiles
Vulnerability Researchers
Governments and Law Enforcement
Criminal Actors
Automation and Artificial Intelligence
Attacker Methods
Threat Intelligence
Cyber Kill Chains™
Tactics, Techniques, and Procedures
Risk Assessment Considerations

II: Designing Systems
3. Case Study: Safe Proxies
Safe Proxies in Production Environments
Google Tool Proxy

4. Design Tradeoffs
Design Objectives and Requirements
Feature Requirements
Nonfunctional Requirements
Features Versus Emergent Properties
Example: Google Design Document
Balancing Requirements
Example: Payment Processing
Managing Tensions and Aligning Goals
Example: Microservices and the Google Web Application Framework
Aligning Emergent-Property Requirements
Initial Velocity Versus Sustained Velocity

5. Design for Least Privilege
Concepts and Terminology
Least Privilege
Zero Trust Networking
Zero Touch
Classifying Access Based on Risk
Best Practices
Small Functional APIs
Testing and Least Privilege
Diagnosing Access Denials
Graceful Failure and Breakglass Mechanisms
Worked Example: Configuration Distribution
Software Update API
Custom OpenSSH ForceCommand
Custom HTTP Receiver (Sidecar)
Custom HTTP Receiver (In-Process)
A Policy Framework for Authentication and Authorization Decisions
Using Advanced Authorization Controls
Investing in a Widely Used Authorization Framework
Avoiding Potential Pitfalls
Advanced Controls
Multi-Party Authorization (MPA)
Three-Factor Authorization (3FA)
Business Justifications
Temporary Access
Tradeoffs and Tensions
Increased Security Complexity
Impact on Collaboration and Company Culture
Quality Data and Systems That Impact Security
Impact on User Productivity
Impact on Developer Complexity

6. Design for Understandability
Why Is Understandability Important?
System Invariants
Analyzing Invariants
Mental Models
Designing Understandable Systems
Complexity Versus Understandability
Breaking Down Complexity
Centralized Responsibility for Security and Reliability Requirements
System Architecture
Understandable Interface Specifications
Understandable Identities, Authentication, and Access Control
Security Boundaries
Software Design
Using Application Frameworks for Service-Wide Requirements
Understanding Complex Data Flows
Considering API Usability

7. Design for a Changing Landscape
Types of Security Changes
Designing Your Change
Architecture Decisions to Make Changes Easier
Keep Dependencies Up to Date and Rebuild Frequently
Release Frequently Using Automated Testing
Use Containers
Use Microservices
Different Changes: Different Speeds, Different Timelines
Short-Term Change: Zero-Day Vulnerability
Medium-Term Change: Improvement to Security Posture
Long-Term Change: External Demand
Complications: When Plans Change
Example: Growing Scope—Heartbleed

8. Design for Resilience
Design Principles for Resilience
Defense in Depth
The Trojan Horse
Google App Engine Analysis
Controlling Degradation
Differentiate Costs of Failures
Deploy Response Mechanisms
Automate Responsibly
Controlling the Blast Radius
Role Separation
Location Separation
Time Separation
Failure Domains and Redundancies
Failure Domains
Component Types
Controlling Redundancies
Continuous Validation
Validation Focus Areas
Validation in Practice
Practical Advice: Where to Begin

9. Design for Recovery
What Are We Recovering From?
Random Errors
Accidental Errors
Software Errors
Malicious Actions
Design Principles for Recovery
Design to Go as Quickly as Possible (Guarded by Policy)
Limit Your Dependencies on External Notions of Time
Rollbacks Represent a Tradeoff Between Security and Reliability
Use an Explicit Revocation Mechanism
Know Your Intended State, Down to the Bytes
Design for Testing and Continuous Validation
Emergency Access
Access Controls
Responder Habits
Unexpected Benefits

10. Mitigating Denial-of-Service Attacks
Strategies for Attack and Defense
Attacker’s Strategy
Defender’s Strategy
Designing for Defense
Defendable Architecture
Defendable Services
Mitigating Attacks
Monitoring and Alerting
Graceful Degradation
A DoS Mitigation System
Strategic Response
Dealing with Self-Inflicted Attacks
User Behavior
Client Retry Behavior

III: Implementing Systems
11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
Background on Publicly Trusted Certificate Authorities
Why Did We Need a Publicly Trusted CA?
The Build or Buy Decision
Design, Implementation, and Maintenance Considerations
Programming Language Choice
Complexity Versus Understandability
Securing Third-Party and Open Source Components
Resiliency for the CA Key Material
Data Validation

12. Writing Code
Frameworks to Enforce Security and Reliability
Benefits of Using Frameworks
Example: Framework for RPC Backends
Common Security Vulnerabilities
SQL Injection Vulnerabilities: TrustedSqlString
Preventing XSS: SafeHtml
Lessons for Evaluating and Building Frameworks
Simple, Safe, Reliable Libraries for Common Tasks
Rollout Strategy
Simplicity Leads to Secure and Reliable Code
Avoid Multilevel Nesting
Eliminate YAGNI Smells
Repay Technical Debt
Security and Reliability by Default
Choose the Right Tools
Use Strong Types
Sanitize Your Code

13. Testing Code
Unit Testing
Writing Effective Unit Tests
When to Write Unit Tests
How Unit Testing Affects Code
Integration Testing
Writing Effective Integration Tests
Dynamic Program Analysis
Fuzz Testing
How Fuzz Engines Work
Writing Effective Fuzz Drivers
An Example Fuzzer
Continuous Fuzzing
Static Program Analysis
Automated Code Inspection Tools
Integration of Static Analysis in the Developer Workflow
Abstract Interpretation
Formal Methods

14. Deploying Code
Concepts and Terminology
Threat Model
Best Practices
Require Code Reviews
Rely on Automation
Verify Artifacts, Not Just People
Treat Configuration as Code
Securing Against the Threat Model
Advanced Mitigation Strategies
Binary Provenance
Provenance-Based Deployment Policies
Verifiable Builds
Deployment Choke Points
Post-Deployment Verification
Practical Advice
Take It One Step at a Time
Provide Actionable Error Messages
Ensure Unambiguous Provenance
Create Unambiguous Policies
Include a Deployment Breakglass
Securing Against the Threat Model, Revisited

15. Investigating Systems
From Debugging to Investigation
Example: Temporary Files
Debugging Techniques
What to Do When You’re Stuck
Collaborative Debugging: A Way to Teach
How Security Investigations and Debugging Differ
Collect Appropriate and Useful Logs
Design Your Logging to Be Immutable
Take Privacy into Consideration
Determine Which Security Logs to Retain
Budget for Logging
Robust, Secure Debugging Access

IV: Maintaining Systems
16. Disaster Planning
Defining “Disaster”
Dynamic Disaster Response Strategies
Disaster Risk Analysis
Setting Up an Incident Response Team
Identify Team Members and Roles
Establish a Team Charter
Establish Severity and Priority Models
Define Operating Parameters for Engaging the IR Team
Develop Response Plans
Create Detailed Playbooks
Ensure Access and Update Mechanisms Are in Place
Prestaging Systems and People Before an Incident
Configuring Systems
Processes and Procedures
Testing Systems and Response Plans
Auditing Automated Systems
Conducting Nonintrusive Tabletops
Testing Response in Production Environments
Red Team Testing
Evaluating Responses
Google Examples
Test with Global Impact
DiRT Exercise Testing Emergency Access
Industry-Wide Vulnerabilities

17. Crisis Management
Is It a Crisis or Not?
Triaging the Incident
Compromises Versus Bugs
Taking Command of Your Incident
The First Step: Don’t Panic!
Beginning Your Response
Establishing Your Incident Team
Operational Security
Trading Good OpSec for the Greater Good
The Investigative Process
Keeping Control of the Incident
Parallelizing the Incident
Keeping the Right People Informed with the Right Levels of Detail
Putting It All Together
Declaring an Incident
Communications and Operational Security
Beginning the Incident
Handing Back the Incident
Preparing Communications and Remediation

18. Recovery and Aftermath
Recovery Logistics
Recovery Timeline
Planning the Recovery
Scoping the Recovery
Recovery Considerations
Recovery Checklists
Initiating the Recovery
Isolating Assets (Quarantine)
System Rebuilds and Software Upgrades
Data Sanitization
Recovery Data
Credential and Secret Rotation
After the Recovery
Compromised Cloud Instances
Large-Scale Phishing Attack
Targeted Attack Requiring Complex Recovery

V: Organization and Culture
19. Case Study: Chrome Security Team
Background and Team Evolution
Security Is a Team Responsibility
Help Users Safely Navigate the Web
Speed Matters
Design for Defense in Depth
Be Transparent and Engage the Community

20. Understanding Roles and Responsibilities
Who Is Responsible for Security and Reliability?
The Roles of Specialists
Understanding Security Expertise
Certifications and Academia
Integrating Security into the Organization
Embedding Security Specialists and Security Teams
Example: Embedding Security at Google
Special Teams: Blue and Red Teams
External Researchers

21. Building a Culture of Security and Reliability
Defining a Healthy Security and Reliability Culture
Culture of Security and Reliability by Default
Culture of Review
Culture of Awareness
Culture of Yes
Culture of Inevitability
Culture of Sustainability
Changing Culture Through Good Practice
Align Project Goals and Participant Incentives
Reduce Fear with Risk-Reduction Mechanisms
Make Safety Nets the Norm
Increase Productivity and Usability
Overcommunicate and Be Transparent
Build Empathy
Convincing Leadership
Understand the Decision-Making Process
Build a Case for Change
Pick Your Battles
Escalations and Problem Resolution

A. A Disaster Risk Assessment Matrix
