
Deep Learning at Scale

At the Intersection of Hardware, Software, and Data

Paperback, English, 2024, 1st edition, 9781098145286

Summary

Bringing a deep-learning project into production at scale is quite challenging. To successfully scale your project, a foundational understanding of full stack deep learning, including the knowledge that lies at the intersection of hardware, software, data, and algorithms, is required.

This book illustrates complex concepts of full stack deep learning and reinforces them through hands-on exercises to arm you with tools and techniques to scale your project. A scaling effort is only beneficial when it's effective and efficient. To that end, this guide explains the intricate concepts and techniques that will help you scale effectively and efficiently.

You'll gain a thorough understanding of:
- How data flows through the deep-learning network and the role the computation graphs play in building your model
- How accelerated computing speeds up your training and how best you can utilize the resources at your disposal
- How to train your model using distributed training paradigms, i.e., data, model, and pipeline parallelism
- How to leverage PyTorch ecosystems in conjunction with NVIDIA libraries and Triton to scale your model training
- Debugging, monitoring, and investigating the undesirable bottlenecks that slow down your model training
- How to expedite the training lifecycle and streamline your feedback loop to iterate model development
- A set of data tricks and techniques and how to apply them to scale your model training
- How to select the right tools and techniques for your deep-learning project
- Options for managing the compute infrastructure when running at scale

Specifications

ISBN-13: 9781098145286
Keywords: machine learning
Language: English
Binding: paperback
Number of pages: 400
Publisher: O'Reilly
Edition: 1
Publication date: 31-5-2024
Main category: IT management / ICT

Table of Contents

Preface
Why Scaling Matters
Who This Book Is For
How This Book Is Organized
Introduction
Part I: Foundational Concepts of Deep Learning
Part II: Distributed Training
Part III: Extreme Scaling
What You Need to Use This Book
Setting Up Your Environment for Hands-on Exercises
Using Code Examples
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments

1. What Nature and History Have Taught Us About Scale
The Philosophy of Scaling
The General Law of Scaling
History of Scaling Law
Scalable Systems
Nature as a Scalable System
Our Visual System: A Biological Inspiration
Artificial Intelligence: The Evolution of Learnable Systems
It Takes Four to Tango
Evolving Deep Learning Trends
Scale in the Context of Deep Learning
Six Development Considerations
Scaling Considerations
Summary

I. Foundational Concepts of Deep Learning
2. Deep Learning
The Role of Data in Deep Learning
Data Flow in Deep Learning
Hands-On Exercise #1: Implementing Minimalistic Deep Learning
Developing the Model
The Embedded/Latent Space
A Word of Caution
The Learning Rate and Loss Landscape
Scaling Consideration
Profiling
Hands-On Exercise #2: Getting Complex with PyTorch
Model Input Data and Pipeline
Model
Auxiliary Utilities
Putting It All Together
Computation Graphs
Inference
Summary

3. The Computational Side of Deep Learning
The Higgs Boson of the Digital World
Floating-Point Numbers: The Faux Continuous Numbers
Units of Data Measurement
Data Storage Formats: The Trade-off of Latency and Throughput
Computer Architecture
The Birth of the Electromechanical Engine
Memory and Persistence
Computation and Memory Combined
The Scaling Laws of Electronics
Scaling Out Computation with Parallelization
Threads Versus Processes: The Unit of Parallelization
Hardware-Optimized Libraries for Acceleration
Parallel Computer Architectures: Flynn’s and Duncan’s Taxonomies
Accelerated Computing
Popular Accelerated Devices for Deep Learning
CUDA
Accelerator Benchmarking
Summary

4. Putting It All Together: Efficient Deep Learning
Hands-On Exercise #1: GPT-2
Exercise Objectives
Model Architecture
Implementation
Running the Example
Experiment Tracking
Measuring to Understand the Limitations and Scale Out
Transitioning from Language to Vision
Hands-On Exercise #2: Vision Model with Convolution
Model Architecture
Running the Example
Observations
Graph Compilation Using PyTorch 2.0
New Components of PyTorch 2.0
Graph Execution in PyTorch 2.0
Modeling Techniques to Scale Training on a Single Device
Graph Compilation
Reduced- and Mixed-Precision Training
Memory Tricks for Efficiency
Optimizer Efficiencies
Model Input Pipeline Tricks
Writing Custom Kernels in PyTorch 2.0 with Triton
Summary

II. Distributed Training
5. Distributed Systems and Communications
Distributed Systems
The Eight Fallacies of Distributed Computing
The Consistency, Availability, and Partition Tolerance (CAP) Theorem
The Scaling Law of Distributed Systems
Types of Distributed Systems
Communication in Distributed Systems
Communication Paradigm
Communication Patterns
Communication Technologies
MPI
Communication Initialization: Rendezvous
Hands-On Exercise
Scaling Compute Capacity
Infrastructure Setup Options
Provisioning of Accelerated Devices
Workload Management
Deep Learning Infrastructure Review
Overview of Leading Deep Learning Clusters
Similarities Between Today’s Most Powerful Systems
Summary

6. Theoretical Foundations of Distributed Deep Learning
Distributed Deep Learning
Centralized DDL
Decentralized DDL
Dimensions of Scaling Distributed Deep Learning
Partitioning Dimensions of Distributed Deep Learning
Types of Distributed Deep Learning Techniques
Choosing a Scaling Technique
Measuring Scale
End-to-End Metrics and Benchmarks
Measuring Incrementally in a Reproducible Environment
Summary

7. Data Parallelism
Data Partitioning
Implications of Data Sampling Strategies
Working with Remote Datasets
Introduction to Data Parallel Techniques
Hands-On Exercise #1: Centralized Parameter Server Using RPC
Hands-On Exercise #2: Centralized Gradient-Partitioned Joint Worker/Server Distributed Training
Hands-On Exercise #3: Decentralized Asynchronous Distributed Training
Centralized Synchronous Data Parallel Strategies
Data Parallel (DP)
Distributed Data Parallel (DDP)
Zero Redundancy Optimizer–Powered Data Parallelism (ZeRO-DP)
Fault-Tolerant Training
Hands-On Exercise #4: Scene Parsing with DDP
Hands-On Exercise #5: Distributed Sharded DDP (ZeRO)
Building Efficient Pipelines
Dataset Format
Local Versus Remote
Staging
Threads Versus Processes: Scaling Your Pipelines
Memory Tricks
Data Augmentations: CPU Versus GPU
JIT Acceleration
Hands-On Exercise #6: Pipeline Efficiency with FFCV
Summary

8. Scaling Beyond Data Parallelism: Model, Pipeline, Tensor, and Hybrid Parallelism
Questions to Ask Before Scaling Vertically
Theoretical Foundations of Vertical Scaling
Revisiting the Dimensions of Scaling
Operators’ Perspective of Parallelism Dimensions
Data Flow and Communications in Vertical Scaling
Basic Building Blocks for Scaling Beyond DP
PyTorch Primitives for Vertical Scaling
Working with Larger Models
Distributed Checkpointing: Saving the Partitioned Model
Summary

9. Gaining Practical Expertise with Scaling Across All Dimensions
Hands-On Exercises: Model, Tensor, Pipeline, and Hybrid Parallelism
The Dataset
Hands-On Exercise #1: Baseline DeepFM
Hands-On Exercise #2: Model Parallel DeepFM
Hands-On Exercise #3: Pipeline Parallel DeepFM
Hands-On Exercise #4: Pipeline Parallel DeepFM with RPC
Hands-On Exercise #5: Tensor Parallel DeepFM
Hands-On Exercise #6: Hybrid Parallel DeepFM
Tools and Libraries for Vertical Scaling
OneFlow
FairScale
DeepSpeed
FSDP
Overview and Comparison
Hands-On Exercise #7: Automatic Vertical Scaling with DeepSpeed
Observations
Summary

III. Extreme Scaling
10. Data-Centric Scaling
The Seven Vs of Data Through a Deep Learning Lens
The Scaling Law of Data
Data Quality
Validity
Variety
Veracity
Value and Volume
The Data Engine and Continual Learning
Volatility
Velocity
Summary

11. Scaling Experiments: Effective Planning and Management
Model Development Is Iterative
Planning for Experiments and Execution
Simplify the Complex
Fast Iteration for Fast Feedback
Decoupled Iterations
Feasibility Testing
Developing and Scaling a Minimal Viable Solution
Setting Up for Iterative Execution
Techniques to Scale Your Experiments
Accelerating Model Convergence
Accelerating Learning Via Optimization and Automation
Accelerating Learning by Increasing Expertise
Learning with Scarce Supervision
Hands-On Exercises
Hands-On Exercise #1: Transfer Learning
Hands-On Exercise #2: Hyperparameter Optimization
Hands-On Exercise #3: Knowledge Distillation
Hands-On Exercise #4: Mixture of Experts
Hands-On Exercise #5: Contrastive Learning
Hands-On Exercise #6: Meta-Learning
Summary

12. Efficient Fine-Tuning of Large Models
Review of Fine-Tuning Techniques
Standard Fine-Tuning
Meta-Learning (Zero-/Few-Shot Learning)
Adapter-Based Fine-Tuning
Low-Rank Tuning
LoRA—Parameter-Efficient Fine-Tuning
Quantized LoRA (QLoRA)
Hands-on Exercise: QLoRA-Based Fine-Tuning
Implementation Details
Inference
Exercise Summary
Summary

13. Foundation Models
What Are Foundation Models?
The Evolution of Foundation Models
Challenges Involved in Developing Foundation Models
Measurement Complexity
Deployment Challenges
Propagation of Defects to All Downstream Models
Legal and Ethical Considerations
Ensuring Consistency and Coherency
Multimodal Large Language Models
Projection
Gated Cross-Attention
Query-Based Encoding
Further Exploration
Summary

Index
About the Author
