Tidy Modeling with R

A Framework for Modeling in the Tidyverse

Paperback, English, 2022, 1st edition, 9781492096481

Summary

Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you're just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work.

RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You'll understand why the tidymodels framework has been built to be used by a broad range of people.

With this book, you will:
- Learn the steps necessary to build a model from beginning to end
- Understand how to use different modeling and feature engineering approaches fluently
- Examine the options for avoiding common pitfalls of modeling, such as overfitting
- Learn practical methods to prepare your data for modeling
- Tune models for optimal performance
- Use good statistical practices to compare, evaluate, and choose among models

Specifications

ISBN-13: 9781492096481
Keywords: Programming, R
Language: English
Binding: Paperback
Number of pages: 300
Publisher: O'Reilly
Edition: 1
Publication date: 26 July 2022
Main category: IT management / ICT

About Julia Silge

Julia Silge is a data scientist at Stack Overflow. She enjoys making beautiful charts, the statistical programming language R, black coffee, red wine, and the mountains of her adopted home here in Utah. She has a PhD in astrophysics and an abiding love for Jane Austen. Her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences.

Table of Contents

Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments

I. Introduction
1. Software for Modeling
Fundamentals for Modeling Software
Types of Models
Descriptive Models
Inferential Models
Predictive Models
Connections Between Types of Models
Some Terminology
How Does Modeling Fit into the Data Analysis Process?
Chapter Summary

2. A Tidyverse Primer
Tidyverse Principles
Design for Humans
Reuse Existing Data Structures
Design for the Pipe and Functional Programming
Examples of Tidyverse Syntax
Chapter Summary

3. A Review of R Modeling Fundamentals
An Example
What Does the R Formula Do?
Why Tidiness Is Important for Modeling
Combining Base R Models and the Tidyverse
The tidymodels Metapackage
Chapter Summary

II. Modeling Basics
4. The Ames Housing Data
Exploring Features of Homes in Ames
Chapter Summary

5. Spending Our Data
Common Methods for Splitting Data
What About a Validation Set?
Multilevel Data
Other Considerations for a Data Budget
Chapter Summary

6. Fitting Models with parsnip
Create a Model
Use the Model Results
Make Predictions
parsnip-Extension Packages
Creating Model Specifications
Chapter Summary

7. A Model Workflow
Where Does the Model Begin and End?
Workflow Basics
Adding Raw Variables to the workflow()
How Does a workflow() Use the Formula?
Tree-Based Models
Special Formulas and Inline Functions
Creating Multiple Workflows at Once
Evaluating the Test Set
Chapter Summary

8. Feature Engineering with Recipes
A Simple recipe() for the Ames Housing Data
Using Recipes
How Data Are Used by the recipe()
Examples of Steps
Encoding Qualitative Data in a Numeric Format
Interaction Terms
Spline Functions
Feature Extraction
Row Sampling Steps
General Transformations
Natural Language Processing
Skipping Steps for New Data
Tidy a recipe()
Column Roles
Chapter Summary

9. Judging Model Effectiveness
Performance Metrics and Inference
Regression Metrics
Binary Classification Metrics
Multiclass Classification Metrics
Chapter Summary

III. Tools for Creating Effective Models
10. Resampling for Evaluating Performance
The Resubstitution Approach
Resampling Methods
Cross-Validation
Repeated Cross-Validation
Leave-One-Out Cross-Validation
Monte Carlo Cross-Validation
Validation Sets
Bootstrapping
Rolling Forecasting Origin Resampling
Estimating Performance
Parallel Processing
Saving the Resampled Objects
Chapter Summary

11. Comparing Models with Resampling
Creating Multiple Models with Workflow Sets
Comparing Resampled Performance Statistics
Simple Hypothesis Testing Methods
Bayesian Methods
A Random Intercept Model
The Effect of the Amount of Resampling
Chapter Summary

12. Model Tuning and the Dangers of Overfitting
Model Parameters
Tuning Parameters for Different Types of Models
What Do We Optimize?
The Consequences of Poor Parameter Estimates
Two General Strategies for Optimization
Tuning Parameters in tidymodels
Chapter Summary

13. Grid Search
Regular and Nonregular Grids
Regular Grids
Nonregular Grids
Evaluating the Grid
Finalizing the Model
Tools for Creating Tuning Specifications
Tools for Efficient Grid Search
Submodel Optimization
Parallel Processing
Benchmarking Boosted Trees
Access to Global Variables
Racing Methods
Chapter Summary

14. Iterative Search
A Support Vector Machine Model
Bayesian Optimization
A Gaussian Process Model
Acquisition Functions
The tune_bayes() Function
Simulated Annealing
Simulated Annealing Search Process
The tune_sim_anneal() Function
Chapter Summary

15. Screening Many Models
Modeling Concrete Mixture Strength
Creating the Workflow Set
Tuning and Evaluating the Models
Efficiently Screening Models
Finalizing a Model
Chapter Summary

IV. Beyond the Basics
16. Dimensionality Reduction
What Problems Can Dimensionality Reduction Solve?
A Picture Is Worth a Thousand…Beans
A Starter Recipe
Recipes in the Wild
Preparing a Recipe
Baking the Recipe
Feature Extraction Techniques
Principal Component Analysis
Partial Least Squares
Independent Component Analysis
Uniform Manifold Approximation and Projection
Modeling
Chapter Summary

17. Encoding Categorical Data
Is an Encoding Necessary?
Encoding Ordinal Predictors
Using the Outcome for Encoding Predictors
Effect Encodings in tidymodels
Effect Encodings with Partial Pooling
Feature Hashing
More Encoding Options
Chapter Summary

18. Explaining Models and Predictions
Software for Model Explanations
Local Explanations
Global Explanations
Building Global Explanations from Local Explanations
Back to Beans!
Chapter Summary

19. When Should You Trust Your Predictions?
Equivocal Results
Determining Model Applicability
Chapter Summary

20. Ensembles of Models
Creating the Training Set for Stacking
Blend the Predictions
Fit the Member Models
Test Set Results
Chapter Summary

21. Inferential Analysis
Inference for Count Data
Comparisons with Two-Sample Tests
Log-Linear Models
A More Complex Model
More Inferential Analysis
Chapter Summary

A. Recommended Preprocessing
References

Index
About the Authors
