

Chris Albon is a data scientist with a Ph.D. in quantitative political science and a decade of experience working in statistical learning, artificial intelligence, and software engineering.
Machine Learning with Python Cookbook
Practical Solutions from Preprocessing to Deep Learning
Paperback | English | 2018 | 1st edition | ISBN 9781491989388

Summary
This practical guide provides nearly 200 self-contained recipes to help you solve machine learning challenges you may encounter in your daily work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems such as loading data, handling text or numerical data, model selection, and dimensionality reduction, among many other topics.
Each recipe includes code that you can copy, paste, and run against a toy dataset to verify that it actually works. From there, you can insert, combine, or adapt the code to help construct your own application. Each recipe also includes a discussion that explains the solution and provides meaningful context. This cookbook takes you beyond theory and concepts by providing the nuts and bolts you need to construct working machine learning applications.
You’ll find recipes for:
- Vectors, matrices, and arrays
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVM), naïve Bayes, clustering, and neural networks
- Saving and loading trained models
Table of Contents
Who This Book Is For
Who This Book Is Not For
Terminology Used in This Book
Acknowledgments
Vectors, Matrices, and Arrays
1.0. Introduction
1.1. Creating a Vector
1.2. Creating a Matrix
1.3. Creating a Sparse Matrix
1.4. Selecting Elements
1.5. Describing a Matrix
1.6. Applying Operations to Elements
1.7. Finding the Maximum and Minimum Values
1.8. Calculating the Average, Variance, and Standard Deviation
1.9. Reshaping Arrays
1.10. Transposing a Vector or Matrix
1.11. Flattening a Matrix
1.12. Finding the Rank of a Matrix
1.13. Calculating the Determinant
1.14. Getting the Diagonal of a Matrix
1.15. Calculating the Trace of a Matrix
1.16. Finding Eigenvalues and Eigenvectors
1.17. Calculating Dot Products
1.18. Adding and Subtracting Matrices
1.19. Multiplying Matrices
1.20. Inverting a Matrix
1.21. Generating Random Values
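A taste of what this chapter's recipes look like, using NumPy (the book's array library); the matrix values are illustrative only:

    import numpy as np

    # Create a matrix and describe it (recipes 1.2 and 1.5)
    matrix = np.array([[1, 2], [3, 4]])
    print(matrix.shape)  # (2, 2)

    # Find its eigenvalues and eigenvectors (recipe 1.16)
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    print(eigenvalues)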
Loading Data
2.0. Introduction
2.1. Loading a Sample Dataset
2.2. Creating a Simulated Dataset
2.3. Loading a CSV File
2.4. Loading an Excel File
2.5. Loading a JSON File
2.6. Querying a SQL Database
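A sketch in the spirit of this chapter, assuming scikit-learn's bundled datasets and a placeholder CSV path:

    import pandas as pd
    from sklearn import datasets

    # Load a built-in sample dataset (recipe 2.1)
    digits = datasets.load_digits()
    features, target = digits.data, digits.target

    # Load a CSV file (recipe 2.3); 'data.csv' is a hypothetical path
    dataframe = pd.read_csv('data.csv')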
Data Wrangling
3.0. Introduction
3.1. Creating a DataFrame
3.2. Describing the Data
3.3. Navigating DataFrames
3.4. Selecting Rows Based on Conditionals
3.5. Replacing Values
3.6. Renaming Columns
3.7. Finding the Minimum, Maximum, Sum, Average, and Count
3.8. Finding Unique Values
3.9. Handling Missing Values
3.10. Deleting a Column
3.11. Deleting a Row
3.12. Dropping Duplicate Rows
3.13. Grouping Rows by Values
3.14. Grouping Rows by Time
3.15. Looping Over a Column
3.16. Applying a Function Over All Elements in a Column
3.17. Applying a Function to Groups
3.18. Concatenating DataFrames
3.19. Merging DataFrames
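A minimal pandas sketch of the sort of wrangling these recipes cover; the toy DataFrame is invented for illustration:

    import pandas as pd

    dataframe = pd.DataFrame({'name': ['A', 'B', 'B'], 'score': [1, 2, 2]})

    # Select rows based on a conditional (recipe 3.4)
    high_scores = dataframe[dataframe['score'] > 1]

    # Drop duplicate rows (recipe 3.12)
    deduplicated = dataframe.drop_duplicates()

    # Group rows by values and aggregate (recipe 3.13)
    mean_scores = dataframe.groupby('name')['score'].mean()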
Handling Numerical Data
4.0. Introduction
4.1. Rescaling a Feature
4.2. Standardizing a Feature
4.3. Normalizing Observations
4.4. Generating Polynomial and Interaction Features
4.5. Transforming Features
4.6. Detecting Outliers
4.7. Handling Outliers
4.8. Discretizing Features
4.9. Grouping Observations Using Clustering
4.10. Deleting Observations with Missing Values
4.11. Imputing Missing Values
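For flavor, rescaling and standardizing a feature with scikit-learn, roughly as recipes 4.1 and 4.2 do; the feature values are made up:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    feature = np.array([[1.0], [2.0], [10.0]])

    # Rescale a feature to the range [0, 1] (recipe 4.1)
    rescaled = MinMaxScaler().fit_transform(feature)

    # Standardize to mean 0 and unit variance (recipe 4.2)
    standardized = StandardScaler().fit_transform(feature)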
Handling Categorical Data
5.0. Introduction
5.1. Encoding Nominal Categorical Features
5.2. Encoding Ordinal Categorical Features
5.3. Encoding Dictionaries of Features
5.4. Imputing Missing Class Values
5.5. Handling Imbalanced Classes
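A sketch of one-hot encoding a nominal feature, in the style of recipe 5.1; the color values are illustrative:

    import pandas as pd
    from sklearn.preprocessing import LabelBinarizer

    colors = ['red', 'green', 'blue', 'green']

    # One-hot encode with scikit-learn (recipe 5.1)
    one_hot = LabelBinarizer().fit_transform(colors)

    # Or with pandas
    dummies = pd.get_dummies(colors)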
Handling Text
6.0. Introduction
6.1. Cleaning Text
6.2. Parsing and Cleaning HTML
6.3. Removing Punctuation
6.4. Tokenizing Text
6.5. Removing Stop Words
6.6. Stemming Words
6.7. Tagging Parts of Speech
6.8. Encoding Text as a Bag of Words
6.9. Weighting Word Importance
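A glimpse of the bag-of-words and tf-idf recipes (6.8 and 6.9), sketched with scikit-learn and invented example sentences:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    texts = ['I love Brazil. Brazil!', 'Sweden is best', 'Germany beats both']

    # Encode text as a bag of words (recipe 6.8)
    bag_of_words = CountVectorizer().fit_transform(texts)

    # Weight word importance with tf-idf (recipe 6.9)
    tfidf = TfidfVectorizer().fit_transform(texts)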
Handling Dates and Times
7.0. Introduction
7.1. Converting Strings to Dates
7.2. Handling Time Zones
7.3. Selecting Dates and Times
7.4. Breaking Up Date Data into Multiple Features
7.5. Calculating the Difference Between Dates
7.6. Encoding Days of the Week
7.7. Creating a Lagged Feature
7.8. Using Rolling Time Windows
7.9. Handling Missing Data in Time Series
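A pandas sketch in the spirit of recipes 7.4 and 7.7; the dates are placeholders:

    import pandas as pd

    dates = pd.Series(pd.to_datetime(['2018-01-01', '2018-01-08', '2018-01-15']))

    # Break dates into multiple features (recipe 7.4)
    years = dates.dt.year
    weekdays = dates.dt.weekday

    # Create a lagged feature (recipe 7.7)
    lagged = dates.shift(1)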
Handling Images
8.0. Introduction
8.1. Loading Images
8.2. Saving Images
8.3. Resizing Images
8.4. Cropping Images
8.5. Blurring Images
8.6. Sharpening Images
8.7. Enhancing Contrast
8.8. Isolating Colors
8.9. Binarizing Images
8.10. Removing Backgrounds
8.11. Detecting Edges
8.12. Detecting Corners
8.13. Creating Features for Machine Learning
8.14. Encoding Mean Color as a Feature
8.15. Encoding Color Histograms as Features
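The image recipes lean on OpenCV; a rough sketch, assuming an image exists at the hypothetical path 'image.jpg':

    import cv2

    # Load an image in grayscale and resize it (recipes 8.1 and 8.3)
    image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
    image_50x50 = cv2.resize(image, (50, 50))

    # Flatten pixel intensities into a feature vector (recipe 8.13)
    features = image_50x50.flatten()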
Dimensionality Reduction Using Feature Extraction
9.0. Introduction
9.1. Reducing Features Using Principal Components
9.2. Reducing Features When Data Is Linearly Inseparable
9.3. Reducing Features by Maximizing Class Separability
9.4. Reducing Features Using Matrix Factorization
9.5. Reducing Features on Sparse Data
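A sketch of principal component analysis as in recipe 9.1, keeping enough components to explain 99% of the variance:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    features = load_digits().data

    # Reduce features using principal components (recipe 9.1)
    pca = PCA(n_components=0.99, whiten=True)
    reduced = pca.fit_transform(features)
    print(features.shape[1], '->', reduced.shape[1])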
Dimensionality Reduction Using Feature Selection
10.0. Introduction
10.1. Thresholding Numerical Feature Variance
10.2. Thresholding Binary Feature Variance
10.3. Handling Highly Correlated Features
10.4. Removing Irrelevant Features for Classification
10.5. Recursively Eliminating Features
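Variance thresholding, roughly as recipe 10.1 does it; the 0.5 threshold is an arbitrary choice for illustration:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import VarianceThreshold

    features = load_iris().data

    # Drop features whose variance falls below the threshold (recipe 10.1)
    selector = VarianceThreshold(threshold=0.5)
    high_variance = selector.fit_transform(features)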
Model Evaluation
11.0. Introduction
11.1. Cross-Validating Models
11.2. Creating a Baseline Regression Model
11.3. Creating a Baseline Classification Model
11.4. Evaluating Binary Classifier Predictions
11.5. Evaluating Binary Classifier Thresholds
11.6. Evaluating Multiclass Classifier Predictions
11.7. Visualizing a Classifier’s Performance
11.8. Evaluating Regression Models
11.9. Evaluating Clustering Models
11.10. Creating a Custom Evaluation Metric
11.11. Visualizing the Effect of Training Set Size
11.12. Creating a Text Report of Evaluation Metrics
11.13. Visualizing the Effect of Hyperparameter Values
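Cross-validation in the style of recipe 11.1, sketched with scikit-learn:

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    digits = load_digits()

    # Cross-validate a model, scored by accuracy (recipe 11.1)
    scores = cross_val_score(LogisticRegression(max_iter=1000),
                             digits.data, digits.target,
                             cv=5, scoring='accuracy')
    print(scores.mean())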
Model Selection
12.0. Introduction
12.1. Selecting Best Models Using Exhaustive Search
12.2. Selecting Best Models Using Randomized Search
12.3. Selecting Best Models from Multiple Learning Algorithms
12.4. Selecting Best Models When Preprocessing
12.5. Speeding Up Model Selection with Parallelization
12.6. Speeding Up Model Selection Using Algorithm-Specific Methods
12.7. Evaluating Performance After Model Selection
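Exhaustive hyperparameter search as in recipe 12.1; the grid over C is an illustrative choice:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    iris = load_iris()

    # Select the best model with an exhaustive grid search (recipe 12.1)
    grid = GridSearchCV(LogisticRegression(max_iter=1000),
                        {'C': np.logspace(0, 4, 10)}, cv=5)
    best_model = grid.fit(iris.data, iris.target)
    print(best_model.best_params_)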
Linear Regression
13.0. Introduction
13.1. Fitting a Line
13.2. Handling Interactive Effects
13.3. Fitting a Nonlinear Relationship
13.4. Reducing Variance with Regularization
13.5. Reducing Features with Lasso Regression
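Fitting a line and a lasso in the spirit of recipes 13.1 and 13.5, here on a simulated dataset (the book's own examples may use different data):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, LinearRegression

    features, target = make_regression(n_samples=100, n_features=3,
                                       random_state=1)

    # Fit a line (recipe 13.1)
    regression = LinearRegression().fit(features, target)

    # Lasso regularization can shrink coefficients to zero (recipe 13.5)
    lasso = Lasso(alpha=0.5).fit(features, target)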
Trees and Forests
14.0. Introduction
14.1. Training a Decision Tree Classifier
14.2. Training a Decision Tree Regressor
14.3. Visualizing a Decision Tree Model
14.4. Training a Random Forest Classifier
14.5. Training a Random Forest Regressor
14.6. Identifying Important Features in Random Forests
14.7. Selecting Important Features in Random Forests
14.8. Handling Imbalanced Classes
14.9. Controlling Tree Size
14.10. Improving Performance Through Boosting
14.11. Evaluating Random Forests with Out-of-Bag Errors
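A random forest sketch touching recipes 14.4 and 14.6:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()

    # Train a random forest classifier (recipe 14.4)
    forest = RandomForestClassifier(random_state=0, n_jobs=-1)
    forest.fit(iris.data, iris.target)

    # Identify important features (recipe 14.6)
    print(forest.feature_importances_)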
K-Nearest Neighbors
15.0. Introduction
15.1. Finding an Observation’s Nearest Neighbors
15.2. Creating a K-Nearest Neighbor Classifier
15.3. Identifying the Best Neighborhood Size
15.4. Creating a Radius-Based Nearest Neighbor Classifier
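A k-nearest neighbors sketch in the style of recipe 15.2; KNN is distance-based, so features are standardized first:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    iris = load_iris()
    features = StandardScaler().fit_transform(iris.data)

    # Create a k-nearest neighbors classifier (recipe 15.2)
    knn = KNeighborsClassifier(n_neighbors=5).fit(features, iris.target)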
Logistic Regression
16.0. Introduction
16.1. Training a Binary Classifier
16.2. Training a Multiclass Classifier
16.3. Reducing Variance Through Regularization
16.4. Training a Classifier on Very Large Data
16.5. Handling Imbalanced Classes
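A logistic regression sketch; class_weight='balanced' illustrates one way to handle imbalanced classes (recipe 16.5):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    iris = load_iris()

    # Train a classifier, weighting classes inversely to their frequency
    model = LogisticRegression(class_weight='balanced', max_iter=1000)
    model.fit(iris.data, iris.target)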
Support Vector Machines
17.0. Introduction
17.1. Training a Linear Classifier
17.2. Handling Linearly Inseparable Classes Using Kernels
17.3. Creating Predicted Probabilities
17.4. Identifying Support Vectors
17.5. Handling Imbalanced Classes
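A support vector machine sketch touching recipes 17.2 and 17.3:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    iris = load_iris()

    # An RBF kernel handles linearly inseparable classes (recipe 17.2);
    # probability=True enables predicted probabilities (recipe 17.3)
    svc = SVC(kernel='rbf', probability=True, random_state=0)
    svc.fit(iris.data, iris.target)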
Naive Bayes
18.0. Introduction
18.1. Training a Classifier for Continuous Features
18.2. Training a Classifier for Discrete and Count Features
18.3. Training a Naive Bayes Classifier for Binary Features
18.4. Calibrating Predicted Probabilities
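Gaussian naive Bayes with calibrated probabilities, sketching recipes 18.1 and 18.4:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import load_iris
    from sklearn.naive_bayes import GaussianNB

    iris = load_iris()

    # Train a naive Bayes classifier for continuous features (recipe 18.1)
    # and calibrate its predicted probabilities (recipe 18.4)
    calibrated = CalibratedClassifierCV(GaussianNB(), method='sigmoid', cv=3)
    calibrated.fit(iris.data, iris.target)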
Clustering
19.0. Introduction
19.1. Clustering Using K-Means
19.2. Speeding Up K-Means Clustering
19.3. Clustering Using Meanshift
19.4. Clustering Using DBSCAN
19.5. Clustering Using Hierarchical Merging
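K-means and DBSCAN sketches for recipes 19.1 and 19.4; the hyperparameters are illustrative defaults:

    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler

    features = StandardScaler().fit_transform(load_iris().data)

    # Cluster with k-means, fixing the number of clusters (recipe 19.1)
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

    # DBSCAN needs no cluster count up front (recipe 19.4)
    dbscan = DBSCAN(eps=0.5).fit(features)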
Neural Networks
20.0. Introduction
20.1. Preprocessing Data for Neural Networks
20.2. Designing a Neural Network
20.3. Training a Binary Classifier
20.4. Training a Multiclass Classifier
20.5. Training a Regressor
20.6. Making Predictions
20.7. Visualizing Training History
20.8. Reducing Overfitting with Weight Regularization
20.9. Reducing Overfitting with Early Stopping
20.10. Reducing Overfitting with Dropout
20.11. Saving Model Training Progress
20.12. k-Fold Cross-Validating Neural Networks
20.13. Tuning Neural Networks
20.14. Visualizing Neural Networks
20.15. Classifying Images
20.16. Improving Performance with Image Augmentation
20.17. Classifying Text
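The neural network recipes use Keras. A minimal binary-classifier sketch in that spirit; the layer sizes and ten-feature input are assumptions for illustration:

    from tensorflow import keras

    # Design a small network for binary classification (recipes 20.2-20.3)
    network = keras.Sequential([
        keras.Input(shape=(10,)),          # assumed: 10 input features
        keras.layers.Dense(16, activation='relu'),
        keras.layers.Dense(16, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    network.compile(optimizer='rmsprop', loss='binary_crossentropy',
                    metrics=['accuracy'])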
Saving and Loading Trained Models
21.0. Introduction
21.1. Saving and Loading a scikit-learn Model
21.2. Saving and Loading a Keras Model
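Persisting a scikit-learn model, roughly as recipe 21.1 does; 'model.pkl' is a placeholder filename:

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    model = RandomForestClassifier().fit(iris.data, iris.target)

    # Save and load a trained model (recipe 21.1)
    joblib.dump(model, 'model.pkl')
    restored = joblib.load('model.pkl')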
Index