

Matt Harrison is a Python user, presenter, author, and user group organizer. He helps run the Utah Python user group.
Machine Learning Pocket Reference
Working with Structured Data in Python
Paperback, English, 2019, 1st edition, ISBN 9781492047544
Summary
With detailed notes, tables, and examples, this handy reference will help you navigate the basics of structured machine learning. Author Matt Harrison delivers a valuable guide that you can use for additional support during training and as a convenient resource when you dive into your next machine learning project.
Ideal for programmers, data scientists, and AI engineers, this book includes an overview of the machine learning process and walks you through classification with structured data. You’ll also learn methods for clustering, predicting a continuous value (regression), and reducing dimensionality, among other topics.
This pocket reference includes sections that cover:
- Classification, using the Titanic dataset
- Cleaning data and dealing with missing data
- Exploratory data analysis
- Common preprocessing steps using sample data
- Selecting features useful to the model
- Model selection
- Metrics and classification evaluation
- Regression examples using k-nearest neighbor, decision trees, boosting, and more
- Metrics for regression evaluation
- Clustering
- Dimensionality reduction
- Scikit-learn pipelines
Table of Contents
What to Expect
Who This Book Is For
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Introduction
Libraries Used
Installation with Pip
Installation with Conda
2. Overview of the Machine Learning Process
3. Classification Walkthrough: Titanic Dataset
Project Layout Suggestion
Imports
Ask a Question
Terms for Data
Gather Data
Clean Data
Create Features
Sample Data
Impute Data
Normalize Data
Refactor
Baseline Model
Various Families
Stacking
Create Model
Evaluate Model
Optimize Model
Confusion Matrix
ROC Curve
Learning Curve
Deploy Model
4. Missing Data
Examining Missing Data
Dropping Missing Data
Imputing Data
Adding Indicator Columns
5. Cleaning Data
Column Names
Replacing Missing Values
6. Exploring
Data Size
Summary Stats
Histogram
Scatter Plot
Joint Plot
Pair Grid
Box and Violin Plots
Comparing Two Ordinal Values
Correlation
RadViz
Parallel Coordinates
7. Preprocess Data
Standardize
Scale to Range
Dummy Variables
Label Encoder
Frequency Encoding
Pulling Categories from Strings
Other Categorical Encoding
Date Feature Engineering
Add col_na Feature
Manual Feature Engineering
8. Feature Selection
Collinear Columns
Lasso Regression
Recursive Feature Elimination
Mutual Information
Principal Component Analysis
Feature Importance
9. Imbalanced Classes
Use a Different Metric
Tree-based Algorithms and Ensembles
Penalize Models
Upsampling Minority
Generate Minority Data
Downsampling Majority
Upsampling Then Downsampling
10. Classification
Logistic Regression
Naive Bayes
Support Vector Machine
K-Nearest Neighbor
Decision Tree
Random Forest
XGBoost
Gradient Boosted with LightGBM
TPOT
11. Model Selection
Validation Curve
Learning Curve
12. Metrics and Classification Evaluation
Confusion Matrix
Metrics
Accuracy
Recall
Precision
F1
Classification Report
ROC
Precision-Recall Curve
Cumulative Gains Plot
Lift Curve
Class Balance
Class Prediction Error
Discrimination Threshold
13. Explaining Models
Regression Coefficients
Feature Importance
LIME
Tree Interpretation
Partial Dependence Plots
Surrogate Models
Shapley
14. Regression
Baseline Model
Linear Regression
SVMs
K-Nearest Neighbor
Decision Tree
Random Forest
XGBoost Regression
LightGBM Regression
15. Metrics and Regression Evaluation
Metrics
Residuals Plot
Heteroscedasticity
Normal Residuals
Prediction Error Plot
16. Explaining Regression Models
Shapley
17. Dimensionality Reduction
PCA
UMAP
t-SNE
PHATE
18. Clustering
K-Means
Agglomerative (Hierarchical) Clustering
Understanding Clusters
19. Pipelines
Classification Pipeline
Regression Pipeline
PCA Pipeline
Index