Data Munging with R
Paperback, English, 2019, 1st edition, ISBN 9781617294594
Summary
'Data Munging with R' shows readers how to take raw data and transform it for use in computations, tables, graphs, and more. Whether they already have some programming experience or are spreadsheet whizzes looking for a more powerful data manipulation tool, this book will help them get started.
Readers will discover the ins and outs of using the data-oriented R programming language and its many task-specific packages. By the end, readers will be master mungers, with a robust, reproducible workflow and the skills to use data to strengthen their conclusions!
Key Features
- Practical examples
- Step-by-step guide
- Introduction to R
If you have beginner programming skills or you're comfortable with writing spreadsheet formulas, you have everything you need to get the most out of this book.
R is a statistical programming language: it was created to perform statistical calculations, but community contributions have grown it into much more. As a general-purpose language, it is flexible enough to work with almost any data you can interact with: stored or streaming, images, text, or numbers.
Table of Contents
1.1 Data: What, Where, How?
1.1.1 What is Data?
1.1.2 Seeing the World as Data Sources
1.1.3 Data Munging?
1.1.4 What You Can Do With Well-Handled Data
1.1.5 Data as an Asset
1.1.6 Reproducible Research and Version Control
1.2 Introducing R
1.2.1 The Origins of R
1.2.2 What It Is and What It Isn’t
1.3 How R Works
1.4 Introducing RStudio
1.4.1 Working with R within RStudio
1.4.2 Built-in Packages (Data and Functions)
1.5 In-built Documentation
1.5.1 Vignettes
1.6 Try It Yourself
Terminology
Summary
2 GETTING TO KNOW R DATA TYPES
2.1 Types of Data
2.1.1 Numbers
2.1.2 Text (Strings)
2.1.3 Categories (Factors)
2.1.4 Dates and Times
2.1.5 Logicals
2.1.6 Missing Values
2.2 Storing Values (Assigning)
2.2.1 Naming Data (Variables)
2.2.2 Unchanging Data
2.2.3 The Assignment Operators (<- vs =)
2.3 Specifying the Data Type
2.4 Telling R to Ignore Something
2.5 Try It Yourself
Terminology
Summary
3 I WANT TO MAKE NEW DATA VALUES
3.1 Basic Mathematics
3.2 Operator Precedence
3.3 String Concatenation (Joining)
3.4 Comparisons
3.5 Automatic Conversion (Coercion)
3.6 Try It Yourself
Terminology
Summary
4 UNDERSTANDING THE TOOLS WE’LL USE: FUNCTIONS
4.1 Functions
4.1.1 Under the Hood
4.1.2 Function Template
4.1.3 Arguments
4.1.4 Multiple Arguments
4.1.5 Default Arguments
4.1.6 Argument Name Matching
4.1.7 Partial Matching
4.1.8 Scope
4.2 Packages
4.2.1 How Does R (Not) Know About This Function?
4.3 Messages, Warnings, and Errors, Oh My!
4.3.1 Creating Messages, Warnings, and Errors
4.3.2 How To Diagnose Them
4.4 Testing
4.5 Project: Generalizing a function
4.6 Try It Yourself
Terminology
Summary
5 COMBINING DATA VALUES
5.1 Simple Collections
5.1.1 Coercion
5.1.2 Missing Values
5.1.3 Attributes
5.1.4 Names
5.2 Sequences
5.2.1 Vector Functions
5.2.2 Vector Math Operations
5.3 Matrices
5.3.1 Indexing
5.4 Lists
5.5 data.frames
5.6 Classes
5.6.1 The tibble class
5.6.2 Structures as Function Arguments
5.7 Try It Yourself
Terminology
Summary
6 SELECTING DATA VALUES
6.1 Text Processing
6.1.1 Text Matching
6.1.2 Substrings
6.1.3 Text Substitutions
6.1.4 Regular Expressions
6.2 Selecting Components from Structures
6.2.1 Vectors
6.2.2 Lists
6.2.3 Matrices
6.3 Replacing Values
6.4 data.frames and dplyr
6.4.1 dplyr Verbs
6.4.2 Non-Standard Evaluation
6.4.3 Pipes
6.4.4 Subsetting data.frame The Hard Way
6.5 Replacing NA
6.6 Selecting Conditionally
6.7 Summarising Values
6.8 A Worked Example: Excel vs R
6.9 Try It Yourself
6.9.1 Solutions—no peeking
Terminology
Summary
7 DOING THINGS WITH LOTS OF DATA
7.1 Tidy Data Principles
7.1.1 The Working Directory
7.1.2 Stored Data Formats
7.1.3 Reading Data into R
7.1.4 Scraping Data
7.1.5 Inspecting Data
7.1.6 I Have Odd Values In My Data (Sentinel Values)
7.1.7 Converting to Tidy Data
7.2 Merging Data
7.3 Writing Data From R
7.4 Try It Yourself
Terminology
Summary
8 DOING THINGS CONDITIONALLY: CONTROL STRUCTURES
8.1 Looping
8.1.1 Vectorisation
8.1.2 Tidy repetition: Looping with purrr
8.1.3 for loops
8.2 Wider and Narrower Loop Scope
8.2.1 while loops
8.3 Conditional evaluation
8.3.1 if conditions
8.3.2 ifelse conditions
8.4 Try It Yourself
Terminology
Summary
9 VISUALIZING DATA: PLOTTING
9.1 Data Preparation
9.1.1 Tidy Data, Revisited
9.1.2 Importance of Data Types
9.2 ggplot2
9.2.1 General construction
9.2.2 Adding points
9.2.3 Style aesthetics
9.2.4 Adding lines
9.2.5 Adding bars
9.2.6 Other types of plots
9.2.7 Scales
9.2.8 Facetting
9.2.9 Additional options
9.3 Plots as Objects
9.4 Saving plots
9.5 Try It Yourself
Terminology
Summary
10 DOING MORE WITH YOUR DATA WITH EXTENSIONS
10.1 Writing Your Own Packages
10.1.1 Creating a Minimal Package
10.1.2 Documentation
10.2 Analysing Your Package
10.2.1 Unit Testing
10.2.2 Profiling
10.3 What To Do Next?
10.3.1 Regression
10.3.2 Clustering
10.3.3 Working With Maps
10.3.4 Interacting With APIs
10.3.5 Sharing Your Package
10.4 More Resources
Terminology
Summary
APPENDIXES
APPENDIX A: INSTALLING R
Windows
Mac
Linux
From source
APPENDIX B: INSTALLING RSTUDIO
Installing RStudio
Packages used in this book
APPENDIX C: GRAPHICS IN BASE R