Javier Luraschi, Kevin Kuo, Edgar Ruiz

Mastering Spark with R

The Complete Guide to Large-Scale Analysis and Modeling

Paperback, English, 2019, 1st edition, ISBN 9781492046370

€ 72,13

Delivery time: approximately 16 working days

Free shipping

Summary

If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.

Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.

- Analyze, explore, transform, and visualize data in Apache Spark with R
- Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
- Perform analysis and modeling across many machines using distributed computing techniques
- Use large-scale data from multiple sources and different formats with ease from within Spark
- Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
- Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

Specifications

ISBN-13: 9781492046370

Keywords: Programming, Data analysis, R, Spark, Data Applications

Language: English

Binding: Paperback

Number of pages: 269

Publisher: O'Reilly

Edition: 1

Publication date: 24-10-2019

Main category: IT management / ICT

About Javier Luraschi

Javier is a software engineer with experience in technologies ranging from desktop, web, mobile and backend, to augmented reality and deep learning applications. He previously worked for Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering. He is the author of various R packages like sparklyr, cloudml, r2d3, mlflow, tfdeploy and kerasjs.

About Kevin Kuo

Kevin builds open source libraries for machine learning and model deployment. He has held data science positions in various industries, including insurance, where he was a credentialed actuary. Kevin is the creator of mlflow, mleap, and sparkxgb, among various R packages. He is also an amateur mixologist and sommelier.

About Edgar Ruiz

Edgar Ruiz is a solutions engineer at RStudio with a background in deploying enterprise reporting and business intelligence solutions. He is the author of multiple articles and blog posts sharing analytics insights and server infrastructure for data science. Edgar is the author and administrator of the https://db.rstudio.com web site, and the current administrator of the sparklyr web site: https://spark.rstudio.com. He is a co-author of the dbplyr package and the creator of the dbplot, tidypredict, and modeldb packages.

Table of Contents

Foreword
Preface
Formatting
Acknowledgments
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us

1. Introduction
Overview
Hadoop
Spark
R
sparklyr
Recap

2. Getting Started
Overview
Prerequisites
Installing sparklyr
Installing Spark
Connecting
Using Spark
Web Interface
Analysis
Modeling
Data
Extensions
Distributed R
Streaming
Logs
Disconnecting
Using RStudio
Resources
Recap

3. Analysis
Overview
Import
Wrangle
Built-in Functions
Correlations
Visualize
Using ggplot2
Using dbplot
Model
Caching
Communicate
Recap

4. Modeling
Overview
Exploratory Data Analysis
Feature Engineering
Supervised Learning
Generalized Linear Regression
Other Models
Unsupervised Learning
Data Preparation
Topic Modeling
Recap

5. Pipelines
Overview
Creation
Use Cases
Hyperparameter Tuning
Operating Modes
Interoperability
Deployment
Batch Scoring
Real-Time Scoring
Recap

6. Clusters
Overview
On-Premises
Managers
Distributions
Cloud
Amazon
Databricks
Google
IBM
Microsoft
Qubole
Kubernetes
Tools
RStudio
Jupyter
Livy
Recap

7. Connections
Overview
Edge Nodes
Spark Home
Local
Standalone
YARN
YARN Client
YARN Cluster
Livy
Mesos
Kubernetes
Cloud
Batches
Tools
Multiple Connections
Troubleshooting
Logging
Spark Submit
Windows
Recap

8. Data
Overview
Reading Data
Paths
Schema
Memory
Columns
Writing Data
Copying Data
File Formats
CSV
JSON
Parquet
Others
File Systems
Storage Systems
Hive
Cassandra
JDBC
Recap

9. Tuning
Overview
Graph
Timeline
Configuring
Connect Settings
Submit Settings
Runtime Settings
sparklyr Settings
Partitioning
Implicit Partitions
Explicit Partitions
Caching
Checkpointing
Memory
Shuffling
Serialization
Configuration Files
Recap

10. Extensions
Overview
H2O
Graphs
XGBoost
Deep Learning
Genomics
Spatial
Troubleshooting
Recap

11. Distributed R
Overview
Use Cases
Custom Parsers
Partitioned Modeling
Grid Search
Web APIs
Simulations
Partitions
Grouping
Columns
Context
Functions
Packages
Cluster Requirements
Installing R
Apache Arrow
Troubleshooting
Worker Logs
Resolving Timeouts
Inspecting Partitions
Debugging Workers
Recap

12. Streaming
Overview
Transformations
Analysis
Modeling
Pipelines
Distributed R
Kafka
Shiny
Recap

13. Contributing
Overview
The Spark API
Spark Extensions
Using Scala Code
Recap

A. Supplemental Code References
Preface
Formatting
Chapter 1
The World’s Capacity to Store Information
Daily Downloads of CRAN Packages
Chapter 2
Prerequisites
Chapter 3
Hive Functions
Chapter 4
MLlib Functions
Chapter 6
Google Trends for On-Premises (Mainframes), Cloud Computing, and Kubernetes
Chapter 12
Stream Generator
Installing Kafka

Index
