
Google BigQuery: The Definitive Guide

Data Warehousing, Analytics, and Machine Learning at Scale

Paperback, English, 2019, ISBN 9781492044468


Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently.

Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.


Number of pages: 498



About Valliappa Lakshmanan

Valliappa (Lak) Lakshmanan is currently a Technical Lead for Data and Machine Learning Professional Services for Google Cloud. His mission is to democratize machine learning so that it can be done by anyone anywhere using Google's amazing infrastructure, without deep knowledge of statistics or programming or ownership of a lot of hardware. Before Google, he led a team of data scientists at the Climate Corporation and was a Research Scientist at NOAA National Severe Storms Laboratory, working on machine learning applications for severe weather diagnosis and prediction. http://aisoftwarellc.weebly.com/


About Jordan Tigani

Jordan is engineering director for the core BigQuery team. He was one of the founding engineers on BigQuery, and helped grow it to be one of the most successful products in Google’s Cloud Platform. He wrote the first book on BigQuery, and has also spoken widely on the subject. Jordan has twenty years of software development experience, ranging from Microsoft Research to machine learning startups.



Who Is This Book For?
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us

1. What Is Google BigQuery?
Data Processing Architectures
Relational Database Management System
MapReduce Framework
BigQuery: A Serverless, Distributed SQL Engine
Working with BigQuery
Deriving Insights Across Datasets
ETL, EL, and ELT
Powerful Analytics
Simplicity of Management
How BigQuery Came About
What Makes BigQuery Possible?
Separation of Compute and Storage
Storage and Networking Infrastructure
Managed Storage
Integration with Google Cloud Platform
Security and Compliance

2. Query Essentials
Simple Queries
Retrieving Rows by Using SELECT
Aliasing Column Names with AS
Filtering with WHERE
Subqueries with WITH
Sorting with ORDER BY
Computing Aggregates by Using GROUP BY
Counting Records by Using COUNT
Filtering Grouped Items by Using HAVING
Finding Unique Values by Using DISTINCT
A Brief Primer on Arrays and Structs
Creating Arrays by Using ARRAY_AGG
Array of STRUCT
Working with Arrays
UNNEST an Array
Joining Tables
The JOIN Explained
Saving and Sharing
Query History and Caching
Saved Queries
Views Versus Shared Queries

3. Data Types, Functions, and Operators
Numeric Types and Functions
Mathematical Functions
Standard-Compliant Floating-Point Division
SAFE Functions
Precise Decimal Calculations with NUMERIC
Working with BOOL
Logical Operations
Conditional Expressions
Cleaner NULL-Handling with COALESCE
Casting and Coercion
Using COUNTIF to Avoid Casting Booleans
String Functions
Printing and Parsing
String Manipulation Functions
Transformation Functions
Regular Expressions
Summary of String Functions
Working with TIMESTAMP
Parsing and Formatting Timestamps
Extracting Calendar Parts
Arithmetic with Timestamps
Date, Time, and DateTime
Working with GIS Functions

4. Loading Data into BigQuery
The Basics
Loading from a Local Source
Specifying a Schema
Copying into a New Table
Data Management (DDL and DML)
Loading Data Efficiently
Federated Queries and External Data Sources
How to Use Federated Queries
When to Use Federated Queries and External Data Sources
Interactive Exploration and Querying of Data in Google Sheets
SQL Queries on Data in Cloud Bigtable
Transfers and Exports
Data Transfer Service
Exporting Stackdriver Logs
Using Cloud Dataflow to Read/Write from BigQuery
Moving On-Premises Data
Data Migration Methods

5. Developing with BigQuery
Developing Programmatically
Accessing BigQuery via the REST API
Google Cloud Client Library
Accessing BigQuery from Data Science Tools
Notebooks on Google Cloud Platform
Working with BigQuery, pandas, and Jupyter
Working with BigQuery from R
Cloud Dataflow
JDBC/ODBC drivers
Incorporating BigQuery Data into Google Slides (in G Suite)
Bash Scripting with BigQuery
Creating Datasets and Tables
Executing Queries
BigQuery Objects

6. Architecture of BigQuery
High-Level Architecture
Life of a Query Request
BigQuery Upgrades
Query Engine (Dremel)
Dremel Architecture
Query Execution
Storage Data

7. Optimizing Performance and Cost
Principles of Performance
Key Drivers of Performance
Controlling Cost
Measuring and Troubleshooting
Measuring Query Speed Using REST API
Measuring Query Speed Using BigQuery Workload Tester
Troubleshooting Workloads Using Stackdriver
Reading Query Plan Information
Increasing Query Speed
Minimizing I/O
Caching the Results of Previous Queries
Performing Efficient Joins
Avoiding Overwhelming a Worker
Using Approximate Aggregation Functions
Optimizing How Data Is Stored and Accessed
Minimizing Network Overhead
Choosing an Efficient Storage Format
Partitioning Tables to Reduce Scan Size
Clustering Tables Based on High-Cardinality Keys
Time-Insensitive Use Cases
Batch Queries
File Loads

8. Advanced Queries
Reusable Queries
Parameterized Queries
SQL User-Defined Functions
Reusing Parts of Queries
Advanced SQL
Working with Arrays
Window Functions
Table Metadata
Data Definition Language and Data Manipulation Language
Beyond SQL
JavaScript UDFs
Advanced Functions
BigQuery Geographic Information Systems
Useful Statistical Functions
Hash Algorithms

9. Machine Learning in BigQuery
What Is Machine Learning?
Formulating a Machine Learning Problem
Types of Machine Learning Problems
Building a Regression Model
Choose the Label
Exploring the Dataset to Find Features
Creating a Training Dataset
Training and Evaluating the Model
Predicting with the Model
Examining Model Weights
More-Complex Regression Models
Building a Classification Model
Choosing the Threshold
Customizing BigQuery ML
Controlling Data Split
Balancing Classes
k-Means Clustering
What’s Being Clustered?
Clustering Bicycle Stations
Carrying Out Clustering
Understanding the Clusters
Data-Driven Decisions
Recommender Systems
The MovieLens Dataset
Matrix Factorization
Making Recommendations
Incorporating User and Movie Information
Custom Machine Learning Models on GCP
Hyperparameter Tuning
Support for TensorFlow

10. Administering and Securing BigQuery
Infrastructure Security
Identity and Access Management
Administering BigQuery
Job Management
Authorizing Users
Restoring Deleted Records and Tables
Continuous Integration/Continuous Deployment
Cost/Billing Exports
Dashboards, Monitoring, and Audit Logging
Availability, Disaster Recovery, and Encryption
Zones, Regions, and Multiregions
BigQuery and Failure Handling
Durability, Backups, and Disaster Recovery
Privacy and Encryption
Regulatory Compliance
Data Locality
Restricting Access to Subsets of Data
Removing All Transactions Related to a Single Individual
Data Loss Prevention
Data Exfiltration Protection

