
Trino – The Definitive Guide

SQL at Any Scale, on Any Storage, in Any Environment

Paperback English 2022 9781098137236
Sales rank: 3699 (highest position: 3699)

Summary

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. In the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle.

Analysts, software engineers, and production engineers learn how to manage, use, and even develop with Trino and make it a critical part of their data platform. Authors Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.

- Explore Trino's use cases, and learn about tools that help you connect to Trino for querying and processing huge amounts of data
- Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Deploy and secure Trino at scale, monitor workloads, tune queries, and connect more applications
- Learn how other organizations apply Trino successfully

Specifications

ISBN13: 9781098137236
Language: English
Binding: paperback
Number of pages: 319
Publisher: O'Reilly
Printing: 1
Publication date: 14-10-2022
Main category: IT management / ICT
ISSN:


Table of Contents

Foreword
Preface
Conventions Used in This Book
Code Examples, Permissions, and Attribution
O'Reilly Online Learning
How to Contact Us
Acknowledgments

Part I. Getting Started with Trino
1. Introducing Trino
The Problems with Big Data
Trino to the Rescue
Designed for Performance and Scale
SQL-on-Anything
Separation of Data Storage and Query Compute Resources
Trino Use Cases
One SQL Analytics Access Point
Access Point to Data Warehouse and Source Systems
Provide SQL-Based Access to Anything
Federated Queries
Semantic Layer for a Virtual Data Warehouse
Data Lake Query Engine
SQL Conversions and ETL
Better Insights Due to Faster Response Times
Big Data, Machine Learning, and Artificial Intelligence
Other Use Cases
Trino Resources
Website
Documentation
Community Chat
Source Code, License, and Version
Contributing
Book Repository
Iris Data Set
Flight Data Set
A Brief History of Trino
Conclusion

2. Installing and Configuring Trino
Trying Trino with the Docker Container
Installing from the Archive File
Java Virtual Machine
Python
Installation
Configuration
Adding a Data Source
Running Trino
Conclusion

3. Using Trino
Trino Command-Line Interface
Getting Started
Pagination
History and Completion
Additional Diagnostics
Executing Queries
Output Formats
Ignoring Errors
Trino JDBC Driver
Downloading and Registering the Driver
Establishing a Connection to Trino
Trino and ODBC
Client Libraries
Trino Web UI
SQL with Trino
Concepts
First Examples
Conclusion

Part II. Diving Deeper into Trino
4. Trino Architecture
Coordinator and Workers in a Cluster
Coordinator
Discovery Service
Workers
Connector-Based Architecture
Catalogs, Schemas, and Tables
Query Execution Model
Query Planning
Parsing and Analysis
Initial Query Planning
Optimization Rules
Predicate Pushdown
Cross Join Elimination
TopN
Partial Aggregations
Implementation Rules
Lateral Join Decorrelation
Semi-Join (IN) Decorrelation
Cost-Based Optimizer
The Cost Concept
Cost of the Join
Table Statistics
Filter Statistics
Table Statistics for Partitioned Tables
Join Enumeration
Broadcast Versus Distributed Joins
Working with Table Statistics
Trino ANALYZE
Gathering Statistics When Writing to Disk
Hive ANALYZE
Displaying Table Statistics
Conclusion

5. Production-Ready Deployment
Configuration Details
Server Configuration
Logging
Node Configuration
JVM Configuration
Launcher
Cluster Installation
RPM Installation
Installation Directory Structure
Configuration
Uninstall Trino
Installation in the Cloud
Helm Chart for Kubernetes Deployment
Cluster Sizing Considerations
Conclusion

6. Connectors
Configuration
RDBMS Connector Example: PostgreSQL
Query Pushdown
Parallelism and Concurrency
Other RDBMS Connectors
Security
Query Pass-Through
Trino TPC-H and TPC-DS Connectors
Hive Connector for Distributed Storage Data Sources
Apache Hadoop and Hive
Hive Connector
Hive-Style Table Format
Managed and External Tables
Partitioned Data
Loading Data
File Formats and Compression
MinIO Example
Modern Distributed Storage Management and Analytics
Non-Relational Data Sources
Trino JMX Connector
Black Hole Connector
Memory Connector
Other Connectors
Conclusion

7. Advanced Connector Examples
Connecting to HBase with Phoenix
Key-Value Store Connector Example: Accumulo
Using the Trino Accumulo Connector
Predicate Pushdown in Accumulo
Apache Cassandra Connector
Streaming System Connector Example: Kafka
Document Store Connector Example: Elasticsearch
Overview
Configuration and Usage
Query Processing
Full-Text Search
Summary
Query Federation in Trino
Extract, Transform, Load and Federated Queries
Conclusion

8. Using SQL in Trino
Trino Statements
Trino System Tables
Catalogs
Schemas
Information Schema
Tables
Table and Column Properties
Copying an Existing Table
Creating a New Table from Query Results
Modifying a Table
Deleting a Table
Table Limitations from Connectors
Views
Session Information and Configuration
Data Types
Collection Data Types
Temporal Data Types
Type Casting
SELECT Statement Basics
WHERE Clause
GROUP BY and HAVING Clauses
ORDER BY and LIMIT Clauses
JOIN Statements
UNION, INTERSECT, and EXCEPT Clauses
Grouping Operations
WITH Clause
Subqueries
Scalar Subquery
EXISTS Subquery
Quantified Subquery
Deleting Data from a Table
Conclusion

9. Advanced SQL
Functions and Operators Introduction
Scalar Functions and Operators
Boolean Operators
Logical Operators
Range Selection with the BETWEEN Statement
Value Detection with IS (NOT) NULL
Mathematical Functions and Operators
Trigonometric Functions
Constant and Random Functions
String Functions and Operators
Strings and Maps
Unicode
Regular Expressions
Unnesting Complex Data Types
JSON Functions
Date and Time Functions and Operators
Histograms
Aggregate Functions
Map Aggregate Functions
Approximate Aggregate Functions
Window Functions
Lambda Expressions
Geospatial Functions
Prepared Statements
Conclusion

Part III. Trino in Real-World Uses
10. Security
Authentication
Password and LDAP Authentication
Other Authentication Types
Authorization
System Access Control
Connector Access Control
Encryption
Encrypting Trino Client-to-Coordinator Communication
Creating Java Keystores and Java Truststores
Encrypting Communication Within the Trino Cluster
Certificate Authority Versus Self-Signed Certificates
Certificate Authentication
Kerberos
Prerequisites
Kerberos Client Authentication
Data Source Access and Configuration for Security
Kerberos Authentication with the Hive Connector
Hive Metastore Service Authentication
HDFS Authentication
Cluster Separation
Conclusion

11. Integrating Trino with Other Tools
Queries, Visualizations, and More with Apache Superset
Performance Improvements with RubiX
Workflows with Apache Airflow
Embedded Trino Example: Amazon Athena
Convenient Commercial Distributions: Starburst Enterprise and Starburst Galaxy
Other Integration Examples
Custom Integrations
Conclusion

12. Trino in Production
Monitoring with the Trino Web UI
Cluster-Level Details
Query List
Query Details View
Tuning Trino SQL Queries
Memory Management
Task Concurrency
Worker Scheduling
Network Data Exchange
Concurrency
Buffer Sizes
Tuning Java Virtual Machine
Resource Groups
Resource Group Definition
Scheduling Policy
Selector Rules Definition
Conclusion

13. Real-World Examples
Deployment and Runtime Platforms
Cluster Sizing
Hadoop/Hive Migration Use Case
Other Data Sources
Users and Traffic
Conclusion
Conclusion

Index
About the Authors
