
Trino – The Definitive Guide

SQL at Any Scale, on Any Storage, in Any Environment

Paperback, English, 2021, ISBN 9781098107710

Summary

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Trino.

Originally developed at Facebook as Presto, the open source Trino is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.

- Get started: Explore Trino's use cases and learn about tools that will help you connect to Trino and query data
- Go deeper: Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Put Trino in production: Secure Trino, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Trino

Specifications

ISBN-13: 9781098107710
Language: English
Binding: Paperback
Number of pages: 310
Publisher: O'Reilly
Edition: 1
Publication date: 4 May 2021
Main category: IT management / ICT


Table of Contents

Foreword
Preface
About the Book
Conventions Used in This Book
Code Examples, Permissions, and Attribution
O’Reilly Online Learning
How to Contact Us
Acknowledgments

I. Getting Started with Trino
1. Introducing Trino
The Problems with Big Data
Trino to the Rescue
Designed for Performance and Scale
SQL-on-Anything
Separation of Data Storage and Query Compute Resources
Trino Use Cases
One SQL Analytics Access Point
Access Point to Data Warehouse and Source Systems
Provide SQL-Based Access to Anything
Federated Queries
Semantic Layer for a Virtual Data Warehouse
Data Lake Query Engine
SQL Conversions and ETL
Better Insights Due to Faster Response Times
Big Data, Machine Learning, and Artificial Intelligence
Other Use Cases
Trino Resources
Website
Documentation
Community Chat
Source Code, License, and Version
Contributing
Book Repository
Iris Data Set
Flight Data Set
A Brief History of Trino
Conclusion

2. Installing and Configuring Trino
Trying Trino with the Docker Container
Installing from Archive File
Java Virtual Machine
Python
Installation
Configuration
Adding a Data Source
Running Trino
Conclusion

3. Using Trino
Trino Command-Line Interface
Getting Started
Pagination
History
Additional Diagnostics
Executing Queries
Output Formats
Ignoring Errors
Trino JDBC Driver
Downloading and Registering the Driver
Establishing a Connection to Trino
Trino and ODBC
Client Libraries
Trino Web UI
SQL with Trino
Concepts
First Examples
Conclusion

II. Diving Deeper into Trino
4. Trino Architecture
Coordinator and Workers in a Cluster
Coordinator
Discovery Service
Workers
Connector-Based Architecture
Catalogs, Schemas, and Tables
Query Execution Model
Query Planning
Parsing and Analysis
Initial Query Planning
Optimization Rules
Predicate Pushdown
Cross Join Elimination
TopN
Partial Aggregations
Implementation Rules
Lateral Join Decorrelation
Semi-Join (IN) Decorrelation
Cost-Based Optimizer
The Cost Concept
Cost of the Join
Table Statistics
Filter Statistics
Table Statistics for Partitioned Tables
Join Enumeration
Broadcast Versus Distributed Joins
Working with Table Statistics
Trino ANALYZE
Gathering Statistics When Writing to Disk
Hive ANALYZE
Displaying Table Statistics
Conclusion

5. Production-Ready Deployment
Configuration Details
Server Configuration
Logging
Node Configuration
JVM Configuration
Launcher
Cluster Installation
RPM Installation
Installation Directory Structure
Configuration
Uninstall Trino
Installation in the Cloud
Cluster Sizing Considerations
Conclusion

6. Connectors
Configuration
RDBMS Connector Example: PostgreSQL
Query Pushdown
Parallelism and Concurrency
Other RDBMS Connectors
Security
Trino TPC-H and TPC-DS Connectors
Hive Connector for Distributed Storage Data Sources
Apache Hadoop and Hive
Hive Connector
Hive-Style Table Format
Managed and External Tables
Partitioned Data
Loading Data
File Formats and Compression
MinIO Example
Non-Relational Data Sources
Trino JMX Connector
Black Hole Connector
Memory Connector
Other Connectors
Conclusion

7. Advanced Connector Examples
Connecting to HBase with Phoenix
Key-Value Store Connector Example: Accumulo
Using the Trino Accumulo Connector
Predicate Pushdown in Accumulo
Apache Cassandra Connector
Streaming System Connector Example: Kafka
Document Store Connector Example: Elasticsearch
Overview
Configuration and Usage
Query Processing
Full-Text Search
Summary
Query Federation in Trino
Extract, Transform, Load and Federated Queries
Conclusion

8. Using SQL in Trino
Trino Statements
Trino System Tables
Catalogs
Schemas
Information Schema
Tables
Table and Column Properties
Copying an Existing Table
Creating a New Table from Query Results
Modifying a Table
Deleting a Table
Table Limitations from Connectors
Views
Session Information and Configuration
Data Types
Collection Data Types
Temporal Data Types
Type Casting
SELECT Statement Basics
WHERE Clause
GROUP BY and HAVING Clauses
ORDER BY and LIMIT Clauses
JOIN Statements
UNION, INTERSECT, and EXCEPT Clauses
Grouping Operations
WITH Clause
Subqueries
Scalar Subquery
EXISTS Subquery
Quantified Subquery
Deleting Data from a Table
Conclusion

9. Advanced SQL
Functions and Operators Introduction
Scalar Functions and Operators
Boolean Operators
Logical Operators
Range Selection with the BETWEEN Statement
Value Detection with IS (NOT) NULL
Mathematical Functions and Operators
Trigonometric Functions
Constant and Random Functions
String Functions and Operators
Strings and Maps
Unicode
Regular Expressions
Unnesting Complex Data Types
JSON Functions
Date and Time Functions and Operators
Histograms
Aggregate Functions
Map Aggregate Functions
Approximate Aggregate Functions
Window Functions
Lambda Expressions
Geospatial Functions
Prepared Statements
Conclusion

III. Trino in Real-World Uses
10. Security
Authentication
Password and LDAP Authentication
Authorization
System Access Control
Connector Access Control
Encryption
Encrypting Trino Client-to-Coordinator Communication
Creating Java Keystores and Java Truststores
Encrypting Communication Within the Trino Cluster
Certificate Authority Versus Self-Signed Certificates
Certificate Authentication
Kerberos
Prerequisites
Kerberos Client Authentication
Cluster Internal Kerberos
Data Source Access and Configuration for Security
Kerberos Authentication with the Hive Connector
Hive Metastore Thrift Service Authentication
HDFS Authentication
Cluster Separation
Conclusion

11. Integrating Trino with Other Tools
Queries, Visualizations, and More with Apache Superset
Performance Improvements with RubiX
Workflows with Apache Airflow
Embedded Trino Example: Amazon Athena
Starburst Enterprise
Other Integration Examples
Custom Integrations
Conclusion

12. Trino in Production
Monitoring with the Trino Web UI
Cluster-Level Details
Query List
Query Details View
Tuning Trino SQL Queries
Memory Management
Task Concurrency
Worker Scheduling
Scheduling Splits per Task and per Node
Local Scheduling
Network Data Exchange
Concurrency
Buffer Sizes
Tuning Java Virtual Machine
Resource Groups
Resource Group Definition
Scheduling Policy
Selector Rules Definition
Conclusion

13. Real-World Examples
Deployment and Runtime Platforms
Cluster Sizing
Hadoop/Hive Migration Use Case
Other Data Sources
Users and Traffic
Conclusion

14. Conclusion

Index
