What is Presto or PrestoDB?
Presto is a distributed SQL query engine that runs on a cluster of machines. A full setup includes one coordinator and multiple workers. A client (e.g. the Presto CLI) submits a query to the coordinator. The coordinator parses, analyzes, and plans the query, then distributes the processing to the workers.
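The client-to-coordinator exchange described above can be sketched in a few lines of Python. This is only a minimal sketch, not an official client: it assumes a coordinator reachable at a hypothetical address such as http://localhost:8080, and the function names and the user name are made up for illustration. The /v1/statement endpoint, the X-Presto-User header, and the nextUri field come from Presto's HTTP client protocol.

```python
import json
import urllib.request


def build_statement_request(coordinator_url, sql, user):
    """Build (but do not send) the HTTP request that submits a query."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),         # the SQL text is the request body
        headers={"X-Presto-User": user},  # tells the coordinator who is asking
        method="POST",
    )


def run_query(coordinator_url, sql, user):
    """Submit a query and collect rows by following each nextUri link."""
    rows = []
    request = build_statement_request(coordinator_url, sql, user)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))  # a page may carry no rows yet
        next_uri = page.get("nextUri")     # absent once the query is finished
        if next_uri:
            request = urllib.request.Request(
                next_uri, headers={"X-Presto-User": user}
            )
        else:
            request = None
    return rows
```

With a running cluster, something like run_query("http://localhost:8080", "SELECT 1", "analyst") would return the result rows. In practice the Presto CLI or an existing client library handles this loop for you.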
Capabilities of Presto :-
1. It allows querying data where it lives, including Hive, Cassandra, relational databases, and even proprietary data stores.
2. It allows a single Presto query to merge data from multiple sources.
3. Its fast response times break the false choice between fast analytics on an expensive commercial solution and a slow free solution that requires a lot of hardware.
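To illustrate the second point, here is a small sketch of what a single query across two sources might look like. Presto addresses tables as catalog.schema.table, so a join can name tables from different catalogs. The catalog, schema, table, and column names below are hypothetical, and the helper function is only there for illustration.

```python
def qualified(catalog, schema, table):
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical tables: orders stored in Hive, customers stored in MySQL.
orders = qualified("hive", "sales", "orders")
customers = qualified("mysql", "crm", "customers")

# One SQL statement that joins data from both sources.
FEDERATED_QUERY = f"""
SELECT c.name, sum(o.total) AS revenue
FROM {orders} AS o
JOIN {customers} AS c ON o.customer_id = c.id
GROUP BY c.name
ORDER BY revenue DESC
"""

print(FEDERATED_QUERY)
```

The query text is plain SQL. The only Presto-specific part is the catalog prefix, which tells the engine which connector should serve each table.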
How does Presto work?
Presto is a distributed system, often deployed alongside Hadoop, that uses an architecture similar to a classic massively parallel processing (MPP) database management system. It has one coordinator node that works in sync with multiple worker nodes. Users submit their SQL queries to the coordinator, which uses a custom query and execution engine to parse, plan, and schedule a distributed query plan across the worker nodes. It is designed to support standard ANSI SQL semantics, including complex queries, aggregations, joins, left/right outer joins, subqueries, window functions, distinct counts, and approximate percentiles.
After the query is compiled, Presto splits the request into multiple stages that run across the worker nodes. All processing takes place in memory and is pipelined across the network between stages, which avoids unnecessary I/O overhead. Adding more worker nodes allows more parallelism and faster processing.
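The idea of stages can be sketched with a toy model. This is not Presto's scheduler: threads stand in for worker nodes, and all names are invented for the example. It mimics a query like SELECT country, count(*) ... GROUP BY country, where a first stage counts rows on each worker and a second stage merges the partial results, all in memory.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(split):
    """Stage 1 (one call per worker): count rows per country in one split."""
    return Counter(row["country"] for row in split)


def final_merge(partials):
    """Stage 2: merge the partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_group_by_count(rows, workers):
    # Divide the input into one split per worker (round-robin).
    splits = [rows[i::workers] for i in range(workers)]
    # Each thread plays the role of a worker node processing its split.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_count, splits))
    return final_merge(partials)


rows = [{"country": c} for c in ["IN", "US", "IN", "DE", "US", "IN"]]
print(run_group_by_count(rows, workers=3))
```

Raising the workers argument gives each worker a smaller split, which is the same reason adding worker nodes increases parallelism in a real cluster.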
To make Presto extensible to any data source, it was designed with a storage abstraction that makes it easy to build pluggable connectors. As a result, Presto has many connectors, including ones for non-relational sources (such as the Hadoop Distributed File System (HDFS), Amazon S3, and MongoDB) and relational sources (such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata). Data is queried where it is stored, without having to move it into a separate analytics system.
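The storage abstraction can be pictured with a short sketch. Presto's real connector interface is written in Java and is considerably richer, so the class and method names below are hypothetical and only show the shape of the idea: the engine talks to one interface, and each data source plugs in behind it.

```python
import csv
import io
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical storage abstraction: the engine only sees this interface."""

    @abstractmethod
    def list_tables(self):
        """Return the table names this source exposes."""

    @abstractmethod
    def scan(self, table):
        """Yield the rows of a table as dictionaries."""


class InMemoryConnector(Connector):
    """Toy source backed by Python lists."""

    def __init__(self, tables):
        self._tables = tables  # table name -> list of row dicts

    def list_tables(self):
        return sorted(self._tables)

    def scan(self, table):
        yield from self._tables[table]


class CsvTextConnector(Connector):
    """Toy source backed by CSV text, standing in for files on storage."""

    def __init__(self, documents):
        self._documents = documents  # table name -> CSV text

    def list_tables(self):
        return sorted(self._documents)

    def scan(self, table):
        yield from csv.DictReader(io.StringIO(self._documents[table]))


def scan_all(catalogs, table):
    """Engine-side code: reads a table from every catalog via the interface."""
    return [
        dict(row, catalog=name)
        for name, connector in catalogs.items()
        for row in connector.scan(table)
    ]


catalogs = {
    "memory": InMemoryConnector({"users": [{"id": "1", "name": "Asha"}]}),
    "files": CsvTextConnector({"users": "id,name\n2,Ben\n"}),
}
print(scan_all(catalogs, "users"))
```

Because scan_all never looks inside a connector, supporting a new data source means writing one more class, with no change to the engine-side code.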
What Presto Is Not :-
Presto is not a general-purpose relational database. It is not a replacement for databases such as MySQL, PostgreSQL, or Oracle, and it is not designed for online transaction processing (OLTP).
What Are Its Use Cases?
Presto is a SQL-based query engine that uses an MPP architecture to scale out. Because it is only a query engine, it separates compute from storage and relies on connectors to integrate with the data sources it queries. In this capacity, Presto can query against:
Traditional Databases
- MySQL
- PostgreSQL
- SQL Server
Non-relational Databases
- MongoDB
- Redis
- Cassandra
File formats such as ORC and Parquet (both columnar) and Avro, stored in the following locations:
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- HDFS
- Clustered file systems
Thank you