Presto technical series, part 2: plug-in implementation based on presto-base-jdbc and presto-spi

Presto queries multiple data sources through its Connector mechanism. Connectors for databases such as MySQL and HANA mainly rely on the code in presto-base-jdbc to read from and write to SQL data sources.

1. Relationship between presto-main, presto-spi, and presto-base-jdbc

Java lends itself to interface-oriented programming. presto-spi mainly defines the public interfaces that the code in presto-main calls.

presto-base-jdbc is a shared module for database connectors. It implements and extends the interfaces that presto-main calls. After compilation, its code is packaged into plug-ins such as the MySQL plug-in, which use it to access their data source. presto-base-jdbc itself is not built as a separate plug-in.
The relationship between presto-main, presto-spi, and presto-base-jdbc is shown in the following figure:

[Figure: relationship between presto-main, presto-spi, and presto-base-jdbc]
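
The layering described in this section can be illustrated with a small, self-contained Java sketch. It does not use the real Presto classes: `RecordReader`, `BaseJdbcReader`, `MySqlReader`, and `Engine` are hypothetical stand-ins for an interface in presto-spi, shared code in presto-base-jdbc, a concrete plug-in, and presto-main respectively.

```java
import java.util.List;

class LayeringSketch {
    // Hypothetical stand-in for an interface defined in presto-spi.
    interface RecordReader {
        List<String> readRows(String table);
    }

    // Hypothetical stand-in for shared code in presto-base-jdbc:
    // the common logic lives here, the database-specific part stays abstract.
    static abstract class BaseJdbcReader implements RecordReader {
        // Each concrete plug-in supplies its own identifier quoting.
        protected abstract String quote(String identifier);

        @Override
        public List<String> readRows(String table) {
            // A real connector would run this SQL over JDBC; the sketch only builds it.
            return List.of("SELECT * FROM " + quote(table));
        }
    }

    // Hypothetical stand-in for a concrete plug-in such as the MySQL connector.
    static final class MySqlReader extends BaseJdbcReader {
        @Override
        protected String quote(String identifier) {
            return "`" + identifier + "`";
        }
    }

    // Hypothetical stand-in for presto-main: it depends only on the interface.
    static final class Engine {
        static List<String> scan(RecordReader reader, String table) {
            return reader.readRows(table);
        }
    }

    public static void main(String[] args) {
        System.out.println(Engine.scan(new MySqlReader(), "orders"));
    }
}
```

The point of the arrangement is that `Engine` never names `MySqlReader`. Supporting another database means adding another small subclass of the shared base class, without touching the engine.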

2. SPI interface

The SPI API can be divided into three types:

  • Metadata SPI: obtains table, column, and column type information, which is used to verify that a query is semantically valid and to perform type checking and security checking of the expressions in the original query. It is used in the syntax and semantic analysis stage.
  • Data Statistics SPI: obtains row counts and table sizes, which support cost-based query optimization. It is used in the query plan generation stage.
  • Data Location SPI: simplifies the creation of distributed query plans by generating logical splits of the table contents. A split is the smallest unit of work distribution and parallelism.
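
To make the three categories concrete, here is a minimal Java sketch. The interfaces `Metadata`, `Statistics`, and `Location`, the `InMemoryCatalog` class, and the sample tables are all hypothetical and far simpler than the real SPI. They only show what kind of question each category answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SpiCategoriesSketch {
    // Metadata: lets the analyzer check that referenced columns exist.
    interface Metadata { Map<String, String> columns(String table); }
    // Data statistics: lets the planner make cost-based decisions.
    interface Statistics { long rowCount(String table); }
    // Data location: cuts a table into splits, here as [start, end) row ranges.
    interface Location { List<long[]> splits(String table, int splitCount); }

    // Hypothetical in-memory catalog that answers all three kinds of question.
    static final class InMemoryCatalog implements Metadata, Statistics, Location {
        private final Map<String, Map<String, String>> columnsByTable = Map.of(
                "orders", Map.of("id", "bigint", "total", "double"),
                "customers", Map.of("id", "bigint", "name", "varchar"));
        private final Map<String, Long> rowsByTable = Map.of("orders", 1000L, "customers", 10L);

        @Override
        public Map<String, String> columns(String table) { return columnsByTable.get(table); }

        @Override
        public long rowCount(String table) { return rowsByTable.get(table); }

        @Override
        public List<long[]> splits(String table, int splitCount) {
            long total = rowCount(table);
            long size = (total + splitCount - 1) / splitCount; // rows per split, rounded up
            List<long[]> result = new ArrayList<>();
            for (long start = 0; start < total; start += size) {
                result.add(new long[] {start, Math.min(start + size, total)});
            }
            return result;
        }
    }

    // Analysis stage: reject queries that reference unknown columns.
    static boolean columnExists(Metadata metadata, String table, String column) {
        Map<String, String> columns = metadata.columns(table);
        return columns != null && columns.containsKey(column);
    }

    // Planning stage: a toy cost-based choice that picks the smaller table.
    static String smallerTable(Statistics statistics, String a, String b) {
        return statistics.rowCount(a) <= statistics.rowCount(b) ? a : b;
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        System.out.println(columnExists(catalog, "orders", "total"));
        System.out.println(smallerTable(catalog, "orders", "customers"));
        System.out.println(catalog.splits("orders", 4).size());
    }
}
```

Splitting by row range is only one possible scheme. A real connector decides for itself what a split covers, for example a file, a partition, or a key range.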

This design lets users develop against interfaces and implement their own Connector plug-ins to connect to their own storage systems. Presto provides a set of connector interfaces for reading metadata and columnar data from custom storage. The basic connector concepts are:

  • ConnectorMetadata: Manages table metadata, partitions, and related information. When processing a request, Presto needs this metadata to determine where the data to be read is located. Presto passes in filter conditions to narrow the range of data read. Metadata can be read from disk or cached in memory.
  • ConnectorSplit: The set of data processed by one I/O task, and the unit of scheduling. One split can correspond to one partition or to multiple partitions.
  • SplitManager: Constructs splits from the table's metadata.
  • PageSource: Given the split information and the columns to be read, reads zero or more pages from disk for the execution engine to process.
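
How these concepts fit together can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not the real connector API: `Split`, `createSplits`, `readPages`, and the sample table are hypothetical, a split covers exactly one partition, and a "page" is just a list of rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ConnectorFlowSketch {
    // ConnectorSplit role: the unit of scheduling; here one split is one partition.
    static final class Split {
        final String partition;
        Split(String partition) { this.partition = partition; }
    }

    // Hypothetical table: partition name -> rows, each row maps column -> value.
    static Map<String, List<Map<String, Object>>> sampleTable() {
        Map<String, List<Map<String, Object>>> partitions = new LinkedHashMap<>();
        partitions.put("2024-01", List.of(
                Map.<String, Object>of("id", 1, "total", 9.5),
                Map.<String, Object>of("id", 2, "total", 20.0),
                Map.<String, Object>of("id", 3, "total", 3.25)));
        partitions.put("2024-02", List.of()); // an empty partition
        return partitions;
    }

    // SplitManager role: build splits from the table's metadata, skipping
    // partitions rejected by the filter condition passed in by the engine.
    static List<Split> createSplits(Map<String, ?> partitions, Predicate<String> partitionFilter) {
        List<Split> splits = new ArrayList<>();
        for (String name : partitions.keySet()) {
            if (partitionFilter.test(name)) {
                splits.add(new Split(name));
            }
        }
        return splits;
    }

    // PageSource role: read only the requested columns of one split and
    // return them as zero or more pages of at most pageSize rows.
    static List<List<List<Object>>> readPages(Map<String, List<Map<String, Object>>> partitions,
                                              Split split, List<String> columns, int pageSize) {
        List<List<List<Object>>> pages = new ArrayList<>();
        List<List<Object>> page = new ArrayList<>();
        for (Map<String, Object> row : partitions.get(split.partition)) {
            List<Object> projected = new ArrayList<>();
            for (String column : columns) {
                projected.add(row.get(column));
            }
            page.add(projected);
            if (page.size() == pageSize) {
                pages.add(page);
                page = new ArrayList<>();
            }
        }
        if (!page.isEmpty()) {
            pages.add(page); // last, partially filled page
        }
        return pages;
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, Object>>> table = sampleTable();
        for (Split split : createSplits(table, name -> true)) {
            int pageCount = readPages(table, split, List.of("id"), 2).size();
            System.out.println(split.partition + ": " + pageCount + " page(s)");
        }
    }
}
```

The empty partition still produces a split, but reading it yields zero pages, which matches the "0 or more pages" wording in the PageSource description.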

The main capabilities that a plug-in can add:

  • Connect to your own storage system.
  • Add custom data types.
  • Add custom processing functions.
  • Add custom permission control.
  • Add custom resource control.
  • Add query event processing logic.
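
One way to picture this list is as a plug-in contract with one optional hook per capability. The Java sketch below is a hypothetical simplification: `SketchPlugin`, its method names, and `AuditPlugin` are invented for illustration and are not the signatures of the real `Plugin` interface in presto-spi.

```java
import java.util.List;

class PluginHooksSketch {
    // Hypothetical plug-in contract: each hook defaults to "contributes nothing",
    // so a plug-in overrides only the capabilities it actually provides.
    interface SketchPlugin {
        default List<String> connectorNames() { return List.of(); }      // storage systems
        default List<String> typeNames() { return List.of(); }           // custom data types
        default List<String> functionNames() { return List.of(); }       // custom functions
        default List<String> accessControlNames() { return List.of(); }  // permission control
        default List<String> resourceGroupNames() { return List.of(); }  // resource control
        default List<String> eventListenerNames() { return List.of(); }  // query event handling
    }

    // Hypothetical plug-in that adds one function and one query event listener.
    static final class AuditPlugin implements SketchPlugin {
        @Override
        public List<String> functionNames() { return List.of("mask_email"); }

        @Override
        public List<String> eventListenerNames() { return List.of("query-audit-log"); }
    }

    // What a host might do at load time: count everything a plug-in contributes.
    static int contributionCount(SketchPlugin plugin) {
        return plugin.connectorNames().size()
                + plugin.typeNames().size()
                + plugin.functionNames().size()
                + plugin.accessControlNames().size()
                + plugin.resourceGroupNames().size()
                + plugin.eventListenerNames().size();
    }

    public static void main(String[] args) {
        System.out.println(contributionCount(new AuditPlugin()));
    }
}
```

Default methods keep each plug-in small: a connector-only plug-in and an audit-only plug-in implement the same interface without writing empty stubs for the hooks they do not use.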