hive metadata analysis

Preface

在对hive SQL进行解析,以及跟踪hive job与yarn application的关系时,
还有对hive数据仓库进行数据治理时,需要对hive元数据有个较为清楚的认识,
进而更好的在解析SQL时,对数据访问进行权限控制;
在资源管理时,进行资源归属;
在数据生命周期管理时对其进行有效管理

hive metadata database, tables

  • Hive metadata is stored in mysql. If installed by default, it is a hive database, which contains a series of data tables related to data tables, partitions, data skew, data storage, compression, etc.

version

  • Store hive version information
Insert picture description here

dbs

  • Store the hdfs directory, number and other information of the hive database
Insert picture description here

DATABASE_PARAMS

  • This table stores the relevant parameters of the database. The parameters specified with WITH DBPROPERTIES(property_name=property_value, …) will exist in the table when CREATE DATABASE; its primary key is the primary key of dbs
Insert picture description here

tlbs

  • Store the database, owner, creation time, and last access time of the hive data table [default invalid, need to modify the source code to take effect] and other information
Insert picture description here

TABLE_PARAMS

  • This table stores the property information of the table/view, tbl_properties when the table is created
Insert picture description here

TBL_PRIVS

-This table stores the authorization information of the table/view

Insert picture description here

partitions

  • Store partition records of hive partition table, access time, table_id, etc.
Insert picture description here

PARTITION_KEYS

-Field information of the storage partition of the table

Insert picture description here

PARTITION_KEY_VALS

-The table stores partition field values

Insert picture description here

PARTITION_PARAMS

-Attribute information of the storage partition of the table

Insert picture description here

SDS

  • This table saves the basic information stored in the file, such as input, output format, compression format, compression, serialization and other information. The SD_ID in the TBLS table is associated with this table, and the storage information of the Hive table can be obtained.
Insert picture description here

SD_PARAMS

  • This table stores the attribute information stored by Hive, which is specified by STORED BY'storage.handler.class.name' [WITH SERDEPROPERTIES (...) when creating the table.
Insert picture description here

SERDES

  • This table stores class information used for serialization
Insert picture description here

SERDE_PARAMS

  • The table stores some serialized attributes, format and other information, such as: row, column separator
Insert picture description here

other

  • Others include index-related information, data tilt, authority management, partition statistics, authorization, etc., so I won’t repeat them one by one.