Hive study notes (8): metadata management and storage, remote mode

1. Metastore

  • The first problem you face when using Hive is how to define table structure information and map it onto structured data files. This "mapping" is simply a correspondence between the two (see the sketch after this list).
  • Hive therefore needs to record how tables map to files, how columns map to fields, and so on. The data that describes these mapping relationships is called Hive metadata.
  • This metadata is critical: only by querying it can Hive determine how the SQL a user writes relates to the files that are ultimately read and written.
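
As a concrete illustration of such a mapping (the table, columns, and HDFS path below are made up for this sketch and are not part of the original notes), a Hive table definition is itself a piece of metadata that ties a directory of delimited files to named, typed columns:

-- Hypothetical external table: the metadata records that files under
-- /data/emp are split on ',' and that the fields map, in order, to the
-- columns empno, ename and sal.
CREATE EXTERNAL TABLE emp (
    empno INT,
    ename STRING,
    sal   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/emp';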

Metadata is the data that describes data. Hive's metadata includes the databases, tables, and table fields created in Hive, and it is stored in a relational database, such as Hive's built-in Derby or a third-party database such as MySQL.

Metastore, the metadata service, is the service Hive uses to manage this database and table metadata. With it, upper-layer services no longer need to deal with bare file data and can instead build their computation on structured database/table information. Hive's metadata is exposed through the metastore service, so clients do not access the Hive metadata database (MySQL) directly to obtain metadata.

The metastore service is in fact a Thrift service through which users obtain Hive metadata. Because metadata is obtained over Thrift, the details required to access the database, such as the JDBC driver, URL, user name, and password, are hidden from clients.

2. Remote mode

In remote mode, the metastore service needs to be started separately, and each client is then configured in its configuration file to connect to that metastore service. In remote mode the metastore service and Hive run in different processes. In a production environment, the remote mode is the recommended way to configure the Hive Metastore.
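
Concretely, the configuration split in remote mode looks like this: the node running the metastore keeps the database connection details in its hive-site.xml, while clients only need hive.metastore.uris (section 3.5). A minimal sketch of the server-side part, assuming MySQL runs on linux101 and the metadata database is named hivemetadata, as in the beeline session in section 4.4 (the exact values are assumptions):

<!-- Server-side hive-site.xml on the metastore node (sketch; values assumed) -->
<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://linux101:3306/hivemetadata</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>********</value>
</property>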

In this mode, other software that depends on Hive can access Hive through the Metastore. The hive.metastore.uris parameter must be configured with the IP (or hostname) and port of the machines where the metastore service runs, and the metastore service has to be started separately by hand. The metastore service can be deployed on multiple nodes, so that a single node failure does not make the Hive clients of the whole cluster unavailable; the Hive clients are configured with multiple metastore addresses and automatically select an available node.

3. Metastore remote mode configuration

3.1 Copy the Hive installation on the 101 machine to 102 and 103

[root@linux101 servers]# pwd
/opt/lagou/servers
[root@linux101 servers]# scp -r hive-2.3.7/ linux102:$PWD

3.2 Copy the /etc/profile configuration file to 102 and 103 and make it take effect

[root@linux101 servers]# cd /etc/
[root@linux101 etc]# scp profile linux103:$PWD
profile
source /etc/profile

3.3 Install lsof

yum install lsof -y

3.4 Start the metastore service on the 101 machine

[root@linux101 ~]# nohup hive --service metastore &
[1] 48342
[root@linux101 ~]# nohup: ignoring input and appending output to 'nohup.out'
[root@linux101 ~]# lsof -i:9083
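
Because hive.metastore.uris in section 3.5 lists both linux101:9083 and linux103:9083, the metastore service presumably needs to be started on linux103 in the same way. The original notes do not show that step; the following is a sketch that assumes Hive is installed and configured on linux103 as well:

# On linux103 (assumption: Hive is also installed and configured there)
nohup hive --service metastore &
# Verify that the metastore is listening on its default port 9083
lsof -i:9083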

3.5 Modify hive-site.xml on linux102: delete the MySQL-related configuration (the connection settings, user name and password for the metadata database) and add the configuration for connecting to the metastore

<!-- hive metastore service address -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://linux101:9083,thrift://linux103:9083</value>
</property>

3.6 Start the Hive client on the 102 machine

hive
At least one of the configured metastore nodes must have the metastore service running; otherwise the client cannot obtain metadata.
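
To confirm that the client on linux102 really reads metadata through the metastore service rather than from a local database, a quick check in the Hive CLI (mydb and emp are the objects used later in these notes; if they do not exist yet, show databases alone is enough):

hive> show databases;
hive> use mydb;
hive> show tables;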

4. HiveServer2 configuration

4.1  Modify core-site.xml on the cluster

[root@linux101 hadoop]# pwd
/opt/lagou/servers/hadoop-2.9.2/etc/hadoop
[root@linux101 hadoop]# vim core-site.xml
<!-- Fixes HiveServer2 failing to connect on port 10000; hadoop is the user that installed Hadoop -->
<!-- The root user can proxy all users on all hosts -->
<property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
</property>

4.2  Modify hdfs-site.xml on the cluster

<!-- Fixes HiveServer2 failing to connect on port 10000; enable the webhdfs service -->
<property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
</property>

4.3 Stop the service and restart

Transfer the modified configuration files to the other servers. First stop the HDFS service:
[root@linux101 hadoop]# stop-dfs.sh
Once the services are started again (see the sketch below), the HiveServer2 web UI can be reached at http://192.168.2.130:10002/
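
The notes only show the stop command; the following is a sketch of the full restart sequence on linux101 (the scp targets and the use of nohup are assumptions; hive --service hiveserver2 is the standard way to start the HiveServer2 service, which serves JDBC on port 10000 and its web UI on port 10002):

# Distribute the modified Hadoop configuration files (targets assumed)
scp core-site.xml hdfs-site.xml linux102:$PWD
scp core-site.xml hdfs-site.xml linux103:$PWD

# Restart HDFS so the proxyuser and webhdfs settings take effect
stop-dfs.sh
start-dfs.sh

# Start the metastore and HiveServer2 services again
nohup hive --service metastore &
nohup hive --service hiveserver2 &

# Check that HiveServer2 is listening (10000 = JDBC, 10002 = web UI)
lsof -i:10000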

4.4 Start beeline on node 102

[root@linux102 bin]# pwd
/opt/lagou/servers/hive-2.3.7/bin
[root@linux102 bin]# ./beeline
Beeline version 2.3.7 by Apache Hive
beeline> !connect jdbc:hive2://linux103:10000
Connecting to jdbc:hive2://linux103:10000
Enter username for jdbc:hive2://linux103:10000:  
Enter password for jdbc:hive2://linux103:10000:  
Connected to: Apache Hive (version 2.3.7)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://linux103:10000>
use mydb;
show tables;
select * from emp;

!connect jdbc:mysql://linux101:3306
Enter username for jdbc:mysql://linux101:3306: hive
Enter password for jdbc:mysql://linux101:3306: ********
1: jdbc:mysql://linux101:3306> use hivemetadata;
No rows affected (0.002 seconds)
1: jdbc:mysql://linux101:3306> show tables;
1: jdbc:mysql://linux101:3306> select * from VERSION;   -- view the version metadata
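
Besides VERSION, the metastore database contains tables such as DBS, TBLS, and COLUMNS_V2 that hold the database, table, and column metadata described in section 1. Still inside the same beeline MySQL connection, a few sketch queries (standard Hive 2.x metastore tables; output omitted):

1: jdbc:mysql://linux101:3306> select DB_ID, NAME, DB_LOCATION_URI from DBS;
1: jdbc:mysql://linux101:3306> select TBL_ID, TBL_NAME, TBL_TYPE from TBLS;
1: jdbc:mysql://linux101:3306> select COLUMN_NAME, TYPE_NAME from COLUMNS_V2;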