Write the contents of HBase and MongoDB to Hive

Write the contents of HBase to Hive

1. Problems encountered when starting Hive

$ hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

This error means that the Hive Metastore service is not running. Start it in the background from the Hive root directory:

nohup bin/hive --service metastore &
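
To avoid starting a second metastore by accident, the launch can be guarded by a port check. The following is a minimal sketch, assuming bash (for /dev/tcp), that HIVE_HOME points at the Hive root directory, and that the metastore listens on its default port 9083. The function names and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: start the Hive Metastore if nothing answers on its port.
# Assumptions: bash, HIVE_HOME is set, 9083 is the metastore port.

# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
port_open() {
  local host=$1 port=$2
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Launch the metastore in the background unless it is already reachable.
start_metastore_if_needed() {
  local host=${1:-127.0.0.1} port=${2:-9083}
  if port_open "$host" "$port"; then
    echo "metastore already listening on ${host}:${port}"
  else
    nohup "${HIVE_HOME}/bin/hive" --service metastore > metastore.log 2>&1 &
    echo "metastore starting, output goes to metastore.log"
  fi
}
```

Calling start_metastore_if_needed without arguments checks 127.0.0.1:9083. The metastore needs some time to come up, so hive may still fail if it is started immediately afterwards.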

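2. Hive-related settings

-- Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic; strict requires at least one static partition value
set hive.exec.dynamic.partition.mode=nonstrict;
-- Disable automatic conversion of joins to map-side joins
set hive.auto.convert.join=false;
-- Maximum number of dynamic partitions that each mapper or reducer may create
set hive.exec.max.dynamic.partitions.pernode=100;
-- Maximum total number of dynamic partitions that one statement may create
set hive.exec.max.dynamic.partitions=1000;
-- Maximum number of files that may be created overall; a Hadoop counter tracks this value and the job fails if it is exceeded
set hive.exec.max.created.files=100000;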
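
These settings apply only to the current session. One way to reapply them is to keep them in a file and pass it to the Hive CLI with the -i option, which runs an initialization file before the prompt appears. The sketch below assumes bash. The function name and the file path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write the session settings above into an init file.
# The target path is chosen by the caller.
write_hive_init() {
  local target=$1
  cat > "$target" <<'EOF'
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join=false;
set hive.exec.max.dynamic.partitions.pernode=100;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.created.files=100000;
EOF
}

# Example (not run here):
#   write_hive_init "$HOME/hive-init.sql" && hive -i "$HOME/hive-init.sql"
```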

3. Create the external table hb_user and map it to the HBase table events_db:users (the value of hbase.table.name below must be the exact name of the table in HBase)

set hivevar:db=events;
create external table ${db}.hb_user(
user_id String,
birth_year int,
gender String,
locale String,
location String,
time_zone String,
joined_at String
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties(
'hbase.columns.mapping'=':key,profile:birthyear,profile:gender,region:locale,region:location,region:timezone,registration:joinedAt'
)
tblproperties('hbase.table.name'='events_users');
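
The Hive HBase integration documentation warns that whitespace between entries of hbase.columns.mapping is treated as part of the column name, so the mapping is safest written as one comma-separated string. When the list of columns is long, the string can be generated instead of typed. A minimal sketch, assuming bash (the function name join_mapping is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch only: join mapping entries with commas and no surrounding whitespace.
join_mapping() {
  local IFS=,
  printf '%s\n' "$*"
}

# Example:
#   join_mapping :key profile:birthyear profile:gender region:locale \
#                region:location region:timezone registration:joinedAt
```

The number of entries must equal the number of columns declared in the Hive table, in the same order.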

4. Create a managed (internal) table user and copy the data of the external table hb_user into it in ORC format to improve query efficiency. user is a reserved keyword in Hive 1.2 and later, so the name is quoted with backticks.

create table ${db}.`user`
stored as orc as 
select * from ${db}.hb_user;
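
Before the external table is dropped, it can be worth confirming that the ORC copy holds the same number of rows. The sketch below assumes bash and that the hive CLI is on the PATH. The function names and the HIVE_CMD override are hypothetical. Note that count(*) on the HBase-backed table scans the whole HBase table.

```shell
#!/usr/bin/env bash
# Sketch only: compare the row counts of two Hive tables.
# HIVE_CMD can be overridden, for example with a stub for testing.
HIVE_CMD=${HIVE_CMD:-hive}

# Print count(*) for a table. -S suppresses log output, -e runs a query string.
row_count() {
  "$HIVE_CMD" -S -e "select count(*) from $1;"
}

# Return 0 only when both tables report the same, non-empty row count.
same_row_count() {
  local a b
  a=$(row_count "$1") || return 1
  b=$(row_count "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (not run here):
#   same_row_count events.hb_user 'events.`user`' && echo "counts match"
```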

5. Drop the external table (this removes only the Hive metadata; the data in HBase is kept)

drop table if exists ${db}.hb_user;

Check disk usage

df -h
du -h -x --max-depth=1

Write the contents of MongoDB to Hive

1. Dependencies

    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver</artifactId>
      <version>3.12.7</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.mongo-hadoop</groupId>
      <artifactId>mongo-hadoop-core</artifactId>
      <version>2.0.2</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.mongo-hadoop</groupId>
      <artifactId>mongo-hadoop-hive</artifactId>
      <version>2.0.2</version>
    </dependency>

2. Copy the downloaded JARs from the local Maven repository to the lib directory under the Hive root directory. The location of the local repository is the localRepository value in Maven's settings.xml file (by default ~/.m2/repository).
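
The copy step can be scripted. Maven stores each artifact under the repository root as groupId (dots become directories), then artifactId, then version, with the file named artifactId-version.jar. The sketch below assumes bash. The function names are hypothetical, and only the three artifacts declared above are copied. Their transitive dependencies (for example mongodb-driver-core and bson) may also be needed in lib.

```shell
#!/usr/bin/env bash
# Sketch only: locate JARs in a local Maven repository and copy them.

# Print the path of a JAR inside the repository layout.
maven_jar_path() {
  local repo=$1 group=$2 artifact=$3 version=$4
  printf '%s/%s/%s/%s/%s-%s.jar\n' \
    "$repo" "$(printf '%s' "$group" | tr . /)" \
    "$artifact" "$version" "$artifact" "$version"
}

# Copy the three declared artifacts from repo into dest.
copy_declared_jars() {
  local repo=$1 dest=$2 spec group artifact version
  for spec in \
    org.mongodb:mongodb-driver:3.12.7 \
    org.mongodb.mongo-hadoop:mongo-hadoop-core:2.0.2 \
    org.mongodb.mongo-hadoop:mongo-hadoop-hive:2.0.2
  do
    IFS=: read -r group artifact version <<< "$spec"
    cp "$(maven_jar_path "$repo" "$group" "$artifact" "$version")" "$dest/" || return 1
  done
}

# Example (not run here):
#   copy_declared_jars "$HOME/.m2/repository" "$HIVE_HOME/lib"
```

Restart the Hive CLI afterwards so that the new JARs are picked up.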

3. Create an external table

create external table ${db}.mg_train(
user_id String,
event_id String,
invited String,
time_stamp String,
interested String
)
stored by 'com.mongodb.hadoop.hive.MongoStorageHandler'
with serdeproperties(
'mongo.columns.mapping'='{
"user_id":"user",
"event_id":"event",
"invited":"invited",
"time_stamp":"timestamp",
"interested":"interested"
}'
)
tblproperties('mongo.uri'='mongodb://kgcuser:[email protected]:27017/kgcdsj.train');

4. Create a managed (internal) table mgtrain and copy the data of the external table mg_train into it in ORC format

create table ${db}.mgtrain
stored as orc as 
select * from ${db}.mg_train;