Xiaobai's Big Data Journey of Daguai Upgrade (62) <Hive Journey Third Stop: Hive Data Type>

Daguai Upgrade: Xiaobai's Big Data Journey (62)

The third stop of the Hive journey: Hive data types

Last review

In the previous chapter, we learned about the installation of Hive. In this chapter, we formally learn about the related operations of Hive. According to convention, learning a new language requires understanding its data types.

type of data

  • Hive's data types are divided into two categories, basic data types and collection data types. Collection data types can be nested, so it is more flexible and changeable.
  • The data type of Hive is basically similar to the data type of Java we are learning, so we only need to remember some special ones to easily master it.

Notes and grammar specifications

There are two ways to comment on hive

  • Single line comment --
  • Multi-line comments /**/

The grammar specification of HQL (the same as learned in Mysql)
(1) SQL language is not case sensitive.
(2) SQL can be written in one or more lines.
(3) Keywords cannot be abbreviated and cannot be divided into lines.
(4) Each clause should generally be written in separate lines.
(5) Use indentation to improve the readability of sentences.

Basic data type

HIVEMySQLJAVAlengthexample
TINYINTTINYINTbyte1byte signed integer2
SMALINTSMALINTshort2byte signed integer20
INTINTint4byte signed integer20
BIGINTBIGINTlong8byte signed integer20
BOOLEANnobooleanBoolean type, true or falseTRUE FALSE
FLOATFLOATfloatSingle-precision floating-point number3.14159
DOUBLEDOUBLEdoubleDouble-precision floating-point number3.14159
STRINGVARCHARstringCharacter series. You can specify the character set. You can use single or double quotes'now is the time' “for all good men”
TIMESTAMPTIMESTAMPTime type
BINARYBINARYByte array

For the basic data types of Hive, we need to distinguish the following special types from Java

  • INT
  • BIGINT
  • DOUBLE
  • STRING

The basic data types are relatively simple, so I won’t cite the chestnuts, it is the same as the way we use it in mysql

Collection data type

type of datadescriptionFeaturesSyntax example
STRUCTThe struct type is similar to the collection type, but it uses .to access element contentThe number can be different, but the type is the samestruct<street:string, city:string>
MAPMAP is a set of key-value pair tuples, and data can be accessed using array notation. For example, if the data type of a column is MAP, and the key->value pairs are'first'->'John' and'last'->'Doe', then the last one can be obtained by the field name ['last'] elementThe key-value can be different, and the number can also be differentmap<string, int>
ARRAYAn array is a collection of variables with the same type and name. These variables are called elements of the array, and each array element has a number, which starts from zero. For example, if the array value is ['John','Doe'], then the second element can be referenced by the array name [1]Same number, same typearray<string>

Let's use an example to demonstrate our collection data type

Assuming that a table has the following row, we use JSON format to represent its data structure. The data looks like this

{
    "name": "songsong",
    "friends": ["bingbing" , "lili"],       //列表Array, 
    "children": {                              //键值Map,
        "xiao song": 18,
        "xiaoxiao song": 19
    }
    "address": {                              //结构Struct,
        "street": "hui long guan",
        "city": "beijing",
        "email": "10010"
    }
}

Below we import the test data according to the above

# 创建本地测试文件,我放到了hive根目录下,并专门创建了一个存储数据的文件夹
# vim $HIVE_HOME/dbdata/test1.txt 
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing_10010
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing_10011

Data interpretation

/*
共创建了两条数据,使用换行`\n`进行区分,每条数据中,每个类型间的数据使用`,` 进行区分,使用_来区分当前类型间的两个数据
数据类型如下:
	name: string类型
	friends: array类型
	children: map类型
	address: struct类型
*/

# 第一条数据
songsong,
bingbing_lili,
xiao song:18_xiaoxiao song:19,
hui long guan_beijing_10010

# 第二条数据
yangyang,
caicai_susu,
xiao yang:18_xiaoxiao yang:19,
chao yang_beijing_10011


 

Create a table, I will explain the parameters of the table in detail later, please copy it directly

create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string,city:string,email:int>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

Import data into the test table

load data local inpath "/opt/module/hive/dbdata/test/test1.txt" into table test;

Access data

-- 获取名字叫songsong的第二个朋友、songsong的孩子xiao song的年龄以及songsong的地址信息
select 
	name,
	friends[1],
	children['xiao song'],
	address.city,
	address.street 
from test
where name="songsong";

Type conversion

In Hive, its atomic data types can be implicitly converted (atomic data types refer to basic data types)

Implicit type conversion rules are also divided into automatic conversion and forced conversion:

Automatic conversion

  • Any integer type can be implicitly converted to a broader type, such as TINYINT can be converted to INT, INT can be converted to BIGINT
  • All integer types, FLOAT and STRING types can be implicitly converted to DOUBLE
  • TINYINT, SMALLINT, INT can all be converted to FLOAT
  • The BOOLEAN type cannot be converted to any other type

Coercion (using the keyword CAST)

  • For example, CAST('1' AS INT) will convert the string '1' into an integer 1. If the coercion fails, such as executing CAST('X' AS INT), the expression returns NULL.

Let's experience it with an example of type conversion

-- 自动转换
select '1'+2
-- 强制转换
select cast("1" as int) + 2;
-- 转换失败的示例,会返回NULL
select cast("a" as int) + 2;

to sum up

This chapter mainly introduces the data types of HQL. In order to make it easy for everyone to view later, I will split and publish each individual knowledge point. The next chapter is an introduction to the operation of the database and the data table.