Getting started with ClickHouse: basic operations

Introduction

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). ClickHouse is not just a database; it is a database management system.

Columnar storage

The key concept here is the difference between row-oriented and column-oriented storage.

With row-oriented storage, the data is laid out on disk as:

Zhang San 22 Student Li Si 24 Software Engineer Wang Wu 26 Teacher

With column-oriented storage, the same data is laid out on disk as:

Zhang San Li Si Wang Wu 22 24 26 Student Software Engineer Teacher

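This layout is what makes columnar storage efficient for analytical queries: a query that needs only a few columns reads only the data of those columns, not every field of every row.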
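The two layouts can be modelled with a small sketch. It is only an in-memory illustration using the three sample records above, not ClickHouse's actual on-disk format; the variable names are made up for the example.

```python
# Hypothetical in-memory model of the two layouts (not ClickHouse's file format).
rows = [
    ("Zhang San", 22, "Student"),
    ("Li Si", 24, "Software Engineer"),
    ("Wang Wu", 26, "Teacher"),
]

# Row-oriented: all fields of one record sit next to each other.
row_layout = [field for record in rows for field in record]

# Column-oriented: all values of one column sit next to each other.
columns = list(zip(*rows))
column_layout = [value for column in columns for value in column]

n = len(rows)
# Column layout: the ages are one contiguous slice.
ages_from_columns = column_layout[n:2 * n]
# Row layout: the ages are scattered, one per record.
ages_from_rows = row_layout[1::3]

average_age = sum(ages_from_columns) / n
print(row_layout)
print(column_layout)
print(average_age)
```

An aggregate over one column, such as the average age, touches a single contiguous run of values in the column layout, whereas the row layout forces a walk across every record.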

[Figures: row-oriented storage; column-oriented storage]

Online demo

A database connection tool such as DBeaver is recommended: download it and use it to connect to ClickHouse.

Data types

Integer

Signed integers, range -2^(n-1) to 2^(n-1)-1:

  • Int8-[-128: 127]
  • Int16-[-32768: 32767]
  • Int32-[-2147483648: 2147483647]
  • Int64-[-9223372036854775808: 9223372036854775807]

Unsigned integers, range 0 to 2^n-1:

  • UInt8-[0: 255]
  • UInt16-[0: 65535]
  • UInt32-[0: 4294967295]
  • UInt64-[0: 18446744073709551615]
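The limits listed above follow directly from the bit width n of each type. A minimal sketch that recomputes them (the helper names are illustrative, not part of ClickHouse):

```python
def signed_range(bits):
    """Return (min, max) of a signed two's-complement integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Return (min, max) of an unsigned integer with the given width."""
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"Int{bits}: {signed_range(bits)}  UInt{bits}: {unsigned_range(bits)}")
```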

Floating point

  • Float32 - float
  • Float64 - double

Boolean

There is no separate type to store boolean values. The UInt8 type can be used, and the value is limited to 0 or 1.

String

  • Variable-length string: String
    Strings can be of any length and can contain any set of bytes, including null bytes.
  • Fixed-length string: FixedString(N)
    A string of exactly N bytes (N must be a strictly positive natural number). It is closer to a fixed-width CHAR(N) column in other databases than to varchar(255).
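The fixed-length behaviour can be sketched as follows. This is a model only, assuming the behaviour described in the ClickHouse documentation, where values shorter than N bytes are padded with null bytes and longer values are rejected; `to_fixed_string` is a hypothetical helper, not a ClickHouse function.

```python
def to_fixed_string(value: bytes, n: int) -> bytes:
    """Model FixedString(N): pad short values with null bytes, reject long ones."""
    if n <= 0:
        raise ValueError("N must be a strictly positive integer")
    if len(value) > n:
        raise ValueError(f"value is {len(value)} bytes, too large for FixedString({n})")
    return value.ljust(n, b"\x00")


stored = to_fixed_string(b"abc", 5)
print(stored, len(stored))
```

Because every value occupies exactly N bytes, this type suits data whose length is known in advance, such as hashes or fixed-width codes.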

Enumerated type

Enum8 is described by 'String' = Int8 pairs.

Enum16 is described by 'String' = Int16 pairs.

Note that enumerated types in ClickHouse support only the mapping String = Int. This is in sharp contrast to Java, where an enum can carry many kinds of associated values.
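The 'String' = Int8 mapping of Enum8 can be modelled as a pair of lookup tables. This is a sketch of the idea only; `make_enum8` and the sample names are hypothetical, and the duplicate-value check is an assumption of the model.

```python
INT8_MIN, INT8_MAX = -128, 127


def make_enum8(mapping):
    """Validate a 'String' = Int8 mapping and return (name -> value, value -> name)."""
    by_value = {}
    for name, value in mapping.items():
        if not isinstance(name, str):
            raise TypeError("enum names must be strings")
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError(f"{value} does not fit in Int8")
        if value in by_value:
            raise ValueError(f"duplicate value {value}")
        by_value[value] = name
    return dict(mapping), by_value


names, values = make_enum8({"hello": 1, "world": 2})
print(names["hello"], values[2])
```

Only the small integer needs to be stored per row, while queries can still refer to the readable string.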

Array

  • Array(T)

An array composed of elements of type T. T can be any type, but all elements of one array must have the same type.

Tuple

  • Tuple(T1, T2, …)

Tuples, where each element has a separate type.

Date

  • Date

A date is stored in two bytes as the unsigned number of days since 1970-01-01, so the representable range starts at 1970-01-01 and ends at an upper limit that depends on the ClickHouse version. The minimum value is output as 0000-00-00.
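A minimal sketch of the two-byte encoding, assuming the stored value is the number of days since 1970-01-01. The function names are illustrative, and the upper bound computed here is only what two unsigned bytes can address; a given ClickHouse release may support a narrower range.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MAX_DAYS = 2 ** 16 - 1  # largest value that fits in two unsigned bytes


def encode_date(d):
    """Return the day count since the epoch, as it would fit in two bytes."""
    days = (d - EPOCH).days
    if not 0 <= days <= MAX_DAYS:
        raise ValueError("date does not fit in two unsigned bytes")
    return days


def decode_date(days):
    """Turn a stored day count back into a calendar date."""
    return EPOCH + timedelta(days=days)


print(encode_date(date(2021, 3, 15)))
print(decode_date(MAX_DAYS))
```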

Timestamp

  • DateTime

A Unix timestamp is stored in four bytes (unsigned), allowing the storage of values in the same range as the Date type. The minimum value is output as 0000-00-00 00:00:00. Values are accurate to the second.
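The four-byte timestamp can be sketched the same way. This model assumes timezone-aware UTC datetimes and drops any sub-second part, which mirrors the one-second accuracy; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SECONDS = 2 ** 32 - 1  # largest value that fits in four unsigned bytes


def encode_datetime(dt):
    """Return whole seconds since the Unix epoch; dt must be timezone-aware."""
    seconds = (dt - EPOCH) // timedelta(seconds=1)  # sub-second part is dropped
    if not 0 <= seconds <= MAX_SECONDS:
        raise ValueError("timestamp does not fit in four unsigned bytes")
    return seconds


def decode_datetime(seconds):
    """Turn a stored second count back into a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


print(encode_datetime(datetime(2021, 3, 15, 12, 0, 0, tzinfo=timezone.utc)))
print(decode_datetime(MAX_SECONDS))
```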

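Executing statements

ClickHouse is case sensitive: database, table, and column names must be written with the exact case, as must most function names (for example currentDatabase()). SQL keywords such as SELECT can be written in either case.

Database operations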

Show all databases

SHOW DATABASES;

Show all tables in a database

SHOW TABLES FROM datasets

Show table structure

DESCRIBE TABLE hits_100m_obfuscated

Show table creation statement

-- Shows more detailed information about the table, such as the table engine and partition schema
SHOW CREATE TABLE hits_100m_obfuscated

Delete database

DROP DATABASE datasets;

View the currently used database

select currentDatabase();

Create database

CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE DATABASE IF NOT EXISTS cbry

Table operations

Create a table

You must specify a table engine (ENGINE). The table engine determines the characteristics of the table and how its data is stored and loaded.

Note the TinyLog engine used here. TinyLog tables are intended for intermediate data that is processed in small batches. They can be queried, but modification is not supported. This engine is suitable for relatively small tables (a maximum of 1,000,000 rows is recommended). If you have many small tables, this engine is a good fit because it is simpler than the Log engine (fewer files need to be opened).

CREATE TABLE IF NOT EXISTS cbry.cbry_user (
    uid Int32,
    name String,
    age UInt32
) ENGINE = TinyLog;

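Create a MergeTree table

A MergeTree table requires a sorting key (ORDER BY). The primary key (PRIMARY KEY) defaults to the sorting key; if both are given they must be consistent, meaning the primary key must be a prefix of the sorting key.

-- This reuses the name of the TinyLog table above: drop that table first,
-- otherwise IF NOT EXISTS turns this statement into a no-op.
CREATE TABLE IF NOT EXISTS cbry.cbry_user (
    uid Int32,
    name String,
    age UInt32
) ENGINE = MergeTree()
PARTITION BY uid
ORDER BY uid;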

PARTITION BY: the partitioning key.

To partition by month, use the expression toYYYYMM(date_column), where date_column is a column of type Date. The partition names will then have the format "YYYYMM".
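To make the monthly key concrete, the sketch below mimics the value toYYYYMM produces (year * 100 + month) and groups a few made-up rows by it. `to_yyyymm` and the sample rows are illustrative assumptions, not ClickHouse code.

```python
from collections import defaultdict
from datetime import date


def to_yyyymm(d):
    """Mimic toYYYYMM: year * 100 + month, for example 202103."""
    return d.year * 100 + d.month


# Hypothetical rows: (date_column, payload)
rows = [
    (date(2021, 3, 1), "a"),
    (date(2021, 3, 28), "b"),
    (date(2021, 4, 2), "c"),
]

partitions = defaultdict(list)
for d, payload in rows:
    partitions[to_yyyymm(d)].append(payload)

print(dict(partitions))
```

All rows from the same calendar month share one key, so dropping or replacing a month of data becomes a single partition operation.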

Create a table with the same structure as another table

CREATE TABLE IF NOT EXISTS [db.]table_name AS [db2.]name2 [ENGINE = engine]

Insert data

INSERT INTO cbry.cbry_user VALUES (1, 'cbry', 22);

Insert into specific columns

You can specify the list of columns to insert into, for example [(c1, c2, c3)]. You can also use a column matcher expression such as * and/or modifiers such as APPLY, EXCEPT, REPLACE.

INSERT INTO cbry.cbry_user (* EXCEPT(name)) VALUES (3, 24);
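How the values line up with the remaining columns can be sketched as below. This is a model of the statement above, assuming the three columns of cbry_user and that an omitted column receives its type's default (an empty string for String, 0 for the integer types); `insert_except` is a hypothetical helper.

```python
TABLE_COLUMNS = ["uid", "name", "age"]
DEFAULTS = {"uid": 0, "name": "", "age": 0}  # assumed type defaults


def insert_except(values, excluded):
    """Map values onto every table column except the excluded ones."""
    targets = [c for c in TABLE_COLUMNS if c not in excluded]
    if len(values) != len(targets):
        raise ValueError(f"expected {len(targets)} values, got {len(values)}")
    row = dict(DEFAULTS)
    row.update(zip(targets, values))
    return row


row = insert_except((3, 24), excluded={"name"})
print(row)
```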

Query data

SELECT * FROM cbry.cbry_user WHERE name = 'cbry' LIMIT 2;

Modify data

Other queries for modifying data are not supported: UPDATE, DELETE, REPLACE, MERGE, UPSERT, INSERT UPDATE.
However, you can use the ALTER TABLE ... DROP PARTITION query to remove some of the old data.

For a table created with the MergeTree engine, rows can be modified with ALTER TABLE ... UPDATE:

ALTER TABLE cbry.cbry_user UPDATE name = 'cbry' WHERE age = 24;

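Delete data

-- Delete the matching rows
ALTER TABLE cbry.cbry_user DELETE WHERE name IN ('cbry', 'zhangsan');
-- Drop the whole table (recreate it before running the partition examples)
DROP TABLE cbry.cbry_user;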

Partition

Query partition information

SELECT  database, table, partition, partition_id, name, path FROM system.parts WHERE table = 'cbry_user'

Delete partition

alter table cbry.cbry_user drop partition 3;

The corresponding data stored in the partition will also be deleted

Copy partition data

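Copy the data of a partition from one table to another. In the statement below the source and destination are the same table; in practice the table after FROM would be a different table with the same structure.

ALTER TABLE cbry.cbry_user REPLACE PARTITION 1 FROM cbry.cbry_user;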

Clear partition data

Reset the values of one column within a partition:

ALTER TABLE cbry.cbry_user CLEAR COLUMN age IN PARTITION 1;