Using Kettle: creating a new transformation and a new job

Background:

Study, study, and study some more!

Transformation (sometimes translated as "conversion"): a series of steps that together complete a task.

Job: a job contains multiple job entries, can run transformations, and can be scheduled to run at regular intervals.

1. Transformation

Before using Kettle, it helps to get familiar with its basic concepts: https://www.kettle.net.cn/category/base

The transformation is the most important part of an ETL solution. It handles the operations on data rows at each stage of extraction, transformation, and loading. A transformation consists of one or more steps, such as reading a file, filtering rows, cleaning data, or loading data into a database.

This article creates a transformation that extracts data from two tables in two different data sources, joins them on the class id into a single record set, and stores the result in a new table in a third database.

1.1 Configure data source

Double-click Spoon.bat to start Kettle, open the main object tree (the View tab in the English interface), right-click Transformations, and click New.

Once the transformation has been created, first create a database connection for each of the two data sources. The database used here is MySQL 8. For how to configure a MySQL 8 data source in Kettle, see: kettle configuration mysql8

The SQL statements for the tables in the two source databases:

The tb_student table of the demo-kettle-student database:

CREATE TABLE `tb_student` (
  `id` int NOT NULL AUTO_INCREMENT,
  `std_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `cls_id` int NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

The tb_class table of the demo-kettle-class database:


CREATE TABLE `tb_class` (
  `id` int NOT NULL AUTO_INCREMENT,
  `cls_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

The SQL statement for the target database:

The tb_student_total table of the demo-kettle-total database:

CREATE TABLE `tb_student_total` (
  `id` int NOT NULL AUTO_INCREMENT,
  `std_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `cls_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
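Before the CREATE TABLE statements above can be run, the three databases have to exist. A minimal sketch, assuming all three live on the same MySQL 8 server and use the same utf8 character set as the tables (the article does not show how the databases were created):

```sql
-- The database names contain hyphens, so they must be quoted with backticks.
-- Character set and collation are assumptions chosen to match the table definitions.
CREATE DATABASE IF NOT EXISTS `demo-kettle-student`
  DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE DATABASE IF NOT EXISTS `demo-kettle-class`
  DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE DATABASE IF NOT EXISTS `demo-kettle-total`
  DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
```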

1.2 Edit the transformation

After the database connections are configured, click the core objects tab on the left (Design in the English interface), open the Input category, and drag a Table input step into the editing area on the right.

Double-click the Table input step to configure it. The goal here is to extract the data of the tb_student table in demo-kettle-student, so the following items need to be set:

Step name

Database connection

SQL

The configuration is shown in the figure:

Add a second Table input step and configure it in the same way for the tb_class table of demo-kettle-class.
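As a sketch, the SQL entered in the two Table input steps might look like the following. The column lists are an assumption based on the table definitions, and the class table is assumed to be named tb_class with a cls_name column, as the target table suggests.

```sql
-- Table input 1, using the connection to demo-kettle-student
SELECT id, std_name, cls_id
FROM tb_student;

-- Table input 2, using the connection to demo-kettle-class
SELECT id, cls_name
FROM tb_class;
```

Each query goes into its own step. Listing the columns explicitly, rather than using SELECT *, keeps the field names predictable for the later join and field mapping.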

With the two Table input steps defined, the two streams need to be joined. A merge join expects both of its inputs to be sorted on the join key, so first select Core Objects -> Transform -> Sort rows and attach a Sort rows step to each Table input: sort the student stream by cls_id and the class stream by id, both in ascending order.

After sorting, select Core Objects -> Joins -> Merge join and join the two streams by matching the cls_id field of the student stream against the id field of the class stream.
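To make the intent of the join concrete, here is roughly what it computes, written as plain SQL. This is only an illustration, not what Kettle executes: it assumes both source databases sit on the same MySQL server, that the class table is tb_class with a cls_name column, and that the join type is an inner join (the article does not state the join type).

```sql
-- Pair every student with the name of the class referenced by cls_id.
SELECT s.id,
       s.std_name,
       c.cls_name
FROM `demo-kettle-student`.`tb_student` AS s
INNER JOIN `demo-kettle-class`.`tb_class` AS c
        ON s.cls_id = c.id
ORDER BY s.id;
```

In Kettle the same result is built from separate steps because the two tables are read through two different database connections.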

Then select Core Objects -> Output -> Table output and write the joined records to the tb_student_total table of the demo-kettle-total database. In this step, set the database connection and the target table, enable the option to specify database fields, and configure the field mapping.

Once everything is configured, click the Run button, as follows.

Insert a few test rows into the source databases, run the transformation, and check that the data in demo-kettle-total meets expectations.

The transformation achieved the expected result.
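The test data used for this check is not shown in the article. A possible set of rows, with made-up names and assuming the class table is tb_class with a cls_name column, could look like this:

```sql
-- Hypothetical sample rows for illustration only.
INSERT INTO `demo-kettle-class`.`tb_class` (`id`, `cls_name`) VALUES
  (1, 'Class A'),
  (2, 'Class B');

INSERT INTO `demo-kettle-student`.`tb_student` (`id`, `std_name`, `cls_id`) VALUES
  (1, 'Alice', 1),
  (2, 'Bob',   2),
  (3, 'Carol', 1);

-- After running the transformation, inspect the target table.
SELECT `id`, `std_name`, `cls_name`
FROM `demo-kettle-total`.`tb_student_total`
ORDER BY `id`;
```

If the join and field mapping are configured as described, each student row should appear in the target table together with the name of its class.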

2. Job

Click File -> New -> Job to create a new job.

This article creates a new job that runs the transformation every three seconds.

Open Core Objects -> General and drag the Start, Transformation, and Success entries into the editing area. Connect them in that order, then double-click the Transformation entry and select the transformation created in the previous section, as follows:

Then double-click the Start entry and set its scheduling properties so that the job repeats every three seconds, as follows:

Click Run, then check the database: the data meets expectations.
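One rough way to watch the job from the database side is to rerun a query like the one below every few seconds while the job is active. Whether the row count keeps growing or stays constant depends on how the Table output step is configured (for example, whether the target table is truncated on each run), which this article does not specify, so no expected output is given.

```sql
-- Rerun while the job is active to see the effect of each scheduled execution.
SELECT COUNT(*) AS row_count
FROM `demo-kettle-total`.`tb_student_total`;
```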