Kettle data migration

Kettle is named after the kitchen kettle: the project's main programmer hoped to pour all kinds of data into one kettle and then stream it back out in a specified format.

Kettle is used for data migration, that is, importing all the data from one database into another. That sounds simple, but what if the table fields in the two databases are completely different? You can't just copy and paste; you need to transform the field names, and sometimes the field contents, to make the data fit. I won't cover installing Kettle here; I'll mainly describe how I use it, which can serve as a tutorial for beginners.

After opening the software, first create a new transformation and press Ctrl+S to save it. Enter a file name (when the file is saved for the first time, the file name directly becomes the name of the transformation) and pick a location you can find again later.

Now look at the left side: there are two switchable tabs, the main object tree and the core objects. The main object tree represents the current transformation. The core objects panel is essentially a toolbar of tools. The ones I mainly use are: Input - Table input, Transform - Value mapper, Transform - Select values (field selection), Transform - Add constants, Scripting - Java code, and Output - Table output.

We drag all of the steps mentioned above onto the main canvas, as shown in the figure:

img

Then, starting from the beginning, click the first step, Table input, as shown in the figure:

img

Select the fourth icon, the one with an arrow, and you will find a gray arrow now follows your mouse. Move the mouse to the second step, Value mapper, and the arrow turns blue; click the Value mapper step and a blue arrow (a hop) appears between Table input and Value mapper. If a menu pops up, choose the main output of the step. Connect all the steps in series this way, as shown in the figure:

img

The picture above represents a complete data-exchange transformation. Now click the main object tree in the left column, right-click Database connections, select New, fill in your database connection information, and test it. Once the test succeeds, click OK to create the connection, as shown in the figure:

At this point our preparations are basically done and we can perform a basic data migration. The first step to configure is Table input. As the name suggests, Table input represents the source of the data. Double-click Table input, select the database connection, write the SQL that fetches the data, and then click Preview to inspect the rows it returns. Be sure to check the data format here, because some fields may not look the way they do in the database: for example, a disabled field stored as 0 or 1 in the database may show up as Y or N in the data Kettle retrieves.

img
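As a concrete illustration, a Table input query might look like the sketch below. The table and column names are hypothetical placeholders, not names from this project; substitute your own schema.

```sql
-- Hypothetical source query for the Table input step.
-- old_site_user and its columns are placeholders.
SELECT id,
       user_name,
       disabled,      -- check the preview: this may come back as Y/N rather than 0/1
       created_time
FROM   old_site_user;
```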

Generally speaking, being able to preview the data shows that your SQL is correct, and you can move on to the next step: Value mapper. Value mapping changes certain values into other values, for example mapping Y and N to 1 and 0. Double-click the Value mapper step, first select the field to use, then fill in each source value and its target value, and click OK. As shown in the figure:

img
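Incidentally, if you ever want to avoid a separate mapping step, the same effect can often be achieved in the Table input SQL itself. A hypothetical sketch, reusing the placeholder names from above:

```sql
-- Mapping Y/N to 1/0 directly in the source query with a CASE expression.
SELECT id,
       user_name,
       CASE disabled WHEN 'Y' THEN 1
                     WHEN 'N' THEN 0
       END AS disabled
FROM   old_site_user;
```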

After that comes the most important step: Select values (field selection). Since many fields are bound to differ during a data migration, how do you match the fields of the two tables one by one? The answer is to use Select values to rename the fields to what the target table expects. First, double-click Select values and switch to the third tab, Meta-data. Then click Get fields to change on the right, and Kettle lists the fields from your SQL query one by one. Fill in the Rename to column with the field names of your target table, and fill in the Type column with the corresponding types; for example, a source field userName might be renamed to user_name. Note that Integer stands for Long and Number stands for Double. The type can also be left blank, but that is not recommended. When you have filled everything in, click OK. As shown in the figure:

img
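If it helps to think in SQL terms, what the Meta-data tab does is roughly what aliasing and casting do in a query. A purely conceptual sketch with the same placeholder names (MySQL-style CAST):

```sql
-- Conceptual parallel only: Select values renames and retypes like AS and CAST.
SELECT userName                 AS user_name,  -- "Rename to" column
       CAST(disabled AS SIGNED) AS disabled    -- "Type" column (Integer in Kettle = Java Long)
FROM   old_site_user;
```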

At this point you may find that the target table has many fields that do not exist in the source table, yet are required and cannot be empty. This is where the Add constants step comes in: fill in the name, type, length, and value (the default value) for each constant field, then click OK. As shown in the figure:

img
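Conceptually, Add constants just appends fixed columns to every row, much like selecting literals in SQL. Another hypothetical sketch, with placeholder column names:

```sql
-- Conceptual parallel: every migrated row gets the same default values.
SELECT id,
       user_name,
       0          AS delete_flag,  -- placeholder required column in the target table
       'migrated' AS source_tag    -- placeholder audit column
FROM   old_site_user;
```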

After completing the above operations, the basic fields can all be matched, but some fields need logical processing: the retrieved values cannot be stored in the target table as-is. Take my own project as an example. Our requirement was to move the data from the original website onto the new website that we operate and maintain, so at migration time the target table already contained data and the ids would collide. Our solution was to change the ids of the migrated data to negative values, and to change the associated ids to negative values as well. This solves the problem of duplicate ids blocking the migration without losing the association relationships.

This is where the Java code step is needed. One pitfall: the Java code step that ships with Kettle will not automatically import jar packages, so it is best to write the code in MyEclipse first and then copy it over, together with the import statements. Of course, any imported jar must exist in the lib folder under Kettle's installation directory. When writing code, double-click a field under Input fields in the left column to read its value (get), and double-click a field under Output fields to assign to it (set). You can click Test class at the bottom right to test the Java code. The Java code step thus takes care of the logical processing, as shown in the figure:

img
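For reference, here is a minimal sketch of what the body of such a Java code step might look like, assuming the standard processRow() template the step generates. The field names id and parent_id are hypothetical; use the fields of your own rows.

```java
// A minimal sketch, not this project's exact code: negate ids so migrated rows
// cannot collide with ids that already exist in the target table.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();              // fetch one row from the previous step
    if (r == null) {                    // no more input: tell Kettle we are done
        setOutputDone();
        return false;
    }
    r = createOutputRow(r, data.outputRowMeta.size());

    Long id = get(Fields.In, "id").getInteger(r);              // "id" is a placeholder name
    if (id != null) {
        get(Fields.Out, "id").setValue(r, Long.valueOf(-id.longValue()));
    }

    Long parentId = get(Fields.In, "parent_id").getInteger(r); // associated id, also negated
    if (parentId != null) {
        get(Fields.Out, "parent_id").setValue(r, Long.valueOf(-parentId.longValue()));
    }

    putRow(data.outputRowMeta, r);      // hand the row to the next step
    return true;
}
```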

After the above steps, all of our data should have been reshaped into the format of the new table, so the last step, Table output, can be performed. Double-click Table output, first select the database connection, then fill in the target table with the table to write to. Be careful not to check Truncate table: it deletes the data already in the table. Then click the Database fields tab. If the incoming fields do not line up with the table's columns exactly, check Specify database fields and click the Get fields button on the right. Go through the list carefully and delete any fields that are not in the target table (right-click and choose Delete selected lines). Confirm when you have finished editing, as shown in the figure:

img
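Under the hood, Table output essentially runs a prepared INSERT for each incoming row, one parameter per mapped field. A hypothetical sketch using the placeholder names from above:

```sql
-- Roughly what Table output executes per row once the fields are mapped.
INSERT INTO new_site_user (id, user_name, disabled, delete_flag, source_tag)
VALUES (?, ?, ?, ?, ?);
```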

Then click the Run button at the top of the main interface to run this transformation. A successful run looks like this:

img

If there are no errors, the transformation succeeded. At this point, check whether the data in the database looks right, and if something is off, debug iteratively. If there is an error, check the Logging tab in the Execution Results pane below; it contains detailed error messages that will help you debug.

The operations above can cover basically all data-migration needs; the rest is practice. Next, an extension. Sometimes even after all the transformation steps we still cannot meet the requirement, for example when we need to run a piece of SQL after the data migration to format a field.

The first approach is simply to run the SQL in Navicat after the transformation has finished. That works, but it requires a manual extra step. In fact, Kettle can handle this too: the core objects panel has an Execute SQL script step under Scripting. However, if you just drag it in as the step after Table output, expecting the SQL to run once the output is finished, you will be caught out: Kettle's mechanism is that any SQL script is executed first, so your SQL actually runs before the data is inserted.

So, what should we do? This is where the concept of a job comes in. What is a job? Simply put, a job strings several transformations together and executes them sequentially from top to bottom. Doesn't that solve our problem? First create two transformations: the first is the normal data migration, and the second contains just an Execute SQL script step holding the SQL you need to run after the migration. As shown in the figure:

img
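For example, the post-migration SQL might be something like the sketch below; as before, the table and column names are placeholders.

```sql
-- Hypothetical cleanup run by the Execute SQL script step after the migration.
-- The id < 0 filter targets only the rows we just migrated (their ids were negated).
UPDATE new_site_user
SET    user_name = TRIM(user_name)
WHERE  id < 0;
```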

Then click File in the upper left corner and create a new job. In the core objects panel we only need General - START and General - Transformation: drag one START and two Transformation entries onto the main canvas, then double-click each Transformation. Point the first at the normal data-migration transformation, and the second at the transformation that executes the SQL script, as shown in the figure:

img

Then we can run this job. When it completes successfully, the data migration runs first and the SQL runs afterwards, as shown in the figure:

img

The above is my experience using Kettle to migrate legacy data. It basically covered my needs, and I hope it can help you too.