Four ways to import large files into MySQL (INSERT INTO, mysqlimport, LOAD DATA, Pandas)

Author: zhu6201976
1. Requirement

Bulk-import all large CSV files (over 100 MB) in a given directory into MySQL.
2. Analysis of implementation approaches
1. INSERT INTO: read the file line by line, build an SQL statement for each row, and execute it. Extremely slow; the database and table structure must be created in advance.
2. mysqlimport: a thin command-line wrapper around LOAD DATA. Fast; the database and table structure must be created in advance. (A sample invocation follows this list.)
3. LOAD DATA: MySQL's built-in high-speed path for loading large files. The database and table structure must be created in advance. (Recommended)
4. Pandas: maps a DataFrame onto a table (DataFrame --> table), enabling fast imports of large files; only the database must be created in advance. (Recommended)
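
For method 2, a typical mysqlimport invocation might look like the sketch below (connection details are placeholders). The options mirror the LOAD DATA clauses used later in this article; note that mysqlimport derives the table name from the file name, so ratings.csv is loaded into a table named ratings:

```bash
mysqlimport --local --host=localhost --port=3306 --user=root -p \
  --fields-terminated-by=, --fields-optionally-enclosed-by='"' \
  --lines-terminated-by='\r\n' --ignore-lines=1 \
  test ratings.csv
```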

Notes on creating the database and tables:

1. If the text contains special characters such as emoji, set the database character set to utf8mb4.

2. To keep the import simple and fast, no primary key is set during the load phase; add one as needed after the import completes. Every column is uniformly declared with the large-text type TEXT. (A sketch of the corresponding DDL follows these notes.)
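
To make these notes concrete, here is a minimal sketch of the kind of DDL they imply, written with pymysql. The connection details and the ratings column names are assumptions based on the sample file used later, not the author's actual schema:

```python
from pymysql import connect

# Assumed connection details; adjust to your environment.
conn = connect(host='localhost', port=3306, user='root',
               password='root', charset='utf8mb4')
cursor = conn.cursor()

# utf8mb4 so emoji and other 4-byte characters survive the import.
cursor.execute("CREATE DATABASE IF NOT EXISTS test CHARACTER SET utf8mb4")

# No primary key and all-TEXT columns, so the bulk load cannot fail on
# type or key constraints; add keys and tighter types after the import.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS test.ratings (
        userId      TEXT,
        movieId     TEXT,
        rating      TEXT,
        `timestamp` TEXT
    ) CHARACTER SET utf8mb4
""")
conn.commit()
conn.close()
```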

3. LOAD DATA

Official document: https://dev.mysql.com/doc/refman/8.0/en/load-data.html

Syntax:

The meaning of each parameter is explained in the manual linked above and is not repeated here.
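
For orientation, an abridged form of the statement (see the manual for the full grammar, including the PARTITION, column-list, and SET clauses):

```
LOAD DATA
    [LOW_PRIORITY | CONCURRENT] [LOCAL]
    INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
```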

Sample code:

```python
from pymysql import connect


class LoadData(object):
    def __init__(self):
        self.host = 'localhost'
        self.port = 3306  # original had 3006; MySQL's default port is 3306
        self.user = 'root'
        self.password = 'root'
        self.database = 'test'
        # local_infile=True is required on the client for LOAD DATA LOCAL INFILE
        self.conn = connect(host=self.host, port=self.port, user=self.user,
                            password=self.password, database=self.database,
                            local_infile=True, charset='utf8')
        self.cursor = self.conn.cursor()

    def run(self):
        # Use forward slashes: MySQL treats backslash as an escape character
        # inside SQL string literals, which would mangle a Windows path.
        file_path = 'C:/Users/Tesla/Desktop/ratings.csv'
        table_name = 'ratings'

        self.conn.ping()  # reconnect if the connection has gone away
        load_data_sql = (
            f"load data concurrent local infile '{file_path}' "
            f"replace into table {table_name} character set utf8 "
            f"fields terminated by ',' enclosed by '\"' escaped by '' "
            f"lines terminated by '\\r\\n' ignore 1 lines;"
        )
        count = self.cursor.execute(load_data_sql)  # number of rows loaded
        self.conn.commit()


load_data = LoadData()
load_data.run()
```
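
One practical note: as of MySQL 8.0 the server-side local_infile system variable is disabled by default, so LOAD DATA ... LOCAL is rejected unless an administrator enables it on the server:

```sql
SET GLOBAL local_infile = 1;
```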
4. Pandas

Official document: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html?highlight=to_sql#pandas.DataFrame.to_sql

Sample code:
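
A minimal sketch of a chunked DataFrame.to_sql import, assuming the same connection details, file path, and table name as the LOAD DATA example above:

```python
import pandas as pd
from sqlalchemy import create_engine

# 'mysql+pymysql' makes the driver explicit (see the notes below).
engine = create_engine('mysql+pymysql://root:root@localhost:3306/test?charset=utf8mb4')

file_path = 'C:/Users/Tesla/Desktop/ratings.csv'
table_name = 'ratings'

# Read the large CSV in chunks so the whole file never sits in memory at
# once, appending each chunk. dtype=str matches the all-TEXT schema, and
# to_sql creates the table if it does not exist, which is why only the
# database has to be created in advance.
for chunk in pd.read_csv(file_path, chunksize=100_000, dtype=str):
    chunk.to_sql(table_name, engine, if_exists='append', index=False)
```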


Notes:

1. Required dependencies: sqlalchemy and pymysql:

pip install sqlalchemy

pip install PyMySQL

2. If you do not specify a driver in the SQLAlchemy connection string, it defaults to MySQLdb, which does not support Python 3. With mysqlconnector, importing a large file was interrupted partway with an error. Using pymysql avoids both problems.
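
In connection-string form, pinning the driver looks like this (credentials are placeholders):

```python
from sqlalchemy import create_engine

# Plain 'mysql://...' would try MySQLdb; 'mysql+pymysql://...' forces pymysql.
engine = create_engine('mysql+pymysql://root:root@localhost:3306/test?charset=utf8mb4')
```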