How to create 100,000 (10w) rows of test data and insert 100,000 different rows into the database

Preface

Interview question: if you need to create 100,000 (10w) rows of test data, how do you insert 100,000 rows into the database without duplicates?

SQL questions come up often in recent interviews, and creating test data in a database is a common task in everyday work: both stress testing and performance testing need test data prepared in the database first. So how do you generate a large amount of test data in batches?

Since I use Python often, my first thought was to use Python to generate the SQL, and then execute the SQL to insert the data into the database.

Language: Python 3.6

Insert data

First, the INSERT statements: every id must be unique. Below is a single INSERT statement:

INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('1', '', 'test123', '2019-12-17');
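
For context, the article never shows the schema of apiapp_card. A minimal table definition that would accept these statements might look like the sketch below (the column types and the primary key on id are assumptions, not taken from the original):

CREATE TABLE `apps`.`apiapp_card` (
  `id` varchar(20) NOT NULL,        -- assumed type; must be unique for the inserts to succeed
  `card_id` varchar(64) DEFAULT '',
  `card_user` varchar(64) DEFAULT '',
  `add_time` varchar(32) DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;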

100,000 statements is a lot and takes too long to run, so first use Python to generate 10,000 and measure the execution time.

We need to generate multiple INSERT statements, so here I use Python to write a short script that generates the SQL text.

- Use %s to stand in for the field value that needs to change; if several values need to change, use one %s per value. For the table designed here, an insert succeeds as long as the id is unique.

- Use a for loop and add 1 to the id each time, so that ids are never repeated; otherwise duplicates will fail to insert into the database.

- "a" opens the file in append mode.

- Separate each SQL statement with a semicolon.

- Append a "\n" line feed after each statement written.

# python3
# Author: Shanghai - yo

# Open the file once in append mode and write one INSERT statement per line
with open("a.txt", "a") as fp:
    for i in range(10000):
        sql = "INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('%s','','test123', '2019-12-17');" % str(i + 1)
        fp.write(sql + "\n")

Run the Python code and it generates an a.txt file locally. Open it; part of the generated data looks like this:

INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('1','','test123', '2019-12-17');
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('2','','test123', '2019-12-17');
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('3','','test123', '2019-12-17');
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('4','','test123', '2019-12-17');
......
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('10000','','test123', '2019-12-17');

If the id were a mobile phone number, how would you generate 100,000 different phone numbers?

They can be generated from a number segment based on the first 3 digits of the phone number. For numbers beginning with 186, start from the initial value 1860000000 and add 1 each time, up to 1860099999; the segment 1860000000-1860099999 then contains exactly 100,000 phone numbers.
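
A quick sanity check of the range arithmetic (an illustrative snippet, not from the original article):

# Verify that the segment 1860000000-1860099999 holds 100,000 unique numbers
start = 1860000000
numbers = [str(start + i) for i in range(100000)]
print(numbers[0], numbers[-1], len(set(numbers)))  # 1860000000 1860099999 100000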

After changing the id to a phone number, the modified code is as follows:

# python3
# Author: Shanghai - yo

# Same script, but the id now counts up from the phone-number base 1860000000
with open("a.txt", "a") as fp:
    for i in range(10000):
        sql = "INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('%s','','test123', '2019-12-17');" % str(i + 1860000000)
        fp.write(sql + "\n")

Changing str(i + 1) in the previous script to str(i + 1860000000) is all it takes to generate phone numbers:

INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('1860000000','','test123', '2019-12-17');
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('1860000001','','test123', '2019-12-17');
INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`) VALUES ('1860000002','','test123', '2019-12-17');

Copy the generated text and paste the SQL for the many INSERT INTO statements into the Navicat client to execute in one go.

It took about 5 minutes to finish, which means 100,000 rows would take about 50 minutes. That is far too slow: with even more data the run time becomes unacceptable, not the effect we want!

Batch execution

Because statement-by-statement execution takes too long, the next optimization is to use a single INSERT statement and insert the data in batches: write just one INSERT INTO that sends everything to the database at once, which is much faster.

You can splice the SQL yourself using the INSERT INTO table_name VALUES (), (), (), () syntax and insert everything in one statement.
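
For example, a batch insert of three rows into the same table looks like this:

INSERT INTO `apps`.`apiapp_card` (`id`, `card_id`, `card_user`, `add_time`)
VALUES ('1', '', 'test123', '2019-12-17'),
       ('2', '', 'test123', '2019-12-17'),
       ('3', '', 'test123', '2019-12-17');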

A batch execution is all-or-nothing: either every row is written successfully, or none are. If the spliced SQL contains a syntax error anywhere, the whole write fails.

Note:

- When splicing the SQL, separate the value groups with ASCII commas.

- The values must correspond one-to-one with the fields of the data table.

- Make sure the last row ends with a semicolon, not a comma.

# python3
# Author: Shanghai - yo

# Write the INSERT header once, then one "(...)," value group per line
insert_sql = "INSERT INTO `apps`.`apiapp_card` VALUES "
with open("b.txt", "a") as fp:
    fp.write(insert_sql + "\n")
    for i in range(10000):
        values = "('%s','','test123', '2019-12-17')," % str(i + 10001)
        fp.write(values + "\n")

After it finishes, copy the contents of the b.txt file. Note that the trailing comma on the last line must be changed to a semicolon, otherwise a syntax error will be reported.

Part of the generated data is shown below:

INSERT INTO `apps`.`apiapp_card` VALUES 
('10001','','test123', '2019-12-17'),
('10002','','test123', '2019-12-17'),
......
('20000','','test123', '2019-12-17');

Copy the generated INSERT INTO statement into the Navicat client and execute it.

This time the test result showed that 10,000 rows took only 0.217 seconds; the speed improved significantly.

Inserting 100,000 rows

Next, how long does it take to insert 100,000 rows of data?

# python3
# Author: Shanghai - yo

# Same as above, but generate 100,000 value groups starting from id 100000
insert_sql = "INSERT INTO `apps`.`apiapp_card` VALUES "
with open("b.txt", "a") as fp:
    fp.write(insert_sql + "\n")
    for i in range(100000):
        values = "('%s','','test123', '2019-12-17')," % str(i + 100000)
        fp.write(values + "\n")

The data generated by running the Python script looks like this:

INSERT INTO `apps`.`apiapp_card` VALUES 
('100000','','test123', '2019-12-17'),
('100001','','test123', '2019-12-17'),
......
('199999','','test123', '2019-12-17');

Inserting this into MySQL directly raises an error: Err 1153 - Got a packet bigger than 'max_allowed_packet' bytes

Cause of the error: MySQL limits the size of a single packet received from the client, and the SQL string for 100,000 rows exceeds the max_allowed_packet allowance.

Solution: increase the max_allowed_packet value of the MySQL server.

max_allowed_packet

First run this command in Navicat to check the current max_allowed_packet (maximum allowed packet size):

show global variables like 'max_allowed_packet';

The Value column shows 4194304, i.e. a 4 MB limit; our SQL string is simply too large and exceeds this range.

The value cannot be modified directly in the Navicat client; you need to log in to MySQL and change it on the command line.

My MySQL here runs inside Docker, so I need to enter the container first and then log in to MySQL.

The operation steps are as follows:

  • docker exec to enter the Docker container
  • mysql -uroot -p, enter the password, and log in to MySQL
  • set global max_allowed_packet=419430400; sets the maximum allowed packet to 400 MB
  • show global variables like 'max_allowed_packet'; checks whether the setting took effect
[root@VM_0_2_centos ~]# docker exec -it 934b30a6dc36 /bin/bash
root@934b30a6dc36:/# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 303822
Server version: 5.7.27 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show global variables like 'max_allowed_packet';
+--------------------+-----------+
| Variable_name | Value |
+--------------------+-----------+
| max_allowed_packet | 4194304 |
+--------------------+-----------+
1 row in set (0.00 sec)

mysql> set global max_allowed_packet=419430400;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'max_allowed_packet';
+--------------------+-----------+
| Variable_name | Value |
+--------------------+-----------+
| max_allowed_packet | 419430400 |
+--------------------+-----------+
1 row in set (0.00 sec)

mysql> 

As you can see from the above query results, it has taken effect.

Re-run the 100,000-row insert from above; the run result shows it takes about 11 seconds.

Affected rows: 100000

Time: 11.678s

The method above only takes effect temporarily: when MySQL restarts, you will find the value has reverted.

There is also a permanent method: modify the my.cnf configuration file.

Add the following line in the [mysqld] section, or modify the value if the line already exists:

- max_allowed_packet=40M

The value here can be given in M (megabyte) units. After the change, MySQL must be restarted for it to take effect.
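
For example, the relevant section of my.cnf would look like this (a sketch assuming a standard layout; the file location varies by installation):

[mysqld]
max_allowed_packet=40M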

Executing with Python

How long does it take to execute directly with Python, without the Navicat client?

First encapsulate a method for connecting to MySQL, then splice the SQL statement to execute. When splicing, pay attention to the last character: the trailing comma must become a semicolon.

Take a timestamp before executing the code and another after it finishes; the difference between the two is the execution time, in seconds.

The Python code for executing the MySQL statement is as follows, for reference:

import pymysql
'''
# python3
Author: Shanghai - yo
pip install PyMySQL==0.9.3
'''

dbinfo = {
    "host": "192.168.1.x",
    "user": "root",
    "password": "123456",
    "port": 3306}


class DbConnect():
    def __init__(self, db_cof, database=""):
        self.db_cof = db_cof
        # Open database connection
        self.db = pymysql.connect(database=database,
                                  cursorclass=pymysql.cursors.DictCursor,
                                  **db_cof)

        # Use cursor() method to get operation cursor
        self.cursor = self.db.cursor()

    def select(self, sql):
        # SQL query, e.g.:
        # sql = "SELECT * FROM EMPLOYEE WHERE INCOME > %s" % (1000)
        self.cursor.execute(sql)
        results = self.cursor.fetchall()
        return results

    def execute(self, sql):
        # SQL delete, insert, and update statements, e.g.:
        # sql = "DELETE FROM EMPLOYEE WHERE AGE > %s" % (20)
        try:
            # Execute the SQL statement
            self.cursor.execute(sql)
            # Commit the changes
            self.db.commit()
        except Exception:
            # Roll back when an error occurs
            self.db.rollback()

    def close(self):
        # Close connection
        self.db.close()


if __name__ == '__main__':
    import time
    insert_sql = "INSERT INTO `apps`.`apiapp_card` VALUES "
    insert_values = "".join(["('%s','','test123', '2019-12-17'),\n" % str(i + 100000) for i in range(100000)])
    # Splice the SQL: strip the final ",\n" and terminate with a semicolon
    sql = insert_sql + insert_values[:-2] + ";"
    # print(sql)
    # Execute the SQL and time it
    time1 = time.time()
    db = DbConnect(dbinfo, database="apps")
    db.execute(sql)
    db.close()
    time2 = time.time()
    print("Total time taken: %s" % (time2 - time1))

Result of running it with Python: Total time taken: 1.0816256999969482. The result was beyond my expectations: 100,000 rows took only about 1 second!
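
As a side note, pymysql can also send the rows as parameterized batches with cursor.executemany, which avoids splicing the SQL string by hand. A minimal sketch using the same table and connection settings as above (timing and relative speed not from the original article):

import time
import pymysql

# Same connection settings as dbinfo above
db = pymysql.connect(host="192.168.1.x", user="root", password="123456",
                     port=3306, database="apps")
cursor = db.cursor()

# One parameterized row template; pymysql expands it into a multi-row INSERT
sql = "INSERT INTO `apps`.`apiapp_card` VALUES (%s, %s, %s, %s)"
rows = [(str(i + 100000), '', 'test123', '2019-12-17') for i in range(100000)]

start = time.time()
cursor.executemany(sql, rows)
db.commit()
db.close()
print("Total time taken: %s" % (time.time() - start))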

Reference: https://cloud.tencent.com/developer/article/1643291 (How to create 100,000 test data rows and insert them into the database, Tencent Cloud Community)