Who says you must be a programmer if you learn programming? Introduce myself, I am neither a programmer, nor an IT major in university. The main content of my work is to produce and report various statistical reports, data graphs, etc. Colleagues around me spend at least 3.5 to 4 hours a day working on Excel, using various functions such as pivot charts and vlookup to make a daily report with more than 3000 data.
umm... If the leader arranges any work temporarily, it will have to work overtime to find what the leader wants from the vast amount of data.
We have a huge Excel file of more than 70M. It takes 20 minutes to open the broken computer issued by the company. It contains detailed information of each car we have purchased in the past ten years. The focus is on it. The data format is still chaotic. For example, if we have a supplier whose full name is ABCD Co., Ltd., then this Excel will contain AB, AB, CD, CD, ACBD, ABD, etc. And other similar names. When I first saw this form, my heart broke down.
Later, my colleague told me that she used Excel's vlookup function, and when it didn’t match, she searched manually...
In this way, after experiencing nightmarish months, I saw the word "Turing complete". A Turing complete language can theoretically solve any algorithm. Obviously, Python is a Turing complete language.
My first step is to start with data cleaning. First, I exported the 70M Excel I just mentioned into csv, removing unnecessary formats, styles, objects, etc.
Then I want to unify the standard names of the various data. As I just said, because there are too many people handling the same data, how to express the same data, the amount of data of hundreds of thousands, it is obviously unrealistic to cross-check manually. (Later I learned about the term Descartes product)
In the first step, I used Excel's own deduplication function to deduplicate all supplier names, leaving about 5000 names.
Then, as a lazy person, I would definitely not use manual comparisons.
Python has a library called fuzzywuzzy, which is used for fuzzy matching.
A number from 0 to 100 will be output. The larger the number, the more similar the two contents are.
So I wrote a loop to sift out the data sets with a matching degree of more than 70% among these more than 5000 items, and then manually replace them in Excel (no way, I wouldn't be able to operate Excel at that time)
Then I sorted out the positions of the various ranks, so that I have a relatively accurate data source.
After thinking about it, storing the data in Excel is not a long-term solution. After all, with hundreds of thousands of data, I want to import the data into the database. When I was in college, I took the second-level computer access test (of course, although I did, but I didn't test it...), I wanted to import Access, but I didn't understand the Python library, and I just couldn't read it, so I got stuck.
Later, I thought that in the one-click terminal of the XX private server I downloaded from the Internet, there was a one-click startup method for the MySQL database. I tried it and it worked. I copied the entire MySQL program and cleaned it up. Database. (At that time, I wanted to learn how to configure MySQL by myself, but I didn’t study to understand... This database is the predecessor of the MySQL database we provide now. Of course, we are now providing a new non-toxic ship version that is re-made)
Then use Navicat to import Excel and use the pymysql library to read. In this way, we are also the ones who take advantage of the database~
The "Rich Dad and Poor Dad" written by Robert Qingqi put forward a concept that when you achieve financial freedom, your wealth accumulation will embark on the fast lane of a virtuous circle, and your passivity will be more and more easily increased. income.
In the same way, when you realize part of the work automation, you will embark on a fast lane of "work freedom", because you have more time to study how to complete the rest of the work more efficiently, and then , The daily working time will be shorter and shorter.
Why do I have so many parentheses in the above that I can’t figure it out? Because there is no time!
Now that I am fine, I have cut half of my daily working hours, so I can think about how to solve all kinds of technical problems.
I learned various libraries of Excel, such as xlrd, xlwt, xlsxwriter, openpyxl, xlwings, win32com, and I can choose the libraries I need according to my needs; I also learned python-docx and python-pptx that belong to the Office series, and started to try Automatically produce Word and PPT; learned the stmp library and zmail library for automatic mailing; learned the itchat that can operate WeChat to send messages; learned crawling, learned to understand compression, learned to adjust image size, learned to convert Word to pdf, etc. Waiting for a lot of libraries or functions.
With learning, the daily working hours are getting shorter and shorter...
Now, my daily work only takes ten minutes, and it can be completed by recording four minutes at the earliest.
Go to work every day, the first thing is to run a py file gracefully, behind a bunch of imports, only
After finishing the daily work, it is the process of printing, hand signing, and sending to the leaders for review.
A conservative, traditional, state-owned enterprise that will not make waves for decades, I was forced to take an IT job, and a promotion and salary increase have been achieved.
Now, I can finish my daily work very quickly, and then there is a lot of time to learn and think.
Now, I have moved the database to an obsolete office computer, and used another hot backup as a public networked database;
I have written a data output software with gui using pyqt, all colleagues can freely output data for analysis;
I have used pandas and matplotlib to write a software that automatically generates data analysis pictures, and then make monthly reports, weekly reports and other report materials, no longer need to manually work the graphs through Excel;
I have used pandas, matplotlib and pyqt to write a database analysis software. After you enter the time period and qualifications, you can perform data analysis with one click. No colleagues will always ask me for data;
I have already written a contract generator with xlrd and python-docx, and colleagues who purchase during peak seasons do not need to change the contract in a desperate amount;
I have used numpy and pyqt to make a purchase, sales and inventory balance software, which allows leaders to display the inventory situation under different purchase and sales situations in real time by dragging the slider when making decisions;
I have written a real-time grain price acquisition database with a crawler and pymysql, bidding farewell to the way of browsing the web to copy prices line by line;
So in your future, will you continue to do repetitive tasks every day, or will you liberate yourself from work and embrace the larger world?
I still want to recommend the Python automated learning group I built myself: 1033482984 , the group is all learning Python automated testing, if you are learning Python, the editor welcomes you to join, everyone is a test and development party, sharing dry goods from time to time (Only related to Python software testing), including a 2021 latest Python automation data compiled by myself.
Finally: [may help you]
This information should be the most comprehensive and complete preparation warehouse for friends of [software testing]. This warehouse has accompanied tens of thousands of test engineers through the most difficult journey, and I hope it can help you
Follow me on WeChat Official account [ software test dao ] free to get~
If my blog is helpful to you, if you like the content of my blog, please "Like", "Comment" and "Favorite" with one click and three links!