Practical application 1: Analyze the order details table data of a catering company (visual analysis of python data based on matplotlib)

table of Contents

1. Prepare in advance

1. Preparation of python third-party libraries

2. Preparation of SQL file

2. Project background

Three, data preprocessing

1. Import related libraries

2. Build a channel

3. File reading and merging

4. Preprocessing of related information

5. Preprocess the complete code

Fourth, the visualization of sales changes over time

1. Calculate the date and time of the minimum sales

2. Visualization of sales changes over time

3. Data analysis of sales changes over time

Five, the relationship between the week and sales

1. Group aggregation

2. Sort by week

3. Column chart drawing

4. Pie chart drawing

5. Data analysis of the relationship between week and sales

6. Analysis of the relationship between time, sales and order volume

1. Packet aggregation processing

2. Bubble chart drawing

3. Data analysis of the relationship between time, sales and order volume

Seven, project summary

1. Prepare in advance

1. Preparation of python third-party libraries

  • pandas
  • numpy
  • sqlalchemy
  • pymysql
  • matplotlib

2. Preparation of SQL file

  • meal_order_detail1
  • meal_order_detail2
  • meal_order_detail3

For partners who want to learn python data analysis through this project, here is the download link of the above three SQL files (free download) : https://download.csdn.net/download/weixin_44940488/19442022

2. Project background

Taking the order detail table data of a catering company as a visualization object, analyze the following content (see the table below for specific information):

1. Changes in sales over time;

2. The relationship between sales and the week;

3. The relationship between the number of orders and sales.

among them:

order_id : order number

dishes_name : dish name

counts : the number of dishes

Amounts : the unit price of the dish

place_order_time : order time

As can be seen from the order details table above, it will have more columns. We only focus on the five columns of data here. They are order_id : order number, dishes_name : dish name, counts : dish quantity, amounts : dish Unit price, place_order_time : time when the order is placed . Other data is not useful in analysis, so don't pay too much attention to it.

Three, data preprocessing

First, we need to use the pandas library to read the database. Then, when using read_sql in pandas to read the database, a function is needed to build a connection channel between our database and python, and the sqlalchemy library is needed here . The specific code is as follows:

import pandas as pd     # 使用pandas库进行数据库的读取from sqlalchemy import create_engine   # 导入连接数据库和python的函数:create_engine

2. Build a channel

Use the create_engine() function in the sqlalchemy library to build a data channel between mysql and python. Here you need to use the pymysql library (install this library in advance, otherwise the following programs cannot proceed). The user name of our database is: root; password: 123456; the IP address and port number of the database: 127.0.0.1:3306; the database: test , use utf8 for file reading.

The specific code is as follows:

# 建立连接con = create_engine('mysql+pymysql://root:[email protected]:3306/test?charset=utf8')

3. File reading and merging

Here we will use a function read_sql() mentioned earlier to read our SQL file. After reading the three SQL files, we need to merge the three tables. Here we need to use the concat() function in the pandas library . Because these three tables are all orders data in August, it is only divided according to time. So we need to merge these three SQL files into one SQL file.

# 读取数据库中的SQL文件data1 = pd.read_sql('meal_order_detail1', con=con)data2 = pd.read_sql('meal_order_detail2', con=con)data3 = pd.read_sql('meal_order_detail3', con=con) # 合并读取的三个SQL文件后删除原来的数据data = pd.concat([data1, data2, data3], axis=0)    # axis=0是纵向拼接,axis=1是横向拼接del data1, data2, data3 # 打印数据的列标签data.columns
# 计算收入data['price'] = data['counts'] * data['amounts'] # 订餐日期与星期相对应ind = pd.DatetimeIndex(data['place_order_time'])data['weekday_name'] = ind.weekday_namedata['day'] = pd.DatetimeIndex(data['place_order_time']).day print(data['price'])

5. Preprocess the complete code

import pandas as pd     # 使用pandas库进行数据库的读取from sqlalchemy import create_engine   # 导入连接数据库和python的函数:create_engine # 建立连接con = create_engine('mysql+pymysql://root:[email protected]:3306/test?charset=utf8') # 读取数据库中的SQL文件data1 = pd.read_sql('meal_order_detail1', con=con)data2 = pd.read_sql('meal_order_detail2', con=con)data3 = pd.read_sql('meal_order_detail3', con=con) # 合并读取的三个SQL文件后删除原来的数据data = pd.concat([data1, data2, data3], axis=0)    # axis=0是纵向拼接,axis=1是横向拼接del data1, data2, data3 # 打印数据的列标签# data.columns # 计算收入data['price'] = data['counts'] * data['amounts'] # 订餐日期与星期相对应ind = pd.DatetimeIndex(data['place_order_time'])data['weekday_name'] = ind.weekday_namedata['day'] = pd.DatetimeIndex(data['place_order_time']).day # print(data['price'])

Fourth, the visualization of sales changes over time

To view the recent situation of core indicators, you can use a line chart to reflect the data trend , and a scatter chart to reflect the data distribution .

1. Calculate the date and time of the minimum sales

import numpy as npdata_gb = data[['day', 'price']].groupby(by='day')number = data_gb.agg(np.sum) number['price'].argmin()

2. Visualization of sales changes over time

# 销售额随时间变化的可视化import matplotlib.pyplot as plt plt.figure(figsize=(10, 7))   # 设置绘图窗口 plt.rcParams['font.sans-serif'] = 'SimHei'    # 中文字体 plt.scatter(range(1, 32), number, marker='D')plt.plot(range(1, 32), number)plt.title('2016年8月餐饮销售额趋势示意图')plt.xlabel('日期')plt.ylabel('销售额')plt.xticks(range(1, 32)[::7], range(1, 32)[::7]) plt.text(number['price'].argmin(), number['price'].min(), '最小值为'+str(number['price'].min())) plt.show()

3. Data analysis of sales changes over time

From the chart shown above, we can see that there is an obvious cyclicality in catering sales. Sales on Saturdays and Sundays are often the highest in a week, and sales on weekdays are lower, which is also in line with the general leisure law. The lowest sales of the entire month occurred on the 16th.

Five, the relationship between the week and sales

From the line chart above, it can be seen that there is an obvious periodicity in catering sales. So, now let’s make an analysis of the week’s sales.

1. Group aggregation

# 分组聚合import numpy as npdata_gb = data[['weekday_name', 'price']].groupby(by='weekday_name')number = data_gb.agg(np.sum)
number

2. Sort by week

# 排序操作index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']number2 = number.loc[index, 'price']

3. Column chart drawing

import matplotlib.pyplot as plt plt.bar(range(1, len(number2)+1), number2, width=0.5, alpha=0.5)plt.xticks(range(1, len(number2)+1), number2.index)plt.title('星期与销售额的数量情况')for i, j in zip(range(1, len(number2)+1), number2):    plt.text(i, j, '%i'%j, ha='center', va='bottom')plt.show()

4. Pie chart drawing

import matplotlib.pyplot as plt plt.figure(figsize=(5, 5)) plt.style.use('ggplot')plt.pie(number2, labels=number2.index, autopct='%.2f %%', wedgeprops=dict(width=0.6, edgecolor='w')) plt.title('星期销售额占比情况')plt.show()

5. Data analysis of the relationship between week and sales

From the above chart, we can see that the sales of orders on Saturdays and Sundays accounted for more than 60%, and the sales on Tuesdays accounted for at least 6.9%. Tuesday and Thursday are our two lowest sales days.

6. Analysis of the relationship between time, sales and order volume

1. Packet aggregation processing

data_gb = data[['order_id', 'price', 'day']].groupby(by='day')def myfun(data):    return len(np.unique(data)) number = data_gb.agg({'price': np.sum, 'order_id': myfun})

2. Bubble chart drawing

import matplotlib.pyplot as plt plt.figure(figsize=(10, 6)) plt.scatter(range(1, 32), number['price'], s=number['order_id']) plt.title('订单量、销售额与时间的关系')plt.xlabel('时间')plt.ylabel('销售额')plt.show()

3. Data analysis of the relationship between time, sales and order volume

As can be seen from the above picture, the time is Saturday and Sunday when the sales and orders are the largest, and the maximum sales can exceed 40,000 yuan; and the more orders, the larger the bubble (red dot) in the picture , you can see Out is still the largest number of orders on Saturdays and Sundays.

Seven, project summary

The data of this project comes directly from the SQL file of the database, so we need to read the file from the database before we can start preprocessing and data visualization analysis. In actual work, we are not only exposed to Excel files, but also database files ( SQL files ).

Through the exercises of this project, I learned to establish a connection between the database and python through the sqlalchemy library and the pymysql library, and then use the pd.read_sql() function to read the database file, and use the  pd.concat() function to merge the database. The SQL data files are preprocessed through the pandas library and the numpy library, the matplotlib library is used for data visualization, and finally the data analysis is performed through the visualized chart.

The data analysis of this project is not very good, and it needs to be improved and optimized in the follow-up study and thinking. If there are errors or areas that can be improved, everyone is welcome to point out.