Once you properly understand the syntactic sugar around the DataFrame groupby aggregation functions, data analysis takes far less effort for the same result.
First, look at the data type of each field.
import numpy as np
import pandas as pd
import pymysql
conn = pymysql.connect(host='localhost', user='root', passwd='123456', db='test', port=3306, charset='utf8')
jianshu = pd.read_sql('select * from jianshu1', conn)
jianshu.dtypes
The output shows that fields such as view, which should hold integers, are stored as object data. We therefore need to convert the data type. Take view as an example.
jianshu['view'] = jianshu['view'].astype('int64')
jianshu.dtypes
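The astype call above raises an error if any value in the column cannot be parsed as an integer. The following is a minimal sketch of a more forgiving conversion with pd.to_numeric. The frame named sample and its values are hypothetical stand-ins for the jianshu1 table, and replacing unparseable entries with 0 is an assumption that may not suit every dataset.

```python
import pandas as pd

# Hypothetical stand-in for the jianshu1 table; the values are invented.
sample = pd.DataFrame({
    "user": ["a", "b", "c"],
    "view": ["120", "45", "n/a"],  # stored as strings, not integers
})

# errors="coerce" turns unparseable entries into NaN instead of raising.
views = pd.to_numeric(sample["view"], errors="coerce")

# Assumption: treat missing counts as 0, then convert to a true integer dtype.
sample["view"] = views.fillna(0).astype("int64")
print(sample.dtypes)
```

If silently mapping bad values to 0 is not acceptable, inspecting the rows where views.isna() is true before filling them is a safer first step.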
Use the user column as an index.
jianshu.set_index('user', inplace=True)
jianshu
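Grouping returns a groupby object rather than a DataFrame. It can be iterated over, and each iteration yields a group name together with the rows belonging to that group.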
jianshu.groupby(jianshu.index)
for name, group in jianshu.groupby(jianshu.index):
    print(name, group)
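To make the iteration concrete, here is a small self-contained sketch. The frame named sample and its user names and view counts are hypothetical. It groups on the index level, collects the groups into a dict, and also shows get_group as a way to fetch one group without looping.

```python
import pandas as pd

# Hypothetical data standing in for the jianshu table, indexed by user.
sample = pd.DataFrame(
    {"view": [10, 20, 5, 7]},
    index=pd.Index(["alice", "bob", "alice", "bob"], name="user"),
)

# Grouping on the named index level; here this matches groupby(sample.index).
grouped = sample.groupby(level="user")

# Each iteration yields (group name, sub-DataFrame with that group's rows).
groups = {name: group for name, group in grouped}
print(groups["alice"])

# get_group retrieves a single group directly.
print(grouped.get_group("bob"))
```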
Syntactic sugar 1: call an aggregation function such as sum directly on the grouped column
jianshu.groupby(jianshu.index)[['view']].sum()
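The double brackets in [['view']] matter: they select a list of columns, so the result is a DataFrame, whereas single brackets return a Series. A short sketch of the difference follows, using a hypothetical frame named sample. The like column is invented for illustration only.

```python
import pandas as pd

# Hypothetical data; "like" is an invented second column.
sample = pd.DataFrame(
    {"view": [10, 20, 5, 7], "like": [1, 2, 3, 4]},
    index=pd.Index(["alice", "bob", "alice", "bob"], name="user"),
)

# A list of columns keeps the result two-dimensional (DataFrame).
as_frame = sample.groupby(sample.index)[["view"]].sum()

# A single column label returns a one-dimensional result (Series).
as_series = sample.groupby(sample.index)["view"].sum()

print(type(as_frame).__name__, type(as_series).__name__)
```

Keeping a DataFrame is convenient when the result will be merged or extended with more columns later. A Series is simpler when only one set of totals is needed.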
Syntactic sugar 2: aggregate data with the aggregate method or its alias agg
jianshu.groupby(jianshu.index)[['view']].agg(['mean','sum'])
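Passing a list of function names to agg, as above, produces two-level column labels such as (view, mean). As a possible alternative, the sketch below uses named aggregation and a per-column dict, both of which give flat column names. The frame named sample, the like column, and the output names mean_view and total_view are all hypothetical.

```python
import pandas as pd

# Hypothetical data; "like" is an invented second column.
sample = pd.DataFrame(
    {"view": [10, 20, 5, 7], "like": [1, 2, 3, 4]},
    index=pd.Index(["alice", "bob", "alice", "bob"], name="user"),
)
grouped = sample.groupby(level="user")

# Named aggregation: output_name=(column, function) yields flat column names.
summary = grouped.agg(
    mean_view=("view", "mean"),
    total_view=("view", "sum"),
)
print(summary)

# A dict applies a different function to each column.
per_column = grouped.agg({"view": "sum", "like": "max"})
print(per_column)
```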