Continuing with app crawlers today, this time targeting the data of Weibo's 24-hour hot list. The collected fields are: user ID, user location, user gender, follower count, post text, creation time, repost count, comment count, and like count.
The article is divided into two parts: first, collect the list data with the crawler below; then process and visualize the collected fields.
import requests
import json
import re
import time
import csv

headers = {
    'Host': 'api.weibo.cn',
    'Connection': 'keep-alive',
    'User-Agent': 'Weibo/29278 (iPhone; iOS 11.4.1; Scale/2.00)'
}

f = open('1.csv', 'w+', encoding='utf-8', newline='')
writer = csv.writer(f)
writer.writerow(['user_id', 'user_location', 'user_gender', 'user_follower', 'text',
                 'created_time', 'reposts_count', 'comments_count', 'attitudes_count'])

def get_info(url):
    res = requests.get(url, headers=headers)
    print(url)
    # Each card's mblog object sits between "mblog": and ,"weibo_position".
    # The non-greedy match stops before mblog's closing brace,
    # so the brace is added back before parsing.
    datas = re.findall('"mblog":(.*?),"weibo_position"', res.text, re.S)
    for data in datas:
        json_data = json.loads(data + '}')
        user_id = json_data['user']['name']
        user_location = json_data['user']['location']
        user_gender = json_data['user']['gender']
        user_follower = json_data['user']['followers_count']
        text = json_data['text']
        created_time = json_data['created_at']
        reposts_count = json_data['reposts_count']
        comments_count = json_data['comments_count']
        attitudes_count = json_data['attitudes_count']
        print(user_id, user_location, user_gender, user_follower, text,
              created_time, reposts_count, comments_count, attitudes_count)
        writer.writerow([user_id, user_location, user_gender, user_follower, text,
                         created_time, reposts_count, comments_count, attitudes_count])
    # Pause between pages to avoid hammering the API.
    time.sleep(5)

if __name__ == '__main__':
    urls = [
        'https://api.weibo.cn/2/cardlist?gsid=_2A252dh7LDeRxGeNM41oV-S_MzDSIHXVTIhUDrDV6PUJbkdANLVTwkWpNSf8_0j6hqTyDS0clYi-pzwDc2Kd8oj_d&wm=3333_2001&i=b9f7194&b=0&from=1088193010&c=iphone&networktype=wifi&v_p=63&skin=default&v_f=1&s=ef8eeeee&lang=zh_CN&sflag=1&ua=iPhone8,1__weibo__8.8.1__iphone__os11.4.1&ft=11&aid=01AuxGxLabPA7Vzz8ZXBUpkeJqWbJ1woycR3lFBdLhoxgQC1I.&moduleID=pagecard&scenes=0&uicode=10000327&luicode=10000010&count=20&extparam=discover&containerid=102803_ctg1_8999_-_ctg1_8999_home&fid=102803_ctg1_8999_-_ctg1_8999_home&lfid=231091&page={}'.format(str(i))
        for i in range(1, 16)]
    for url in urls:
        get_info(url)
1. Visualize the user IDs as a word cloud: the larger the font, the more often that user appears on the list (the maximum in this sample is 2 appearances).
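A minimal sketch of this step, assuming the 1.csv produced by the crawler above; the wordcloud package does the rendering, and the font path 'simhei.ttf' is a placeholder for any font that can display Chinese:

import pandas as pd
from collections import Counter
from wordcloud import WordCloud

df = pd.read_csv('1.csv')
# Font size is driven by how many times each user appears on the list.
freq = Counter(df['user_id'].astype(str))
wc = WordCloud(font_path='simhei.ttf',  # placeholder: any Chinese-capable font
               background_color='white', width=800, height=600)
wc.generate_from_frequencies(freq)
wc.to_file('user_id_cloud.png')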
Then process the location data and count posts by region. Users located in Beijing clearly dominate the list (the big Vs are concentrated in Beijing).
# Weibo locations look like "北京 海淀区"; keep the part before the space.
df['location'] = df['user_location'].str.split(' ').str[0]
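A short sketch of the counting and plotting step, assuming the data was loaded from 1.csv into a pandas DataFrame; matplotlib is one way to draw the bar chart:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('1.csv')
# Keep the province/city part before the space, then count posts per region.
df['location'] = df['user_location'].str.split(' ').str[0]
df['location'].value_counts().head(10).plot(kind='bar')
plt.show()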
Next, let's look at the gender ratio of the users: male users account for the larger share.
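One possible way to plot the ratio, under the same assumptions; the Weibo API encodes gender as 'm'/'f':

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('1.csv')
# 'm' = male, 'f' = female in the API data.
df['user_gender'].value_counts().plot(kind='pie', autopct='%.1f%%')
plt.ylabel('')
plt.show()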
Finally, let's take a look at the ten big Vs on the list with the most followers:
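One way to compute that ranking, again assuming the 1.csv output:

import pandas as pd

df = pd.read_csv('1.csv')
# One row per user, then take the ten largest follower counts.
top10 = (df.drop_duplicates('user_id')
           .sort_values('user_follower', ascending=False)
           .head(10)[['user_id', 'user_follower']])
print(top10)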
1. Process the time data and extract the hour each post was published.
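A sketch of the hour extraction, assuming created_time comes back in the Twitter-style format this API commonly uses (e.g. 'Tue Aug 28 10:15:00 +0800 2018'); relative strings such as '5分钟前' would need separate handling and are coerced to NaT here:

import pandas as pd

df = pd.read_csv('1.csv')
# Parse the timestamp and keep only the hour; unparseable values become NaT.
df['hour'] = pd.to_datetime(df['created_time'],
                            format='%a %b %d %H:%M:%S %z %Y',
                            errors='coerce').dt.hour
print(df['hour'].value_counts().sort_index())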
Next, let's take a look at the ten most-liked posts on the list.
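A sketch of the like ranking, with the same assumptions as above:

import pandas as pd

df = pd.read_csv('1.csv')
# Ten posts with the highest like counts (attitudes_count is the like field).
top_liked = df.sort_values('attitudes_count', ascending=False).head(10)
print(top_liked[['user_id', 'text', 'attitudes_count']])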
Finally, draw a word cloud of the post texts.
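A sketch of this last step, assuming jieba for Chinese segmentation and the same placeholder font as before; the API's text field may contain HTML tags, which are stripped with a regex first:

import pandas as pd
import re
import jieba
from wordcloud import WordCloud

df = pd.read_csv('1.csv')
# Remove embedded HTML tags, then segment the combined text with jieba.
text = ' '.join(re.sub('<.*?>', '', t) for t in df['text'].astype(str))
words = ' '.join(jieba.cut(text))
wc = WordCloud(font_path='simhei.ttf',  # placeholder: any Chinese-capable font
               background_color='white', width=800, height=600).generate(words)
wc.to_file('text_cloud.png')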