Weibo (APP) list crawler and data visualization

Preface

This post continues the APP crawler series. Today's target is the Weibo list (the 24-hour list). The fields collected are:

  • User ID
  • User location
  • User gender
  • User follower count
  • Weibo content
  • Post time
  • Repost, comment, and like counts

The article is organized as follows:

  • Crawler code
  • User analysis
  • Weibo analysis

Crawler code

import requests
import json
import re
import time
import csv

headers = {
    'Host':'api.weibo.cn',
    'Connection':'keep-alive',
    'User-Agent':'Weibo/29278 (iPhone; iOS 11.4.1; Scale/2.00)'
}

f = open('1.csv','w+',encoding='utf-8',newline='')
writer = csv.writer(f)
writer.writerow(['user_id','user_location','user_gender','user_follower','text','created_time','reposts_count','comments_count','attitudes_count'])

def get_info(url):
    res = requests.get(url,headers=headers)
    print(url)
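    # The capture stops just before "weibo_position", which cuts off the object's closing brace
    datas = re.findall('"mblog":(.*?),"weibo_position"',res.text,re.S)
    for data in datas:
        # Restore the missing closing brace so the fragment is valid JSON
        json_data = json.loads(data+'}')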
        user_id = json_data['user']['name']
        user_location = json_data['user']['location']
        user_gender = json_data['user']['gender']
        user_follower = json_data['user']['followers_count']
        text = json_data['text']
        created_time = json_data['created_at']
        reposts_count = json_data['reposts_count']
        comments_count = json_data['comments_count']
        attitudes_count = json_data['attitudes_count']
        print(user_id,user_location,user_gender,user_follower,text,created_time,reposts_count,comments_count,attitudes_count)
        writer.writerow([user_id,user_location,user_gender,user_follower,text,created_time,reposts_count,comments_count,attitudes_count])
    time.sleep(5)

if __name__ == '__main__':
    # Pages 1 to 15 of the list; only the page parameter changes between requests
    urls = ['https://api.weibo.cn/2/cardlist?gsid=_2A252dh7LDeRxGeNM41oV-S_MzDSIHXVTIhUDrDV6PUJbkdANLVTwkWpNSf8_0j6hqTyDS0clYi-pzwDc2Kd8oj_d&wm=3333_2001&i=b9f7194&b=0&from=1088193010&c=iphone&networktype=wifi&v_p=63&skin=default&v_f=1&s=ef8eeeee&lang=zh_CN&sflag=1&ua=iPhone8,1__weibo__8.8.1__iphone__os11.4.1&ft=11&aid=01AuxGxLabPA7Vzz8ZXBUpkeJqWbJ1woycR3lFBdLhoxgQC1I.&moduleID=pagecard&scenes=0&uicode=10000327&luicode=10000010&count=20&extparam=discover&containerid=102803_ctg1_8999_-_ctg1_8999_home&fid=102803_ctg1_8999_-_ctg1_8999_home&lfid=231091&page={}'.format(str(i)) for i in range(1,16)]
    for url in urls:
        get_info(url)
    f.close()  # flush the buffered rows to 1.csv
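
The regular expression in get_info stops its capture just before the "weibo_position" key, so each captured fragment is a JSON object that has lost its closing brace. Below is a minimal sketch of that step on a made-up response body. The sample string and the helper name extract_posts are illustrative assumptions, not part of the original crawler, and the real response is much larger.

```python
import json
import re

# Hypothetical, heavily trimmed response body. It only mimics the shape the
# crawler relies on: "weibo_position" appears as a key inside each "mblog".
sample = (
    '{"cards":[{"mblog":{"text":"hello","reposts_count":1,'
    '"user":{"name":"demo_user","followers_count":10},'
    '"weibo_position":1}}]}'
)


def extract_posts(body):
    """Return the post objects ("mblog") found in a response body."""
    posts = []
    # The non-greedy group ends right before ,"weibo_position", so the
    # fragment is an object without its final closing brace.
    for fragment in re.findall(r'"mblog":(.*?),"weibo_position"', body, re.S):
        posts.append(json.loads(fragment + '}'))
    return posts


posts = extract_posts(sample)
print(posts[0]['user']['name'], posts[0]['reposts_count'])
```

Any keys that follow "weibo_position" inside the post object are dropped by this approach, which is acceptable here because the crawler does not read them.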

User analysis

First, visualize some of the user IDs. The names drawn in the larger font appeared on the list twice, which is the highest count for any user in this data set.
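
The per-user counts behind such a figure can be sketched with the standard library. The user names below are made up, and the assumption is that each collected row contributes one occurrence of its user_id.

```python
from collections import Counter

# Hypothetical user names, one entry per collected post.
user_ids = ["user_a", "user_b", "user_a", "user_c"]

# Map each name to the number of its posts on the list. These counts can
# serve as font-size weights in a word-cloud style plot.
appearances = Counter(user_ids)

# Users that reach the highest count in this sample.
most = max(appearances.values())
repeat_users = sorted(name for name, n in appearances.items() if n == most)
print(most, repeat_users)
```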

Next, clean the location field and count users per region. Beijing has the most users (the big V accounts are all in Beijing).

df['location'] = df['user_location'].str.split(' ').str[0]
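
A small sketch of the region count, assuming the location field holds a region optionally followed by a space and a city or district. The sample rows are invented; the real frame would be loaded from 1.csv.

```python
import pandas as pd

# Hypothetical rows standing in for the crawled data.
df = pd.DataFrame(
    {"user_location": ["北京 朝阳区", "上海", "北京 海淀区", "广东 广州"]}
)

# Keep only the part before the first space (the top-level region).
df["location"] = df["user_location"].str.split(" ").str[0]

# Count users per region, most common first.
location_counts = df["location"].value_counts()
print(location_counts)
```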

Next, look at the gender ratio of the users: male users make up the larger share.
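
One way to compute the ratio is sketched below. It assumes the gender field uses short codes such as "m" and "f", and it counts each user once, since a user can appear on the list more than once. The rows are made up.

```python
import pandas as pd

# Hypothetical rows; user "a" appears on the list twice.
df = pd.DataFrame(
    {
        "user_id": ["a", "b", "c", "a"],
        "user_gender": ["m", "f", "m", "m"],
    }
)

# Count each user only once before computing shares.
users = df.drop_duplicates(subset="user_id")

# Fraction of users per gender code.
gender_share = users["user_gender"].value_counts(normalize=True)
print(gender_share)
```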

Finally, look at the ten big V accounts on the list with the most followers:
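
A possible way to pick these accounts with pandas, using invented follower numbers. Duplicate users are dropped first so that nobody takes two of the ten slots.

```python
import pandas as pd

# Hypothetical rows; the column names match the CSV header written by the crawler.
df = pd.DataFrame(
    {
        "user_id": ["a", "b", "c", "a", "d"],
        "user_follower": [5000, 120000, 800, 5000, 43000],
    }
)

# One row per user, then the (up to) ten largest follower counts.
top_followers = (
    df.drop_duplicates(subset="user_id")
    .nlargest(10, "user_follower")[["user_id", "user_follower"]]
)
print(top_followers)
```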

Weibo analysis

First, process the time data and extract the hour in which each post was published.
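
The hour extraction might look like the sketch below. It assumes created_at strings follow the pattern "Sat Sep 01 08:15:30 +0800 2018"; the format constant and the sample values are assumptions, so adjust them if the collected data looks different.

```python
from collections import Counter
from datetime import datetime

# Assumed layout of the created_at field (weekday, month, day, time, offset, year).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical timestamps.
samples = [
    "Sat Sep 01 08:15:30 +0800 2018",
    "Sat Sep 01 08:47:02 +0800 2018",
    "Sat Sep 01 21:05:11 +0800 2018",
]

# Parse each string and keep only the hour of the day (in the post's own offset).
hours = [datetime.strptime(s, CREATED_AT_FORMAT).hour for s in samples]

# Number of posts per hour, ready for a bar or line chart.
posts_per_hour = Counter(hours)
print(sorted(posts_per_hour.items()))
```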

Next, look at the ten users whose posts received the most likes.
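
A sketch of this ranking with made-up numbers. Summing likes per user is a design choice made here, not something the original states; ranking individual posts instead would only need nlargest on the raw column.

```python
import pandas as pd

# Hypothetical rows; user "a" has two posts on the list.
df = pd.DataFrame(
    {
        "user_id": ["a", "b", "c", "a"],
        "attitudes_count": [120, 4500, 980, 300],
    }
)

# Total likes per user across all of their posts on the list.
likes_per_user = df.groupby("user_id")["attitudes_count"].sum()

# The (up to) ten users with the most likes, highest first.
top_liked = likes_per_user.nlargest(10)
print(top_liked)
```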

Finally, draw a word cloud from the text of the posts.
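
A word cloud is driven by word frequencies, which can be sketched with the standard library alone. The sample texts and the helper name word_frequencies are illustrative. The regex tokenizer only suits space-separated text; for Chinese posts a word segmenter such as jieba would take its place, and a package such as wordcloud could then render the resulting counts.

```python
import re
from collections import Counter

# Hypothetical post texts; the first contains HTML markup to show the cleanup step.
texts = [
    '<a href="/n/x">@x</a> good morning weibo',
    'good night weibo',
]


def word_frequencies(texts):
    """Count lowercase word tokens across all texts, ignoring HTML tags."""
    counts = Counter()
    for text in texts:
        plain = re.sub(r"<[^>]+>", " ", text)  # drop any HTML tags
        counts.update(re.findall(r"\w+", plain.lower()))
    return counts


freqs = word_frequencies(texts)
print(freqs.most_common(3))
```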

Reference: Weibo (APP) list crawler and data visualization, Cloud+ Community, Tencent Cloud. https://cloud.tencent.com/developer/article/1197120