Weibo (APP) list crawler and data visualization

Continuing with app crawlers: today we crawl the Weibo 24-hour trending list. The fields collected are:

  • User ID
  • User region
  • User gender
  • User followers
  • Weibo content
  • Release time
  • Reposts, comments, and likes

The article is divided into the following content:

  • Crawler code
  • User analysis
  • Weibo analysis

Crawler code

import requests
import json
import re
import time
import csv

headers = {
    'User-Agent': 'Weibo/29278 (iPhone; iOS 11.4.1; Scale/2.00)'
}

f = open('1.csv', 'w+', encoding='utf-8', newline='')
writer = csv.writer(f)
# Column header row so the CSV can be loaded into pandas for the analysis below
writer.writerow(['user_id', 'user_location', 'user_gender', 'user_follower',
                 'text', 'created_time', 'reposts_count', 'comments_count',
                 'attitudes_count'])

def get_info(url):
    res = requests.get(url, headers=headers)
    # Each list entry sits between "mblog": and ,"weibo_position" in the response
    datas = re.findall('"mblog":(.*?),"weibo_position"', res.text, re.S)
    for data in datas:
        # The capture stops before the object's closing brace, so restore it
        json_data = json.loads(data + '}')
        user_id = json_data['user']['name']
        user_location = json_data['user']['location']
        user_gender = json_data['user']['gender']
        user_follower = json_data['user']['followers_count']
        text = json_data['text']
        created_time = json_data['created_at']
        reposts_count = json_data['reposts_count']
        comments_count = json_data['comments_count']
        attitudes_count = json_data['attitudes_count']
        writer.writerow([user_id, user_location, user_gender, user_follower,
                         text, created_time, reposts_count, comments_count,
                         attitudes_count])

if __name__ == '__main__':
    # NOTE: the request URL is truncated in the source; the API host and the
    # parameters that precede this fragment are missing.
    urls = ['iPhone8,1__weibo__8.8.1__iphone__os11.4.1&ft=11&aid=01AuxGxLabPA7Vzz8ZXBUpkeJqWbJ1woycR3lFBdLhoxgQC1I.&moduleID=pagecard&scenes=0&uicode=10000327&luicode=10000010&count=20&extparam=discover&containerid=102803_ctg1_8999_-_ctg1_8999_home&fid=102803_ctg1_8999_-_ctg1_8999_home&lfid=231091&page={}'.format(str(i)) for i in range(1, 16)]
    for url in urls:
        get_info(url)
        time.sleep(2)
    f.close()

User analysis

First, visualize the user IDs as a word cloud: a larger font marks a user who made the list more than once (the maximum in this data set is two appearances).
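The font sizes in that word cloud are driven by how often each user ID appears in the crawled rows. A minimal sketch of the counting step, using made-up IDs in place of the real `user_id` column:

```python
from collections import Counter

# Hypothetical user-ID column values; the real data comes from 1.csv
user_ids = ['A', 'B', 'A', 'C']

# Count how many times each user appears on the list;
# a word-cloud library would map these counts to font sizes
appearances = Counter(user_ids)
```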

Next, process and count the region data. Users located in Beijing are by far the most numerous (the big Vs are all in Beijing).

df['location'] = df['user_location'].str.split(' ').str[0]
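The split keeps only the province-level part of the location string (e.g. "北京 海淀区" becomes "北京") before counting. A small sketch with made-up rows in place of the crawled CSV:

```python
import pandas as pd

# Hypothetical sample rows; the real data is loaded from 1.csv
df = pd.DataFrame({'user_location': ['北京 海淀区', '广东 广州', '北京 朝阳区']})

# Keep the province-level name, then count users per region
df['location'] = df['user_location'].str.split(' ').str[0]
counts = df['location'].value_counts()
```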

Next, let's look at the gender ratio of users: male users account for the larger share.
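The ratio behind that chart is a normalized value count over the `user_gender` column ('m'/'f' in the Weibo API). A sketch with made-up values:

```python
import pandas as pd

# Hypothetical gender column; the real data comes from 1.csv
df = pd.DataFrame({'user_gender': ['m', 'f', 'm', 'm']})

# normalize=True returns proportions instead of raw counts
ratio = df['user_gender'].value_counts(normalize=True)
```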

Finally, let's look at the follower counts of the top ten big Vs on the list:
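Since a user can appear on the list more than once, the ranking should deduplicate by user before taking the largest follower counts. A sketch with made-up users (the real post takes the top ten; three are shown here):

```python
import pandas as pd

# Hypothetical sample; the real data comes from 1.csv
df = pd.DataFrame({
    'user_id': ['A', 'B', 'A', 'C'],
    'user_follower': [100, 300, 100, 200],
})

# One row per user, then the users with the most followers
top = (df.drop_duplicates('user_id')
         .nlargest(3, 'user_follower')
         .set_index('user_id')['user_follower'])
```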

Weibo analysis

First, process the time data and extract the hour of each post.
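One way to pull the hour out is plain string splitting, assuming the `created_at` values come back in a "MM-DD HH:MM" shape (the Weibo API also returns relative times like "5分钟前", which would need separate handling):

```python
import pandas as pd

# Hypothetical timestamps in "MM-DD HH:MM" form; real values come from 1.csv
df = pd.DataFrame({'created_time': ['07-28 14:30', '07-28 09:12']})

# Take the time part, then the hour before the colon
df['hour'] = df['created_time'].str.split(' ').str[1].str.split(':').str[0]
```

Grouping by `hour` then shows which time periods the listed posts cluster in.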

Next, let's look at the ten most-liked Weibo posts on the list.

Finally, draw a word cloud diagram of Weibo articles.

Reference: Weibo (APP) list crawler and data visualization-Cloud + Community-Tencent Cloud