Python crawler Weibo friends circle code code analysis word cloud

The mathematical modeling contest is over. I set out not to stay up late, but I stayed up late anyway (QAQ). After a day's delay, I am back to writing on Jianshu. It feels like I haven't written a crawler in a long time, so today I will crawl the Weibo friends circle feed from the mobile site.

Code

import requests
import json

headers = {
    'Cookie':'xxxxxxxx',
    # The HTTP header name is spelled with a hyphen: User-Agent
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

f = open('C:/Users/LP/Desktop/weibo.txt','a+',encoding='utf-8')

def get_info(url,page):
    html = requests.get(url,headers=headers)
    json_data = json.loads(html.text)
    card_groups = json_data[0]['card_group']
    for card_group in card_groups:
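        # Keep only the text before the first space (separator assumed;
        # an empty separator makes str.split raise ValueError)
        f.write(card_group['mblog']['text'].split(' ')[0]+'\n')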

    next_cursor = json_data[0]['next_cursor']

    if page<50:
        # The next page is addressed by the cursor returned with this page
        next_url ='https://m.weibo.cn/index/friends?format=cards&next_cursor='+str(next_cursor)+'&page=1'
        get_info(next_url,page+1)
    else:
        # Stop after 50 pages and close the output file
        f.close()

if __name__ =='__main__':
    url ='https://m.weibo.cn/index/friends?format=cards'
    get_info(url,1)

Code analysis

  1. Send the login cookie in the request headers to simulate being logged in to Weibo.
  2. The friends circle feed is also loaded asynchronously. The first page is requested from https://m.weibo.cn/index/friends?format=cards.

Look at the returned data: the next_cursor field is very important!

Scroll down to load more, and you can see that the URL of the second page also carries next_cursor, and its value is exactly the one returned by the first page!
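The link between the returned data and next_cursor can be sketched roughly as follows. The sample payload is made up for illustration, and its shape (a list whose first element holds card_group and next_cursor) is only inferred from the fields the crawler code reads. parse_page is a hypothetical helper name, not part of any library.

```python
import json


def parse_page(json_text):
    """Return (texts, next_cursor) from one response body.

    Assumes the layout the crawler relies on: a JSON list whose first
    element has 'card_group' and 'next_cursor' keys.
    """
    data = json.loads(json_text)
    first = data[0]
    texts = []
    for card in first.get('card_group', []):
        mblog = card.get('mblog')
        if mblog:  # skip cards that carry no post
            texts.append(mblog.get('text', ''))
    return texts, first.get('next_cursor')


# Made-up sample payload, only for illustration.
sample = json.dumps([{
    'card_group': [
        {'mblog': {'text': 'first post'}},
        {'mblog': {'text': 'second post'}},
        {'card_type': 0},
    ],
    'next_cursor': 4100000000000000,
}])

texts, cursor = parse_page(sample)
print(texts)
print(cursor)
```

The cursor returned here is the value that would go into the URL of the following page.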

  3. In this way, you can construct multi-page URLs and crawl data
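The steps above could also be written as a loop instead of recursion. This is a minimal sketch, assuming the same URL pattern and JSON layout as the crawler code. build_url and crawl are hypothetical names, and fetch_json is an assumed callable that takes a URL and returns the parsed JSON, so the paging logic can be tried without touching the network.

```python
from urllib.parse import urlencode

BASE_URL = 'https://m.weibo.cn/index/friends'


def build_url(next_cursor=None):
    """Build the first-page URL, or a later page URL from a cursor."""
    params = {'format': 'cards'}
    if next_cursor is not None:
        params['next_cursor'] = next_cursor
        params['page'] = 1  # kept constant, as in the crawler code
    return BASE_URL + '?' + urlencode(params)


def crawl(fetch_json, max_pages=50):
    """Yield post texts page by page.

    fetch_json is an assumed callable: URL -> parsed JSON (a list whose
    first element has 'card_group' and 'next_cursor').
    """
    url = build_url()
    for _ in range(max_pages):
        first = fetch_json(url)[0]
        for card in first.get('card_group', []):
            if 'mblog' in card:
                yield card['mblog']['text']
        cursor = first.get('next_cursor')
        if not cursor:  # no further page
            break
        url = build_url(cursor)
```

With requests, fetch_json might be something like `lambda u: requests.get(u, headers=headers).json()`, where headers carries the cookie as in the code above.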

Word cloud

Finally, I segmented the text with jieba and made a word cloud. Apart from Erha, it feels like the whole feed was flooded by the group leader's posts.
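The post does not show the word cloud code, so the following is only a sketch of one way to do it. It assumes the third-party jieba package for segmentation and the third-party wordcloud package for rendering (the source names only jieba). word_frequencies and save_word_cloud are hypothetical helper names, and font_path is a placeholder for a font that contains Chinese glyphs.

```python
from collections import Counter


def word_frequencies(lines, segment=None, min_len=2):
    """Count words in an iterable of text lines.

    segment is a callable str -> list of words; when omitted,
    jieba.lcut is used (requires: pip install jieba).
    """
    if segment is None:
        import jieba
        segment = jieba.lcut
    counts = Counter()
    for line in lines:
        for word in segment(line):
            word = word.strip()
            if len(word) >= min_len:  # drop single characters and blanks
                counts[word] += 1
    return counts


def save_word_cloud(freqs, out_path, font_path):
    """Render word frequencies to an image file.

    Assumes the wordcloud package (pip install wordcloud); font_path
    should point to a font with CJK glyphs.
    """
    from wordcloud import WordCloud
    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color='white')
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(out_path)
```

A typical use would be to read weibo.txt line by line, pass the lines to word_frequencies, and hand the result to save_word_cloud.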

Reference: https://cloud.tencent.com/developer/article/1155573 (Python crawler Weibo friends circle, Tencent Cloud+ Community)