Mathematical modeling is over. My goal going in was not to stay up late, but I stayed up late anyway (QAQ). After a day's delay, I've started writing a short post. It feels like I haven't written a crawler in a long time, so today I will crawl the posts in my Weibo friends feed from the mobile site.
import requests
import json

headers = {
    'Cookie': 'xxxxxxxx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}

# Append the crawled posts to a local text file.
f = open('C:/Users/LP/Desktop/weibo.txt', 'a+', encoding='utf-8')

def get_info(url, page):
    html = requests.get(url, headers=headers)
    json_data = json.loads(html.text)
    card_groups = json_data[0]['card_group']
    for card_group in card_groups:
        # The separator was lost in the original listing (an empty separator
        # raises ValueError); a single space is assumed here.
        f.write(card_group['mblog']['text'].split(' ')[0] + '\n')
    # The cursor returned by this page identifies the next page.
    next_cursor = json_data[0]['next_cursor']
    if page < 50:
        next_url = 'https://m.weibo.cn/index/friends?format=cards&next_cursor=' + str(next_cursor) + '&page=1'
        page = page + 1
        get_info(next_url, page)

if __name__ == '__main__':
    url = 'https://m.weibo.cn/index/friends?format=cards'
    get_info(url, 1)
    # Close the file once, after every page has been written.
    f.close()
Look at the returned data: the next_cursor field is very important!
Scroll down and, as shown in the figure, the URL of the second page also contains next_cursor, and its value is exactly the one returned by the first page!
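The cursor chaining can also be written as a plain loop instead of recursion. This is only a minimal sketch, not the script above: `build_next_url`, `crawl`, and the injected `fetch` callable are hypothetical names, and the response shape (a list whose first element holds `card_group` and `next_cursor`) is assumed from the script.

```python
from urllib.parse import urlencode

BASE_URL = 'https://m.weibo.cn/index/friends'


def build_next_url(json_data):
    """Build the next page URL from the cursor in the current response."""
    # Assumption: the cursor sits at json_data[0]['next_cursor'], as in the script.
    params = {
        'format': 'cards',
        'next_cursor': json_data[0]['next_cursor'],
        'page': 1,
    }
    return BASE_URL + '?' + urlencode(params)


def crawl(fetch, max_pages=50):
    """Collect post texts page by page; fetch(url) must return the parsed JSON."""
    url = BASE_URL + '?format=cards'
    texts = []
    for _ in range(max_pages):
        json_data = fetch(url)
        for card in json_data[0]['card_group']:
            # Assumption: a card without an 'mblog' entry is not a post, so skip it.
            if 'mblog' in card:
                texts.append(card['mblog']['text'])
        # The cursor from this page becomes part of the next page's URL.
        url = build_next_url(json_data)
    return texts
```

With requests, `fetch` could be something like `lambda url: requests.get(url, headers=headers).json()`. Passing it in as an argument keeps the paging logic testable without a network connection or a valid cookie.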
In the end, I segmented the text with jieba and made a word cloud. Apart from Erha, it feels like the whole feed was flooded by the group leader's posts.
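The counting step behind a word cloud can be sketched roughly as follows. This is not the code used for the picture: `word_frequencies` is a hypothetical helper, the sample lines are made up, and the tokenizer is passed in so the sketch runs without jieba installed. For the real Chinese text, `jieba.lcut` could be passed as `tokenize`.

```python
from collections import Counter


def word_frequencies(lines, tokenize, min_len=2):
    """Count tokens across all lines, dropping tokens shorter than min_len."""
    counts = Counter()
    for line in lines:
        for token in tokenize(line):
            token = token.strip()
            if len(token) >= min_len:
                counts[token] += 1
    return counts


# Demo with whitespace splitting on made-up lines; for Chinese text,
# jieba.lcut could be passed instead of str.split.
sample = ['erha erha group leader', 'group leader again']
freqs = word_frequencies(sample, str.split)
print(freqs.most_common(3))
```

The resulting counter can then be handed to the wordcloud package, for example through `WordCloud.generate_from_frequencies`. For Chinese text, a font that contains Chinese glyphs has to be supplied through `font_path`.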