Recently my number of fans has risen quickly and passed the 3,500 mark. That is tiny compared with the big names, but anyone who has read my Jianshu ranking post knows that 3,500 fans is enough to rank in the top 200 (although I only crawled a little over 200,000 records). As the number grows, though, I have started to worry about the quality of my fans: many of them have never published anything. I define these users as inactive (which is admittedly too extreme). Today I compare my own fans with those of the senior author Running to the Right, to see the gap between the enemy and ourselves~
Because of Jianshu's limits, only the first 100 pages of fans can be crawled. At 9 fans per page, that is just 900 fans in total. The crawled fields are also very simple: the user's ID, how many users they follow, how many fans they have, and how many articles they have written.
import requests
from lxml import etree
import pymongo

# One MongoDB collection per author; this run fills xiangyou
client = pymongo.MongoClient('localhost', 27017)
jianshu = client['jianshu']
luopan = jianshu['luopan']
xiangyou = jianshu['xiangyou']

# Only the first 100 pages of the fan list are available
urls = ['http://www.jianshu.com/users/54b5900965ea/followers?page={}'.format(str(i)) for i in range(1, 101)]

for url in urls:
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="user-list"]/li')
    if len(infos) > 0:
        for info in infos:
            id = info.xpath('div/a/text()')[0]
            # str.strip() removes the listed characters from both ends of the
            # span text; the labels must match the text shown on the page
            topic = info.xpath('div/div[1]/span[1]/text()')[0].strip('Follow')
            fans = info.xpath('div/div[1]/span[2]/text()')[0].strip('fans')
            article = info.xpath('div/div[1]/span[3]/text()')[0].strip('article')
            content = {
                'id': id,
                'topic': topic,
                'fans': fans,
                'article': article
            }
            # print(id,topic,fans,article)
            xiangyou.insert_one(content)
    else:
        # An empty page means there are no more fans to crawl
        break
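One detail in the crawler may be worth a small sketch. `str.strip('fans')` removes any of the characters `f`, `a`, `n`, `s` from both ends of the string rather than the word itself, so it only works as long as the rest of the text is digits and spaces, and the stored values stay strings. A slightly more defensive option, assuming each span holds a label plus a number, is to pull the digits out with a regular expression. The helper name `parse_count` and the sample strings are hypothetical, not part of the original crawler:

```python
import re


def parse_count(text):
    """Return the first integer found in text, or 0 if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0


# Hypothetical span texts, for illustration only
print(parse_count('fans 35'))
print(parse_count('article 0'))
```

Storing integers instead of strings also makes later comparisons and sorting in the analysis step simpler.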
This part uses Python for the data analysis and the pyecharts library for the visualization.
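As a rough sketch of the kind of summary this comparison relies on (not the exact code behind the charts), the share of inactive fans can be computed directly from the stored records. It assumes each record carries the `article` field from the crawler, holding a number as text. The function name `inactive_share` and the sample records are made up for illustration:

```python
def inactive_share(records):
    """Return the fraction of fans whose article count is zero."""
    if not records:
        return 0.0
    # The crawler stores counts as text, so convert before comparing
    inactive = sum(1 for r in records if int(str(r['article']).strip() or 0) == 0)
    return inactive / len(records)


# Made-up records in the same shape the crawler writes to MongoDB
sample = [
    {'id': 'a', 'topic': '3', 'fans': '10', 'article': '0'},
    {'id': 'b', 'topic': '8', 'fans': '2', 'article': '5'},
    {'id': 'c', 'topic': '1', 'fans': '0', 'article': ' 0'},
    {'id': 'd', 'topic': '20', 'fans': '150', 'article': '12'},
]
print(inactive_share(sample))
```

Running the same function over the `luopan` and `xiangyou` collections would give two numbers that can then be plotted side by side.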
The quality of Running to the Right's fans is obviously much higher than mine. Some of his fans are big names who interact with him. When will the big names interact with me too~ And since only the first 900 fans can be crawled, the real difference across all fans is probably many times larger.
Here the gap is not very big. This is also a long-standing problem on Jianshu: a large number of users do not write any articles. Jianshu lets us write about our lives simply, so everyone can write more, whether about study, life, or work. There is always something exciting of your own to share~