Python Data Analysis of Jianshu Followers: Crawler Analysis and Code, and the Gap Between Friend and Foe

My follower count has been climbing quickly lately and has now passed 3,500. That is tiny next to the big names, but anyone who has read my Jianshu ranking post knows that 3,500 followers is still enough for a place in the top 200 (although that ranking was built from only a little over 200,000 crawled records). As the number grows, though, I have started to worry about the quality of my followers: I found that many of them have never published anything. I define these users as inactive (admittedly an extreme definition). Today I compare my own followers with those of my senior, Running to the Right, to see the gap between friend and foe~

Crawler analysis and code

Because of a limit on Jianshu's side, only the first 100 pages of a user's followers can be crawled. Each page lists 9 followers, so at most 900 followers can be collected. The crawled fields are also very simple:

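  • Follower ID
  • Following count (how many users the follower follows)
  • Follower count (how many followers the follower has)
  • Number of articles (here I define those who have not written any article as inactive users)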
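
import re

import requests
from lxml import etree
import pymongo

client = pymongo.MongoClient('localhost', 27017)
jianshu = client['jianshu']
luopan = jianshu['luopan']      # declared here but not written to by the loop below
xiangyou = jianshu['xiangyou']  # receives the followers crawled below


def first_int(text):
    # Pull the number out of a label such as "fans 12". str.strip('fans') removes
    # a set of characters rather than a prefix, so a regex is the safer choice.
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0


# Only the first 100 pages of followers are reachable.
urls = ['http://www.jianshu.com/users/54b5900965ea/followers?page={}'.format(i) for i in range(1, 101)]
for url in urls:
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="user-list"]/li')
    if not infos:
        break  # an empty page means the follower list has ended
    for info in infos:
        user_id = info.xpath('div/a/text()')[0]  # avoid shadowing the built-in id()
        topic = first_int(info.xpath('div/div[1]/span[1]/text()')[0])    # following count
        fans = first_int(info.xpath('div/div[1]/span[2]/text()')[0])     # follower count
        article = first_int(info.xpath('div/div[1]/span[3]/text()')[0])  # articles written
        content = {
            'id': user_id,
            'topic': topic,
            'fans': fans,
            'article': article
        }
        # print(user_id, topic, fans, article)
        xiangyou.insert_one(content)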
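
The listing above crawls one user ID into one collection, although two collections are declared. A minimal sketch of how the URL list could be built per user, so the same loop can be reused for each author. The helper name `follower_urls` and the constants are my own assumptions, not part of the original code; the page and per-page limits are the ones stated in this post.

```python
# Hypothetical helper: build the follower-list URLs for a given user ID.
BASE = 'http://www.jianshu.com/users/{user_id}/followers?page={page}'
MAX_PAGES = 100  # only the first 100 pages are reachable, per this post
PER_PAGE = 9     # followers shown per page, per this post


def follower_urls(user_id, pages=MAX_PAGES):
    """Return the follower-list URLs for pages 1..pages of one user."""
    return [BASE.format(user_id=user_id, page=page) for page in range(1, pages + 1)]


urls = follower_urls('54b5900965ea')
print(len(urls), 'pages, at most', len(urls) * PER_PAGE, 'followers')
print(urls[0])
```

Calling `follower_urls` once per author, each time with a different target collection, would fill both collections with the same crawl loop.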

Gap between friend and foe

This part uses Python for the data analysis and the pyecharts library for the charts.

  • First look at the quality of fans:

The followers of Running to the Right are clearly of much higher quality than mine, and some big names interact with them too. When will the big names interact with me~ Since only the first 900 followers can be crawled, the real gap is probably many times larger than this sample shows.
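
One way to put a number on "quality" is to summarize how many followers each crawled follower has. This is only a sketch under my own assumptions: the post does not state the exact metric behind the chart, `quality_summary` and the threshold of 1000 are hypothetical, and the sample records are made up for illustration rather than taken from the crawl.

```python
from statistics import median


def quality_summary(records, big_threshold=1000):
    """Summarize the 'fans' counts of a list of crawled follower records."""
    # Counts may be ints or numeric strings; int() accepts both.
    fans = sorted(int(record['fans']) for record in records)
    if not fans:
        return {'count': 0, 'median_fans': 0, 'max_fans': 0, 'big_accounts': 0}
    return {
        'count': len(fans),
        'median_fans': median(fans),
        'max_fans': fans[-1],
        # Followers whose own follower count reaches the (assumed) threshold.
        'big_accounts': sum(1 for value in fans if value >= big_threshold),
    }


# Made-up sample data, only to show the shape of the result.
mine = [{'fans': 0}, {'fans': 2}, {'fans': 15}]
theirs = [{'fans': 5}, {'fans': 120}, {'fans': 4300}]

print('mine  :', quality_summary(mine))
print('theirs:', quality_summary(theirs))
```

With real data, `records` could be any iterable of dicts, for example the documents returned by a pymongo collection's `find()`.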

  • Next, look at the difference in active followers:

This gap is not very large. It reflects a long-standing problem on Jianshu: a large share of users never write articles. Jianshu invites us to write about our lives simply, so everyone could write a few more articles. Whether it is about study, life, or work, there is always something exciting of your own to share~
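
The active share follows directly from the definition used in this post: a follower with zero articles is inactive. A minimal sketch, assuming each record has an `article` count as in the crawler; `active_share` is a hypothetical helper name and the sample records are invented for illustration.

```python
def active_share(records):
    """Return the fraction of followers who have written at least one article."""
    records = list(records)
    if not records:
        return 0.0
    # Counts may be ints or numeric strings; int() accepts both.
    active = sum(1 for record in records if int(record['article']) > 0)
    return active / len(records)


# Made-up sample data, not real crawl results.
mine = [{'article': 0}, {'article': 0}, {'article': 3}, {'article': 1}]
theirs = [{'article': 0}, {'article': 2}, {'article': 5}, {'article': 0}, {'article': 1}]

print('mine  : {:.0%} active'.format(active_share(mine)))
print('theirs: {:.0%} active'.format(active_share(theirs)))
```

Comparing shares rather than raw counts keeps the two groups comparable even if one crawl stopped before reaching 900 followers.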

Reference: https://cloud.tencent.com/developer/article/1155627 (Tencent Cloud, Cloud + Community)