3 Python crawler versions: crawl the "school flower" site for an easy introduction to crawlers


What is a crawler?

If we compare the Internet to a big spider web, the data is stored at the nodes of the web, and a crawler is a small spider that moves along the web catching its prey (the data). In other words, a crawler is a program that sends requests to a website, obtains resources, and then parses them to extract useful data.

From a technical point of view, a crawler uses a program to simulate a browser requesting a site, downloads the HTML code, JSON data, or binary data (pictures, videos) that the site returns, and then extracts the data you need and stores it for later use.
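As a rough illustration of the description above, the sketch below builds a request carrying a browser-like User-Agent header and then handles the three kinds of payload mentioned (HTML, JSON, binary). It uses the standard library's urllib so that it has no dependencies. The function names, the User-Agent string, and the URL in the comment are assumptions made for this example.

```python
import json
import urllib.request

# Hypothetical browser-like header; any realistic User-Agent string would do.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url):
    """Create a request that looks like it came from a browser."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def decode_body(content_type, body):
    """Turn raw response bytes into parsed JSON, text, or leave them binary."""
    if "json" in content_type:
        return json.loads(body.decode("utf-8"))
    if content_type.startswith("text/"):
        return body.decode("utf-8", errors="replace")
    return body  # pictures, videos and other binary data stay as bytes


def fetch(url, timeout=10):
    """Send the request and return the decoded payload."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        return decode_body(content_type, resp.read())


# Example call (placeholder URL, needs network access):
# html = fetch("https://example.com/")
```

With the requests library, `requests.get(url, headers=BROWSER_HEADERS)` would play the same role as `fetch` here.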

Basic environment configuration

Version: Python3

System: Windows

IDE: PyCharm

Tools needed for crawlers:

Request libraries: requests, selenium (selenium can drive a real browser to parse and render CSS and JS, but at a performance cost, because it loads everything on the page whether it is useful or not)

Parsing libraries: regular expressions (re), beautifulsoup, pyquery
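To make the role of a parsing library concrete, here is a small sketch that uses only regular expressions from the standard library. The sample HTML, the pattern, and the function name are invented for illustration; with beautifulsoup or pyquery the pattern would typically be replaced by a selector.

```python
import re

# Made-up snippet standing in for a downloaded page.
SAMPLE_HTML = """
<div class="items">
  <a href="/video/1.html"><img src="/img/1.jpg" alt="first"></a>
  <a href="/video/2.html"><img src="/img/2.jpg" alt="second"></a>
</div>
"""

# Capture the href value of every <a> tag; re.S lets "." span line breaks.
LINK_PATTERN = re.compile(r'<a\s+href="(.*?)"', re.S)


def extract_links(html):
    """Return all link targets found in the HTML text, in document order."""
    return LINK_PATTERN.findall(html)


print(extract_links(SAMPLE_HTML))  # ['/video/1.html', '/video/2.html']
```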

Storage: files, MySQL, MongoDB, Redis
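Of the storage options listed, a plain file is the easiest one to try first. The sketch below appends each extracted record to a file as one JSON object per line and reads the records back. The file name and the record fields are hypothetical.

```python
import json
import os
import tempfile


def save_records(records, path):
    """Append each record to the file as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read the records back, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo: write two batches to a temporary file and read them back.
demo_path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
save_records([{"title": "first", "url": "/video/1.html"}], demo_path)
save_records([{"title": "second", "url": "/video/2.html"}], demo_path)
print(load_records(demo_path))
```

Appending line by line means a crawl that stops halfway still leaves the records saved so far in a readable state.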

The basic workflow of a Python crawler

Basic version:
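A minimal sketch of what a basic version could look like, with every step written out in order inside one function: request the index page, extract the detail links, request each detail page, extract the video address. The HTML patterns describe a hypothetical site layout, not the real site's markup, and urllib stands in for requests. The `fetch` parameter is an addition made here so the flow can be tried on saved pages.

```python
import re
import urllib.request
from urllib.parse import urljoin


def fetch_text(url):
    """Download a page and return it as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(index_url, fetch=fetch_text):
    """Basic flow in one pass: index page -> detail pages -> video addresses."""
    video_urls = []
    # Step 1: request the index page.
    index_html = fetch(index_url)
    # Step 2: pull the detail-page links out of it (assumed markup).
    for href in re.findall(r'<a class="item" href="(.*?)"', index_html):
        # Step 3: request each detail page.
        detail_html = fetch(urljoin(index_url, href))
        # Step 4: extract the video address (assumed markup).
        match = re.search(r'<video src="(.*?)"', detail_html)
        if match:
            video_urls.append(urljoin(index_url, match.group(1)))
    return video_urls
```

Passing a different `fetch`, for example a dictionary lookup over saved pages, lets the flow run without touching the network.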

Function-wrapped version:
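One way the same flow might be split into functions, so that each step (request, parse index, parse detail) can be reused and checked on its own. All function names and HTML patterns are assumptions for illustration, and urllib again stands in for requests.

```python
import re
import urllib.request
from urllib.parse import urljoin


def get_page(url):
    """Request a URL and return the response text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError and timeouts
        return None


def parse_index(html, base_url):
    """Yield absolute detail-page URLs found on the index page (assumed markup)."""
    for href in re.findall(r'<a class="item" href="(.*?)"', html):
        yield urljoin(base_url, href)


def parse_detail(html, base_url):
    """Return the absolute video URL on a detail page, or None (assumed markup)."""
    match = re.search(r'<video src="(.*?)"', html)
    return urljoin(base_url, match.group(1)) if match else None


def run(index_url, fetch=get_page):
    """Wire the steps together and collect the video URLs."""
    index_html = fetch(index_url)
    if index_html is None:
        return []
    results = []
    for detail_url in parse_index(index_html, index_url):
        detail_html = fetch(detail_url)
        if detail_html is None:
            continue  # skip pages that could not be downloaded
        video_url = parse_detail(detail_html, detail_url)
        if video_url:
            results.append(video_url)
    return results
```

Because each step is a separate function, a failed request only skips one page instead of stopping the whole crawl.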

Concurrent version:

(For example, if you need to crawl 30 videos in total, you can start 30 threads, one per video; the total time is then roughly the time taken by the slowest download.)
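The thread-per-download idea can be sketched with the standard library's thread pool. Real downloads are replaced here by `time.sleep` so the timing effect is visible without a network; the task names and durations are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download(task):
    """Stand-in for a real download: sleep for the task's duration, then report."""
    name, seconds = task
    time.sleep(seconds)
    return name


def download_all(tasks, worker=download):
    """Run one thread per task and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        return list(pool.map(worker, tasks))


# Simulated downloads taking 0.1 s, 0.3 s and 0.2 s.
tasks = [("a.mp4", 0.1), ("b.mp4", 0.3), ("c.mp4", 0.2)]
start = time.perf_counter()
names = download_all(tasks)
elapsed = time.perf_counter() - start
# elapsed should be close to the slowest task (0.3 s), not the 0.6 s sum.
print(names, f"{elapsed:.2f}s")
```

For a real site it is usually safer to cap `max_workers` well below the number of files, since dozens of simultaneous requests can overload the server or get the crawler blocked.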

Now that you understand the basic workflow of a Python crawler, compare it against the code. Doesn't the crawler seem quite simple?

