Preface
Python is very popular these days thanks to its simple syntax and powerful features, and many students want to learn it.
If we compare the Internet to a big spider web, the data is stored at the nodes of the web, and a crawler is a small spider that moves along the web to catch its prey (the data).
In other words, a crawler is a program that sends a request to a website, obtains resources, and then parses them to extract useful data.
From a technical point of view, a crawler is a program that simulates a browser requesting a site, downloads the HTML code, JSON data, or binary data (pictures, videos) that the site returns, and then extracts the data you need and stores it for later use.
Version: Python3
System: Windows
IDE: PyCharm
Request libraries: requests, selenium (selenium can drive a browser that parses and renders CSS and JS, but at a performance cost, because the whole page is loaded whether its content is useful or not)
Parsing libraries: regular expressions (re), BeautifulSoup, pyquery
Storage: files, MySQL, MongoDB, Redis
Basic version:
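The request, parse, and store steps described in the preface can be sketched roughly as follows. This is only a minimal illustration, not the original code of this article. To keep it runnable without installing anything, it assumes the standard library's urllib in place of requests and uses two deliberately simple regular expressions. The function names fetch, parse, and store, the User-Agent value, and the output file name are all hypothetical.

```python
import json
import re
from urllib.request import Request, urlopen

# Simplistic patterns for illustration only; real pages usually call for
# a proper HTML parser such as BeautifulSoup or pyquery.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def fetch(url, timeout=10):
    """Send a browser-like request and return the response body as text."""
    # Assumed header value: many sites reject requests without a User-Agent.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse(html):
    """Extract the page title and all double-quoted href values from the HTML."""
    match = TITLE_RE.search(html)
    title = match.group(1).strip() if match else ""
    return {"title": title, "links": LINK_RE.findall(html)}


def store(record, path):
    """Save the extracted data to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
```

Calling store(parse(fetch(url)), "page.json") with a URL of your choice runs the three stages in order. If requests is installed, fetch could probably be reduced to requests.get(url, headers=..., timeout=...).text.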
(If you need to crawl 30 videos in total, you can start 30 threads, one per video; the total time is then roughly the time of the slowest download.)
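The idea of one thread per video can be sketched with the standard library's ThreadPoolExecutor. This is a hedged illustration under assumptions: download_all and fake_download are hypothetical names, and fake_download only sleeps to stand in for a real network download.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, download, max_workers=30):
    """Call download(url) for every URL in parallel threads.

    Results come back in the same order as the input URLs.
    """
    if not urls:
        return []
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(download, urls))


def fake_download(url):
    """Hypothetical stand-in for a real video download: it only sleeps."""
    time.sleep(0.1)
    return f"saved {url}"
```

With 30 URLs and 30 workers, all downloads wait at the same time, so the whole batch should finish in about the time of the slowest single download instead of the sum of all of them. In a real crawler the download function would fetch the video and write it to disk.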
Once you understand the basic process of a Python crawler and compare it with the code, doesn't a crawler seem quite simple?
Alright, that is all for this article.
·END·