Preface
To handle the verification code (captcha) on Zhihu login, I used PIL (Pillow), a widely used image-processing library for Python. The script saves the captcha image locally and tries to display it with PIL; if that does not work, open the saved image yourself and enter the code manually.
Capturing the packets of a Zhihu login shows that the login POST requires three parameters: the account, the password, and _xsrf. The _xsrf value is hidden in the login form, and the server appears to generate a new random string for every login. So, to simulate a login, you must first get _xsrf.
The result of capturing packets with Chrome (or with HttpFox in Firefox):
Therefore, the value of _xsrf must be obtained first. Note that this parameter changes dynamically and is different every time.
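As a rough illustration of this step, the sketch below pulls _xsrf out of a hidden form field with a regular expression. The HTML sample and the helper name extract_xsrf are made up for the example; the real login page markup may differ.

```python
import re

# Hypothetical sample of a login form; the real page markup may differ.
SAMPLE_HTML = '''
<form method="POST" action="/login/email">
  <input type="hidden" name="_xsrf" value="0123456789abcdef"/>
  <input type="text" name="account"/>
</form>
'''

# Same pattern style as the one used in the full source further down.
XSRF_PATTERN = re.compile(r'name="_xsrf" value="(.*?)"')


def extract_xsrf(html):
    """Return the _xsrf token found in html, or None if it is absent."""
    match = XSRF_PATTERN.search(html)
    return match.group(1) if match else None


token = extract_xsrf(SAMPLE_HTML)
print(token)
```

Returning None when the field is missing lets the caller fail with a clear message instead of an IndexError on an empty list.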
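Note the difference between the findall and find_all functions: re.findall belongs to Python's re module and returns a list of matched strings from raw text, while find_all is a BeautifulSoup method that returns the matching tags from a parsed document.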
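A small side-by-side sketch may make the distinction concrete. The HTML fragment is invented for the example, and the BeautifulSoup part only runs if the third-party bs4 package is installed.

```python
import re

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None

# Made-up HTML fragment, for illustration only.
HTML = (
    '<a class="question_link" href="/q/1">First</a>'
    '<a class="question_link" href="/q/2">Second</a>'
)

# re.findall works on raw text and returns the captured strings.
hrefs = re.findall(r'href="(.*?)"', HTML)
print(hrefs)

# find_all works on a parsed tree and returns Tag objects,
# which expose methods such as get_text() and get().
if BeautifulSoup is not None:
    soup = BeautifulSoup(HTML, "html.parser")
    tags = soup.find_all("a", class_="question_link")
    titles = [tag.get_text() for tag in tags]
    print(titles)
```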
After getting _xsrf, you can simulate the login. The login goes through the Session object of the requests library. The advantage of a session is that different requests from the same user are linked together, and cookies are handled automatically until the session ends.
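The following sketch shows, without touching the network, how a session carries cookies from one request to the next. The cookie name, value, and example.com URL are placeholders, not anything Zhihu actually uses.

```python
import requests

session = requests.Session()

# Pretend an earlier response set this cookie (made-up name and value).
session.cookies.set("demo_token", "abc123", domain="example.com", path="/")

# Prepare (but do not send) a later request through the same session.
request = requests.Request("GET", "https://example.com/settings/profile")
prepared = session.prepare_request(request)

# The session has attached the stored cookie to the outgoing request.
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```

With a plain requests.get call, each request would start with an empty cookie jar, so the login state would have to be passed along by hand.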
Note: cookies is a file in the current directory that stores the Zhihu cookies. The first time you log in, this file does not exist yet, so you cannot log in through the cookie file and must enter a password.
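Here is a minimal, standard-library-only sketch of that cookie file life cycle on Python 3: the first load fails because the file is missing, and a later run restores what was saved. The cookie itself and the make_cookie helper are invented for the example; in the real script the cookies come from the server's responses.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain):
    """Build a session cookie by hand (normally the server response does this)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Use a temporary directory so the example does not touch real files.
cookie_path = os.path.join(tempfile.mkdtemp(), "cookies")

# First run: the file does not exist yet, so loading fails.
jar = LWPCookieJar(filename=cookie_path)
try:
    jar.load(ignore_discard=True)
    loaded_on_first_run = True
except OSError:
    loaded_on_first_run = False

# After a successful login, the cookies are written to disk.
jar.set_cookie(make_cookie("demo_token", "abc123", ".example.com"))
jar.save(ignore_discard=True)

# Next run: a fresh jar restores the cookies from the file.
restored = LWPCookieJar(filename=cookie_path)
restored.load(ignore_discard=True)
names = [cookie.name for cookie in restored]
print(loaded_on_first_run, names)
```

ignore_discard=True matters here because session cookies are marked as discardable and would otherwise be skipped when saving and loading.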
    def login(secret, account):
        # Decide from the entered user name whether it is a mobile phone number
        if re.match(r"^1\d{10}$", account):
            print("Mobile phone number login\n")
            post_url = 'https://www.zhihu.com/login/phone_num'
            postdata = {
                '_xsrf': get_xsrf(),
                'password': secret,
                'remember_me': 'true',
                'phone_num': account,
            }
        else:
            if "@" in account:
                print("Mailbox login\n")
            else:
                print("There is a problem with your account input, please log in again")
                return 0
            post_url = 'https://www.zhihu.com/login/email'
            postdata = {
                '_xsrf': get_xsrf(),
                'password': secret,
                'remember_me': 'true',
                'email': account,
            }
        try:
            # Log in directly when no verification code is required
            login_page = session.post(post_url, data=postdata, headers=headers)
            login_code = login_page.text
            print(login_page.status_code)
            print(login_code)
        except Exception:
            # Otherwise log in again with the verification code
            postdata["captcha"] = get_captcha()
            login_page = session.post(post_url, data=postdata, headers=headers)
            # Parse the JSON reply instead of calling eval() on server text
            login_code = login_page.json()
            print(login_code['msg'])
        session.cookies.save()


    # Python 2 compatibility: use raw_input as input when it exists
    try:
        input = raw_input
    except NameError:
        pass
This is the login function. It posts your account, password, and _xsrf to Zhihu's login endpoint, then gets the cookies and saves them to a file in the current directory. The next time you log in, the cookie file is read directly.
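The branching at the top of the login function can be isolated into a small pure function, which makes it easy to check without sending anything. This is only a sketch: build_login_request is a hypothetical helper, not part of the original script, though the URLs and field names are the ones the script uses.

```python
import re

PHONE_URL = "https://www.zhihu.com/login/phone_num"
EMAIL_URL = "https://www.zhihu.com/login/email"


def build_login_request(account, secret, xsrf):
    """Return (post_url, postdata), or None if account is neither
    an 11-digit mobile number starting with 1 nor an email address."""
    if re.match(r"^1\d{10}$", account):
        url, key = PHONE_URL, "phone_num"
    elif "@" in account:
        url, key = EMAIL_URL, "email"
    else:
        return None
    postdata = {
        "_xsrf": xsrf,
        "password": secret,
        "remember_me": "true",
        key: account,
    }
    return url, postdata


print(build_login_request("13800000000", "pw", "tok"))
print(build_login_request("user@example.com", "pw", "tok"))
print(build_login_request("not-an-account", "pw", "tok"))
```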
    #LWP-Cookies-2.0
    Set-Cookie3: cap_id="\"YWJkNTkxYzhiMGYwNDU2OGI4NDUxN2FlNzBmY2NlMTY=|1487052577|4aacd7a27b11a852e637262bb251d79c6cf4c8dc\""; path="/"; domain=".zhihu.com"; path_spec; expires="2017-03-…"; version=0
    Set-Cookie3: l_cap_id="\"OGFmYTk3ZDA3YmJmNDQ4YThiNjFlZjU3NzQ5NjZjMTA=|1487052577|0f66a8f8d485bc85e500a121587780c7c8766faf\""; path="/"; domain=".zhihu.com"; path_spec; expires="…"; version=0
    Set-Cookie3: login="\"NmYxMmU0NWJmN2JlNDY2NGFhYzZiYWIxMzE5ZTZiMzU=|1487052597|a57652ef6e0bbbc9c4df0a8a0a59b559d4e20456\""; path="/"; domain=".zhihu.com"; path_spec; expires="…"; version=0
    Set-Cookie3: q_c1="ee29042649aa4f87969ed193acb6cb83|1487052577000|1487052577000"; path="/"; domain=".zhihu.com"; path_spec; expires="2020-02-14 06:09:37Z"; version=0
    Set-Cookie3: z_c0="\"QUFCQTFCOGdBQUFYQUFBQVlRSlZUVFVzeWxoZzlNbTYtNkt0Qk1NV0JLUHZBV0N6NlNNQmZ3PT0=|1487052597|…"; …; httponly=None; version=0
The above is the content of the cookie file.
The following is the source code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Make print() behave the same on Python 2 and Python 3
    from __future__ import print_function

    import requests
    try:
        import cookielib
    except ImportError:
        import http.cookiejar as cookielib
    import re
    import time
    import os.path
    try:
        from PIL import Image
    except ImportError:
        pass
    from bs4 import BeautifulSoup

    # Construct the request headers
    agent = 'Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0'
    headers = {
        "Host": "www.zhihu.com",
        "Referer": "https://www.zhihu.com/",
        'User-Agent': agent
    }

    # Reuse the saved login cookies if there are any
    session = requests.session()
    session.cookies = cookielib.LWPCookieJar(filename='cookies')
    try:
        session.cookies.load(ignore_discard=True)
    except Exception:
        print("Cookie failed to load")


    def get_xsrf():
        '''_xsrf is a dynamically changing parameter'''
        index_url = 'https://www.zhihu.com'
        # Get the _xsrf that is needed when logging in
        index_page = session.get(index_url, headers=headers)
        html = index_page.text
        pattern = r'name="_xsrf" value="(.*?)"'
        # re.findall returns a list here
        _xsrf = re.findall(pattern, html)
        return _xsrf[0]


    # Get the verification code
    def get_captcha():
        t = str(int(time.time() * 1000))
        captcha_url = 'https://www.zhihu.com/captcha.gif?r=' + t + "&type=login"
        r = session.get(captcha_url, headers=headers)
        # The with block closes the file, so no explicit close() is needed
        with open('captcha.jpg', 'wb') as f:
            f.write(r.content)
        # Use Pillow's Image to display the verification code.
        # If Pillow is not installed, find captcha.jpg in the source
        # directory and enter the code manually.
        try:
            im = Image.open('captcha.jpg')
            im.show()
            im.close()
        except Exception:
            print(u'Please find captcha.jpg in the %s directory and manually enter'
                  % os.path.abspath('captcha.jpg'))
        captcha = input("please input the captcha\n>")
        return captcha


    def isLogin():
        # Check the login state by requesting the user's profile settings page
        url = "https://www.zhihu.com/settings/profile"
        login_code = session.get(url, headers=headers,
                                 allow_redirects=False).status_code
        return login_code == 200


    def login(secret, account):
        # Decide from the entered user name whether it is a mobile phone number
        if re.match(r"^1\d{10}$", account):
            print("Mobile phone number login\n")
            post_url = 'https://www.zhihu.com/login/phone_num'
            postdata = {
                '_xsrf': get_xsrf(),
                'password': secret,
                'remember_me': 'true',
                'phone_num': account,
            }
        else:
            if "@" in account:
                print("Mailbox login\n")
            else:
                print("There is a problem with your account input, please log in again")
                return 0
            post_url = 'https://www.zhihu.com/login/email'
            postdata = {
                '_xsrf': get_xsrf(),
                'password': secret,
                'remember_me': 'true',
                'email': account,
            }
        try:
            # Log in directly when no verification code is required
            login_page = session.post(post_url, data=postdata, headers=headers)
            login_code = login_page.text
            print(login_page.status_code)
            print(login_code)
        except Exception:
            # Otherwise log in again with the verification code
            postdata["captcha"] = get_captcha()
            login_page = session.post(post_url, data=postdata, headers=headers)
            # Parse the JSON reply instead of calling eval() on server text
            login_code = login_page.json()
            print(login_code['msg'])
        session.cookies.save()


    # Python 2 compatibility: use raw_input as input when it exists
    try:
        input = raw_input
    except NameError:
        pass


    # Print the question list of the main page to the shell
    def getPageQuestion(url2):
        mainpage = session.get(url2, headers=headers)
        soup = BeautifulSoup(mainpage.text, 'html.parser')
        tags = soup.find_all("a", class_="question_link")
        for tag in tags:
            print(tag.string)


    # Print the answer summaries of the main page to the shell
    def getPageAnswerAbstract(url2):
        mainpage = session.get(url2, headers=headers)
        soup = BeautifulSoup(mainpage.text, 'html.parser')
        tags = soup.find_all('div', class_='zh-summary summary clearfix')
        for tag in tags:
            print(tag.get_text())
            print('Detailed link:', tag.find('a').get('href'))


    def getPageALL(url2):
        # tags = soup.find_all('div', class_='feed-item-inner')
        mainpage = session.get(url2, headers=headers)
        soup = BeautifulSoup(mainpage.text, 'html.parser')
        tags = soup.find_all('div', class_='feed-content')
        for tag in tags:
            print(tag.find('a', class_='question_link').get_text())
            # There is a problem here; I am not yet fluent with BeautifulSoup
            # print(tag.find('a', class_='zh-summary summary clearfix').get_text())
            # print(tag.find('div', class_='zh-summary summary clearfix').get_text())


    if __name__ == '__main__':
        if isLogin():
            print('You are already logged in')
            url2 = 'https://www.zhihu.com'
            # getPageQuestion(url2)
            # getPageAnswerAbstract(url2)
            getPageALL(url2)
        else:
            account = input('Please enter your username\n>')
            secret = input("Please enter your password\n> ")
            login(secret, account)
Run result: