Python crawler + face value score, 5000+ pictures to find your Mrs. Right

Love at first sight is not about love; it is about the face

Project Description

This project uses a Python crawler and the Baidu face recognition API to download user photos from the Jianshu dating column and score them by appearance (photos will be removed on request if they infringe anyone's rights). The project covers the following:

  • Picture crawler
  • Face recognition API usage
  • Score the appearance and classify files

Picture crawler

Every major dating site now has users who post photos of themselves. This article crawls every post in the Jianshu dating column ( https://www.jianshu.com/c/bd38bd199ec6 ), opens each post's detail page, and downloads all of its pictures to a local folder.

Code
import requests
from lxml import etree
import time

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}

def get_url(url):
    # Parse one listing page and visit every post linked from it.
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    infos = html.xpath('//ul[@class="note-list"]/li')
    root = 'https://www.jianshu.com'
    for info in infos:
        url_path = root + info.xpath('div/a/@href')[0]
        # print(url_path)
        get_img(url_path)
    time.sleep(3)  # pause between listing pages

def get_img(url):
    # Fetch one post and save every image it contains into row_img/
    # (the folder must already exist).
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    title = html.xpath('//div[@class="article"]/h1/text()')[0].strip('|').split(',')[0]
    name = html.xpath('//div[@class="author"]/div/span/a/text()')[0].strip('|')
    infos = html.xpath('//div[@class = "image-package"]')
    for i, info in enumerate(infos, start=1):
        try:
            img_url = info.xpath('div[1]/div[2]/img/@data-original-src')[0]
        except IndexError:
            continue  # this block has no image
        print(img_url)
        # The attribute value carries no scheme, so 'http:' is prepended.
        data = requests.get('http:' + img_url, headers=headers)
        try:
            with open('row_img/' + title + '+' + name + '+' + str(i) + '.jpg', 'wb') as fp:
                fp.write(data.content)
        except OSError:
            # The title cannot be used as a file name; fall back to the author name.
            with open('row_img/' + name + '+' + str(i) + '.jpg', 'wb') as fp:
                fp.write(data.content)

if __name__ =='__main__':
    urls = ['https://www.jianshu.com/c/bd38bd199ec6?order_by=added_at&page={}'.format(str(i)) for i in range(1,201)]
    for url in urls:
        get_url(url)
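
The OSError fallback in get_img suggests that some post titles contain characters that are not valid in file names. As an alternative sketch (not part of the original script), the helper below replaces such characters before the file is opened and creates the output folder if it is missing. The names safe_name and build_path are hypothetical, and the character list assumes Windows file name rules.

```python
import os
import re

# Characters rejected by Windows file systems, plus control characters.
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

def safe_name(text, max_len=80):
    """Replace characters that are invalid in file names and trim the result."""
    cleaned = _INVALID.sub('_', text).strip(' .')
    return cleaned[:max_len] or 'untitled'

def build_path(folder, title, name, index):
    """Build folder/title+name+index.jpg, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    filename = '{}+{}+{}.jpg'.format(safe_name(title), safe_name(name), index)
    return os.path.join(folder, filename)
```

With this in place, get_img could open build_path('row_img', title, name, i) directly, and the fallback branch would rarely be needed.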

Face recognition API usage

Because every picture under each post is downloaded, the collection is mixed and includes pictures with no face at all. The goal is to find the best-looking women, and screening thousands of pictures by hand is tedious, so Baidu's face recognition API is used to filter the pictures and score each face.

Applying for face recognition access
  • Go to the Baidu face recognition site ( http://ai.baidu.com/tech/face ), click Use Now, and log in with your Baidu account (register one if you don't have one).
  • After creating an application, click Manage Application to see the AppID, API Key, and Secret Key, which are needed when calling the API.

API call

I first try the API on a picture of Yang Chaoyue. The result shows a score of 75, which is fairly high (in my tests on some internet celebrities and stars, the average was around 80 and the highest did not exceed 90).

from aip import AipFace
import base64
 
APP_ID =''
API_KEY =''
SECRET_KEY =''
 
aipFace = AipFace(APP_ID, API_KEY, SECRET_KEY)
 
filePath = r'C:\Users\LP\Desktop\6.jpg'
def get_file_content(filePath):
    with open(filePath,'rb') as fp:
        content = base64.b64encode(fp.read())
        return content.decode('utf-8')
    
imageType = "BASE64"
    
options = {}
options["face_field"] = "age,gender,beauty"

result = aipFace.detect(get_file_content(filePath),imageType,options)
print(result)
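
The printed result is a nested dictionary. The sketch below pulls out the fields that the classification script in the next section reads: error_code, and gender and beauty inside result.face_list. The sample dictionary is a hand-written stand-in shaped after those fields, not real API output, and summarize_face is a hypothetical helper name.

```python
def summarize_face(result):
    """Return (gender, beauty) for the last detected face, or None if unavailable."""
    if result.get('error_code'):
        return None  # non-zero error code: no usable face in this picture
    try:
        face = result['result']['face_list'][-1]
        return face['gender']['type'], face['beauty']
    except (KeyError, IndexError, TypeError):
        return None  # response did not have the expected shape

# Hand-written sample that only mimics the fields read above.
sample = {
    'error_code': 0,
    'result': {'face_list': [{'gender': {'type': 'female'}, 'beauty': 75.0}]},
}
print(summarize_face(sample))
```

Returning None for every failure case keeps the calling loop simple: a picture is skipped whenever there is no summary.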

Score the appearance and classify files

Finally, the crawler output and the scoring are combined. The code filters out pictures without a person and pictures of men, converts each woman's score to a 1-10 scale, and moves the picture into a folder for its score band.

from aip import AipFace
import base64
import os
import time

APP_ID =''
API_KEY =''
SECRET_KEY =''
 
aipFace = AipFace(APP_ID, API_KEY, SECRET_KEY)

def get_file_content(filePath):
    with open(filePath,'rb') as fp:
        content = base64.b64encode(fp.read())
        return content.decode('utf-8')
    
imageType = "BASE64"
    
options = {}
options["face_field"] = "age,gender,beauty"

file_path ='row_img'
file_lists = os.listdir(file_path)
for file_list in file_lists:
    result = aipFace.detect(get_file_content(os.path.join(file_path,file_list)),imageType,options)
    error_code = result['error_code']
    if error_code == 222202:
        continue
        
    try:
        sex_type = result['result']['face_list'][-1]['gender']['type']
        if sex_type =='male':
            continue
        # print(result)
        beauty = result['result']['face_list'][-1]['beauty']
        new_beauty = round(beauty/10,1)
        print(file_list,new_beauty)
        if new_beauty >= 8:
            os.rename(os.path.join(file_path,file_list),os.path.join('8 points',str(new_beauty) +'+' + file_list))
        elif new_beauty >= 7:
            os.rename(os.path.join(file_path,file_list),os.path.join('7 points',str(new_beauty) +'+' + file_list))
        elif new_beauty >= 6:
            os.rename(os.path.join(file_path,file_list),os.path.join('6 points',str(new_beauty) +'+' + file_list))
        elif new_beauty >= 5:
            os.rename(os.path.join(file_path,file_list),os.path.join('5 points',str(new_beauty) +'+' + file_list))
        else:
            os.rename(os.path.join(file_path,file_list),os.path.join('Other points',str(new_beauty) +'+' + file_list))
        time.sleep(1)
    except KeyError:
        pass
    except TypeError:
        pass
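
The if/elif chain maps a beauty value to a destination folder. The same mapping can be written as a small function, which also makes it easy to create the folders up front, since os.rename fails when the destination folder is missing. This is only a sketch: bucket_for and ensure_folders are hypothetical names, the beauty value is assumed to lie between 0 and 100, and the folder names follow the English labels used in the script.

```python
import os

# (minimum score, folder) pairs, highest first.
BUCKETS = [(8, '8 points'), (7, '7 points'), (6, '6 points'), (5, '5 points')]
FALLBACK = 'Other points'

def bucket_for(beauty):
    """Map a beauty value (assumed 0-100) to (score out of 10, folder name)."""
    score = round(beauty / 10, 1)
    for minimum, folder in BUCKETS:
        if score >= minimum:
            return score, folder
    return score, FALLBACK

def ensure_folders(base='.'):
    """Create every destination folder under base if it does not exist yet."""
    for folder in [name for _, name in BUCKETS] + [FALLBACK]:
        os.makedirs(os.path.join(base, folder), exist_ok=True)
```

Calling ensure_folders() once before the loop and then using the folder returned by bucket_for would replace the five os.rename branches with a single call.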

In the end, very few of the women score 8 or above, as shown in the picture (will be removed on request).

Discussion

  • The Jianshu dating column has relatively few women, so readers can try Weibo internet celebrities or attractive Zhihu users instead.
  • Although this is an age that judges by looks, liking someone starts with appearance, deepens with talent, and lasts because of character (a closing dose of positivity, so this post does not get blocked).

Reference: Python crawler + face value score, 5000+ pictures to find your Mrs. Right, Tencent Cloud Community, https://cloud.tencent.com/developer/article/1182012