Python is interesting | Weibo Internet celebrity competition

Preface

In previous posts we covered crawlers for Jianshu and for the young ladies of Zhihu. Today, Luo Luopan reaches out to Weibo internet celebrities to find out who scores as the most beautiful. Today's process is: analyze the web page, crawl the pictures, score them with a face recognition API, and sort them into folders by score.

Web page analysis

Here is the link to the Weibo internet celebrity page: https://weibo.com/a/hot/7549094253303809_1.html . This is a new feature in Weibo (you don't need to know much about it; the URL is enough): a collection of recently popular microblog posts by internet celebrities.

This web page is simple, so we can parse it directly with the lxml library. One point worth emphasizing: the pictures on this page are thumbnails, and the high-definition versions are shown on each post's detail page. However, I found that you only need to change "thumb180" in a picture's URL to "mw690" to get the high-definition picture. For example:

https://ww4.sinaimg.cn/thumb180/6960aeaaly1g23wtlad3sj21sc2dsu0x.jpg

https://ww4.sinaimg.cn/mw690/6960aeaaly1g23wtlad3sj21sc2dsu0x.jpg
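
The URL rewrite shown by the two links above can be wrapped in a small helper. This is only a sketch: the function name `to_hd_url` and its `size` parameter are made up for illustration, and it assumes the size marker always appears as its own path segment, as in the two example URLs.

```python
def to_hd_url(thumb_url, size="mw690"):
    """Return the image URL with the 'thumb180' size segment swapped out."""
    # Assumption: the size marker is a full path segment ("/thumb180/"),
    # as in the example URLs; only the first occurrence is replaced.
    return thumb_url.replace("/thumb180/", "/" + size + "/", 1)


thumb = "https://ww4.sinaimg.cn/thumb180/6960aeaaly1g23wtlad3sj21sc2dsu0x.jpg"
print(to_hd_url(thumb))
```

Matching the whole path segment rather than the bare string avoids touching a file name that happens to contain "thumb180".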

Crawler code

Following the analysis above, we write the crawler code:

import os
import requests
from lxml import etree

headers = {
    'cookie': ''  # paste your own Weibo cookie here
}

url = 'https://weibo.com/a/hot/7549094253303809_1.html'

res = requests.get(url, headers=headers)
html = etree.HTML(res.text)
infos = html.xpath('//div[@class="UG_list_a"]')

# Make sure the output folder exists before saving pictures
os.makedirs('row_img', exist_ok=True)

for info in infos:
    name = info.xpath('div[2]/a[2]/span/text()')[0]
    content = info.xpath('h3/text()')[0].strip()
    imgs = info.xpath('div[@class="list_nod clearfix"]/div/img/@src')
    print(name, content)
    for i, img in enumerate(imgs, start=1):
        # Swap the thumbnail size for the high-definition one
        href = 'https:' + img.replace('thumb180', 'mw690')
        print(href)
        res_1 = requests.get(href, headers=headers)
        # The with block closes the file after writing
        with open('row_img/' + name + '+' + content + '+' + str(i) + '.jpg', 'wb') as fp:
            fp.write(res_1.content)

Remember to fill in your own cookie, and the code is ready to run~
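
The crawler builds file names from the user name and the post text, and post text may contain characters that are not valid in file names (a slash, for example). One possible way to guard against that is sketched below. The helper `safe_filename` and its `max_len` limit are illustrative assumptions, not part of the original code.

```python
import re


def safe_filename(name, content, index, max_len=80):
    """Build 'name+content+index.jpg' with unsafe characters replaced by '_'."""
    # Replace characters that are not allowed in file names on common
    # systems (\ / : * ? " < > |), as well as runs of whitespace.
    base = re.sub(r'[\\/:*?"<>|\s]+', '_', name + '+' + content)
    # Truncate long post text so the file name stays reasonably short.
    return base[:max_len] + '+' + str(index) + '.jpg'


print(safe_filename('user', 'a/b: c?', 1))
```

In the crawler, the result would be joined with the `row_img` folder in place of the hand-built string passed to `open`.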

Face recognition API

We have explained how to use the face recognition API before, and I will go over it again here.

First, open the website ( http://ai.baidu.com/tech/face ), log in, and start using it. We begin by creating a face recognition application. Using the API is easy (just read the documentation), but it is also hard (everyone's patience for reading is slowly declining). Let's go through the documentation ( https://ai.baidu.com/docs#/Face-Detect-V3/top ) step by step.

Then we get the token through API Key and Secret Key:

import requests

ak = ''  # your API Key
sk = ''  # your Secret Key

host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={}&client_secret={}'.format(ak, sk)

res = requests.post(host)
print(res.text)
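
The printed response still has to be turned into a token string. A minimal sketch follows, assuming the reply is JSON with an `access_token` field, as in a standard OAuth 2.0 client-credentials response. The helper name `extract_token` and the sample reply are made up for illustration.

```python
import json


def extract_token(response_text):
    """Return the access token from the token endpoint's JSON reply."""
    data = json.loads(response_text)
    if 'access_token' not in data:
        # Assumption: a failed request returns JSON without this field.
        raise ValueError('no access_token in response: ' + response_text)
    return data['access_token']


# Made-up sample reply, used only to show the call
sample = '{"access_token": "24.example", "expires_in": 2592000}'
print(extract_token(sample))
```

With the request above, this would be called as `extract_token(res.text)`.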

With the token, we can request the face detection endpoint and get information about the face in a picture. Let's take a picture of Chaoyue as an example~

import base64
import json
import requests

token = ''  # the access token obtained above

def get_img_base(file):
    # Read the image file and return its content Base64-encoded
    with open(file, 'rb') as fp:
        return base64.b64encode(fp.read())

request_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
request_url = request_url + "?access_token=" + token

params = {
    'image': get_img_base('test.jpg'),
    'image_type': 'BASE64',
    'face_field': 'age,beauty,gender'
}

res = requests.post(request_url, data=params)
json_result = json.loads(res.text)
code = json_result['error_code']
gender = json_result['result']['face_list'][0]['gender']['type']
beauty = json_result['result']['face_list'][0]['beauty']
print(code, gender, beauty)

# Output: 0 female 76.25

The token here is the one obtained from the previous request. In params, the picture must be Base64-encoded. Chaoyue scores 76.25, which is pretty awesome.
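
Indexing straight into `face_list` fails when the request returns an error or no face is found. A more defensive way to read the same fields is sketched below. The helper name `parse_face` is hypothetical, and the sample reply only mirrors the fields used in the code above.

```python
import json


def parse_face(response_text):
    """Return (gender, beauty) of the first face, or None if unavailable."""
    data = json.loads(response_text)
    if data.get('error_code') != 0:
        return None
    # 'result' may be missing or null when nothing was detected
    faces = (data.get('result') or {}).get('face_list') or []
    if not faces:
        return None
    face = faces[0]
    return face['gender']['type'], face['beauty']


# Sample reply containing only the fields read in the code above
sample = json.dumps({
    'error_code': 0,
    'result': {'face_list': [{'gender': {'type': 'female'}, 'beauty': 76.25}]},
})
print(parse_face(sample))
```

Returning None instead of raising lets the caller skip a picture with a single check.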

Comprehensive use

Finally, we send the saved pictures to the API one by one, get the beauty score of each picture (rescaled here to a 10-point scale), and move the pictures into different folders according to their score.
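
The bucketing rule can be sketched on its own before it is used in the full script. The function name `score_folder` is an illustrative assumption. The folder names and thresholds follow the ones used in this post.

```python
def score_folder(beauty):
    """Map a 0-100 beauty score to (10-point score, target folder name)."""
    score = round(beauty / 10, 1)
    # Check the thresholds from highest to lowest
    for threshold in (8, 7, 6, 5):
        if score >= threshold:
            return score, '{} points'.format(threshold)
    return score, 'Other points'


print(score_folder(76.25))
```

Keeping the thresholds in one tuple means a new bucket only needs one extra number, not another `elif` branch.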

import requests
import os
import base64
import json
import time

def get_img_base(file):
    # Read the image file and return its content Base64-encoded
    with open(file, 'rb') as fp:
        return base64.b64encode(fp.read())

file_path = 'row_img'
token = ''  # the access token obtained earlier

# The request URL is the same for every picture, so build it once
request_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
request_url = request_url + "?access_token=" + token

for list_path in os.listdir(file_path):
    img_path = os.path.join(file_path, list_path)

    params = {
        'image': get_img_base(img_path),
        'image_type': 'BASE64',
        'face_field': 'age,beauty,gender'
    }

    res = requests.post(request_url, data=params)
    json_result = json.loads(res.text)
    code = json_result['error_code']
    if code == 222202:  # no face found in the picture
        continue

    try:
        face = json_result['result']['face_list'][0]
        if face['gender']['type'] == 'male':
            continue
        beauty = face['beauty']
        new_beauty = round(beauty / 10, 1)  # rescale 0-100 to a 10-point scale
        print(img_path, new_beauty)
        if new_beauty >= 8:
            folder = '8 points'
        elif new_beauty >= 7:
            folder = '7 points'
        elif new_beauty >= 6:
            folder = '6 points'
        elif new_beauty >= 5:
            folder = '5 points'
        else:
            folder = 'Other points'
        # Create the target folder if needed, then move the picture into it
        os.makedirs(folder, exist_ok=True)
        os.rename(img_path, os.path.join(folder, str(new_beauty) + '+' + list_path))
        time.sleep(1)

    except (KeyError, TypeError):
        # Skip pictures whose response lacks the expected fields
        pass

Reference: Python is interesting | Weibo Internet celebrity competition, Tencent Cloud Community, https://cloud.tencent.com/developer/article/1426512