Miss Douyin belongs to your four poses

Preface

Last week I came across an interesting project, Douyin-Bot, a Douyin robot built with Python and ADB. It turns pages automatically, runs face recognition, and automatically likes videos and follows their authors. The effect is shown in the figure below, and the result is quite impressive.

Source address: https://github.com/wangshub/Douyin-Bot

What we implement today is batch downloading of Douyin videos. Combined with the robot above, the two make a nice pair of bots.

The Douyin crawler we implement today has four main functions:

  • Download all videos posted by a user, given the Douyin ID
  • Automatically download the videos a user has liked, given a share link
  • Automatically download all videos under a topic, given a share link
  • Automatically download all videos that use a given piece of music, given a share link

The code walkthrough uses the second function as its example.

Enough talk. First, the results of a crawl:

Download video screenshot

Output log screenshot

Hands-on

Import the libraries

import requests
import json
import datetime
import re
import sys
import os
from urllib.parse import urlencode
from contextlib import closing
from requests.packages import urllib3
import random

The main functional modules of this code are as follows:

This project downloads automatically from the links that users share. First, we get the following links through sharing:

# This is the sharing link of the user's homepage
https://www.douyin.com/share/user/61806758871/?share_type=link&from=singlemessage
# This is the sharing link of the music interface
https://www.iesdouyin.com/share/music/6562721743650491139?timestamp=1528546868&utm_source=weixin&utm_campaign=client_share&utm_medium=android&app=aweme&iid=33943329942
# This is the sharing link of the theme interface
https://www.iesdouyin.com/share/challenge/1602334725005380?timestamp=1528546923&utm_source=weixin&utm_campaign=client_share&utm_medium=android&app=aweme&iid=33943329942

From the links above, the following code extracts the unique ID of each link:

# Parse the links read from the file and download each one
def parse_url(urls):
    for url in urls:
        if not url:
            continue

        # The link is a music link
        if re.search(r'share/music', url):
            music_id = re.findall(r'share/music/(.*)\?', url)[0]
            print(music_id)
            # Create a folder named after the ID if it does not exist yet
            os.makedirs(music_id, exist_ok=True)
            download_music_media(music_id)

        # The link is a topic (challenge) link
        if re.search(r'share/challenge', url):
            challenge_id = re.findall(r'share/challenge/(.*)\?', url)[0]
            os.makedirs(challenge_id, exist_ok=True)
            download_challenge_media(challenge_id)

        # The link is a user's homepage; download the videos that user has liked
        if re.search(r'share/user', url):
            user_id = re.findall(r'share/user/(.*)/\?', url)[0]
            os.makedirs(user_id, exist_ok=True)
            download_ulike_media(user_id)
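The ID extraction can also be tried on its own. The following is a minimal sketch rather than the project's code: `extract_share_id` is a hypothetical helper, and it assumes the ID is always the run of digits right after `share/<kind>/`, as in the three sample links.

```python
import re

# Hypothetical helper, not part of the original project.
# Assumes the ID is the run of digits right after share/<kind>/.
SHARE_PATTERN = re.compile(r'share/(music|challenge|user)/(\d+)')


def extract_share_id(url):
    """Return (kind, id) for a share link, or None if the link does not match."""
    match = SHARE_PATTERN.search(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


user_link = 'https://www.douyin.com/share/user/61806758871/?share_type=link&from=singlemessage'
print(extract_share_id(user_link))
```

Matching only digits avoids the greedy `(.*)` capture, which would swallow part of the query string if a link ever contained a second `?`.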

Opening one of the links above in a browser gives us the following request headers:

# hds is assumed to be a list of User-Agent strings defined elsewhere in the script
headers = {
    'user-agent': random.choice(hds),
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'max-age=0'
}
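The headers reference `hds`, which this article does not show. A minimal sketch of how such a list might be defined and used follows; the two User-Agent strings are illustrative placeholders, and `build_headers` is a hypothetical helper, not a name from the project.

```python
import random

# Assumption: hds is a plain list of User-Agent strings.
# These two values are placeholders, not the list used by the project.
hds = [
    'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 '
    '(KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
    'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/67.0.3396.87 Mobile Safari/537.36',
]


def build_headers(user_agents):
    """Build the request headers with a User-Agent picked at random."""
    return {
        'user-agent': random.choice(user_agents),
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'max-age=0',
    }


headers = build_headers(hds)
```

Building the headers in a function means a fresh User-Agent can be drawn for each request instead of once at import time.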

Let's take the second function, downloading the videos a user has liked, as an example. Open the shared link, switch the browser's developer tools to mobile mode, and click "Like" to see the request link, as shown in the figure below:

View request link

Looking at the request parameters, you can see a rather strange one, _signature, whose value is different on every request. After consulting related projects on GitHub, I found that this parameter is obtained by encrypting the ID in the link. So next we can encrypt the ID by calling the encryption JS and then construct a complete request. The code is as follows:

# Build request parameters
def download_ulike_media(u_id):
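    # Run the signing script through Node.js and read the signature it prints
    p = os.popen('node fuck-byted-acrawler.js %s' % u_id)
    signature = p.readlines()[0].strip()  # drop the trailing newline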
    params = {
        'user_id': str(u_id),
        'count': '21',
        'max_cursor': '0',
        'aid': '1128',
        '_signature': signature
    }

As you can see, the code above calls Node.js to run the encryption JS, so Node.js must be installed (the installer can be obtained by replying "node" to the official account). With the request constructed, we successfully get the response, as shown in the figure below. Next, we need to parse the returned data to get the video links.
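The call to the signing script can also be made with `subprocess.run`, which passes the ID as a separate argument rather than through a shell string and reports a failing script. This is a sketch under assumptions: `get_signature` and its `command` parameter are hypothetical, and the stand-in command only echoes the ID so that the example runs without Node.js.

```python
import subprocess
import sys


def get_signature(u_id, command=('node', 'fuck-byted-acrawler.js')):
    """Run the signing script with the ID and return the first line it prints."""
    result = subprocess.run(
        [*command, str(u_id)],
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError if the script exits with an error
        timeout=30,
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise RuntimeError('signing script produced no output')
    return lines[0].strip()


# Stand-in command so the sketch runs without Node.js installed:
# it echoes the ID with a prefix instead of computing a real signature.
fake_command = (sys.executable, '-c', 'import sys; print("sig-" + sys.argv[1])')
print(get_signature('61806758871', command=fake_command))
```

With Node.js and the script in place, calling `get_signature(u_id)` with the default `command` would run the same script as the `os.popen` call.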

Inspecting the returned data shows that a valid video link has the following form:

https://www.amemv.com/share/video/xxxxxxxxxxx

With the video ID we can build the complete video link. The code is as follows:

    # Assemble the video information
    def get_ulike_url(max_cursor=None, video_count=0):
        video_names = []
        video_urls = []
        url = 'https://www.amemv.com/share/video/'
        if max_cursor:
            params['max_cursor'] = str(max_cursor)
        ulike_url = 'https://www.douyin.com/aweme/v1/aweme/favorite/?' + urlencode(params)
        res = requests.get(ulike_url, headers=headers, verify=False)
        ulike_ms = json.loads(res.content.decode('utf-8'))
        favorite_list = str(ulike_ms['aweme_list'])
        # Pull every video ID out of the share links in the response
        v_id = re.findall(r"https://www\.amemv\.com/share/video/(.*?)'", favorite_list)
        for vid in v_id:
            video_names.append(vid + '.mp4')
            video_urls.append(url + vid)
        video_count += len(v_id)
        parse_media_url(video_names, video_urls, u_id)
        # Keep requesting pages while the API reports more results
        if ulike_ms.get('has_more') == 1:
            return get_ulike_url(ulike_ms.get('max_cursor'), video_count)
        return video_count
    video_count = get_ulike_url()
    if video_count == 0:
        print('This user has no favorite videos')
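The paging logic (request a page, follow max_cursor while has_more is 1) can also be written as a loop, which avoids deep recursion for users with many liked videos. The sketch below is illustrative: `collect_all` and `fetch_page` are hypothetical names, and the fake pages stand in for real API responses.

```python
def collect_all(fetch_page):
    """Collect items from a cursor-paginated API.

    fetch_page(cursor) must return a dict with the fields the code above
    reads: 'aweme_list', 'has_more' and 'max_cursor'.
    """
    items = []
    cursor = 0
    while True:
        page = fetch_page(cursor)
        items.extend(page.get('aweme_list', []))
        if page.get('has_more') != 1:
            break
        cursor = page.get('max_cursor')
    return items


# Fake two-page response used only to exercise the loop.
fake_pages = {
    0: {'aweme_list': ['a', 'b'], 'has_more': 1, 'max_cursor': 100},
    100: {'aweme_list': ['c'], 'has_more': 0, 'max_cursor': 0},
}
videos = collect_all(lambda cursor: fake_pages[cursor])
print(len(videos))
```

In the real crawler, `fetch_page` would wrap the `requests.get` call and the JSON decoding.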

Let's open one of the video links built above and see what the page looks like.

View the information returned by the webpage

Opening the link in the red box above shows that it is the resource address of the video.

Open the source video address

Following the approach above, we can write the following code:

# Download module
def _download_video(video_url, path):
    video_content = get_video_url(video_url)
    # print(video_content)
    rec = re.compile(r'class="video-player" src="(.*?)"')
    pattern = re.compile(r'playwm')
    downloadwm_url = rec.search(video_content).group(1)
    # Build no watermark download link
    download_url = re.sub(pattern,'play', downloadwm_url)
    print('Downloading:',download_url, path)
    with closing(requests.get(download_url, headers=headers, stream=True, verify=False)) as response:
        chunk_size = 1024
        if response.status_code == 200:
            with open(path,'wb') as f:
                for data in response.iter_content(chunk_size=chunk_size):
                    f.write(data)
                    # flush() writes the buffered data to the file immediately and
                    # clears the buffer, instead of waiting for it to be written out.
                    f.flush()
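The URL rewriting step can be isolated and checked without any network access. The following is a sketch under assumptions: `no_watermark_url` is a hypothetical helper, the sample page fragment is made up, and unlike the code above it returns None instead of raising when the player tag is missing.

```python
import re

PLAYER_SRC = re.compile(r'class="video-player" src="(.*?)"')


def no_watermark_url(page_html):
    """Extract the player source from the page and swap playwm for play."""
    match = PLAYER_SRC.search(page_html)
    if match is None:
        return None  # page layout changed or the video is unavailable
    return match.group(1).replace('playwm', 'play')


# Made-up page fragment; the host and query string are placeholders.
sample = '<video class="video-player" src="https://example.com/aweme/v1/playwm/?video_id=abc"></video>'
print(no_watermark_url(sample))
```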

The above is the code for downloading a user's liked videos. It is a little more complicated than the other functions, whose request interfaces can be obtained by capturing the phone's traffic and carry no encrypted parameter. The rest of the project's code is basically similar.

Here we take downloading the videos under a piece of music as an example to discuss the capture part. The capture tool used this time is Charles; for its basic configuration, see the article "10 lines of code to achieve automatic participation Lucky Draw Assistant Lucky Draw (Part 1)". After configuring Charles, open Douyin and refresh the page on the phone, and two highlighted links appear among the requests in the left column. The screenshot of Charles is as follows:

Charles screenshot

Click a link to open its response data. We only need to parse the videoid contained in the share_url of each entry and then pass it to the API to get the real video address.
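Extracting the videoid from each share_url might look like the sketch below. It makes two assumptions not confirmed by this article: that the ID is a run of digits after `share/video/`, and that share_url can sit at any depth in the response, so the helper walks the whole JSON tree. `find_video_ids` and the fake response are illustrative only.

```python
import re

VIDEO_ID = re.compile(r'share/video/(\d+)')


def find_video_ids(node):
    """Walk decoded JSON and return the video IDs found in every share_url value."""
    ids = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'share_url' and isinstance(value, str):
                match = VIDEO_ID.search(value)
                if match:
                    ids.append(match.group(1))
            else:
                ids.extend(find_video_ids(value))
    elif isinstance(node, list):
        for item in node:
            ids.extend(find_video_ids(item))
    return ids


# Fake response; the nesting and the IDs are invented for illustration.
fake_response = {
    'aweme_list': [
        {'share_info': {'share_url': 'https://www.amemv.com/share/video/111/?region=CN'}},
        {'share_url': 'https://www.amemv.com/share/video/222'},
        {'share_url': 'https://www.amemv.com/share/music/333'},
    ]
}
print(find_video_ids(fake_response))
```

Parsing the decoded JSON directly avoids converting the list to a string and matching against its quoting.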

Resolve the videoid contained in the link in the red box in the figure

When testing the API, it is strongly recommended to use Postman to check that the link works. This helps reduce the number of request parameters and the complexity of testing.

Part of the screenshot when testing the interface

Notes

  • I wrote this for my own use and did not use multi-threading, so downloading is a bit slow. I will add multi-threading once I fully understand it.
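One possible way to add concurrency is a thread pool from the standard library. This is a sketch, not part of the original project: `download_all` is a hypothetical wrapper, and `fake_download` stands in for `_download_video` so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(jobs, download, max_workers=4):
    """Run download(url, path) for every (url, path) pair on a thread pool.

    Returns a list of (url, path, error) for the jobs that raised.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download, url, path): (url, path) for url, path in jobs}
        for future, (url, path) in futures.items():
            try:
                future.result()
            except Exception as exc:
                failures.append((url, path, exc))
    return failures


# Stand-in for _download_video, used only to exercise the pool.
finished = []


def fake_download(video_url, path):
    if 'bad' in video_url:
        raise IOError('simulated failure')
    finished.append(path)


jobs = [
    ('https://example.com/1', '1.mp4'),
    ('https://example.com/bad', 'bad.mp4'),
    ('https://example.com/2', '2.mp4'),
]
failed = download_all(jobs, fake_download)
print(sorted(finished), [path for _, path, _ in failed])
```

Downloads spend most of their time waiting on the network, so threads tend to help here despite the interpreter lock. A small `max_workers` keeps the request rate modest.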

Reference: https://cloud.tencent.com/developer/article/1518515 Miss Douyin belongs to your four poses-Cloud + Community-Tencent Cloud