Douyin Miss Sister Video Crawler

Preface

Some time ago, the show Create 101 was a real hit. When people asked me which contestant I would pick, the answer was of course Yang Chaoyue, out-of-tune singing and dancing included. In fact, I already followed her on Douyin before I watched Create 101, and today I will crawl her Douyin videos (Yang Chaoyue's Douyin account has not been updated for a while). Let's take a look~

This article mainly explains:

  • Douyin Video Crawler
  • Video download

PS: Many netizens criticize her for lacking ability, but she really is lucky~

Douyin Video Crawler

As before, I capture the app's traffic with Fiddler. Douyin now signs its requests with an encryption algorithm, so most of the code found online no longer works. Let's take a look at which encrypted fields are involved.

https://aweme.snssdk.com/aweme/v1/aweme/post/?iid=40337863888&device_id=35102252294&os_api=18&app_name=aweme&channel=App%20Store&idfa=11926ED5-C282-4BBC-AF01-0E8C18120647&device_platform=Avid623A101&device_platform=Avid number=5Aphone -4A03-9352-57C0681CDDDC&openudid=1ee725d39e05794bcdc14537f8c1f4220c7d6fd5&device_type=iPhone8,1&app_version=2.3.1&version_code=2.3.1&os_version=11.4.1&screen_width=750&aid=1128&ac=WIFI&count=21&max_cursor=0&min_cursor=0&user_id=58554069260&mas=01bf537030d65155897d6fd1d7c97862dbca9722fea8c96d2b68de&as=a1858817de104b87435065&ts=1534297870
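
To see which fields a captured URL carries, the query string can be split with the standard library. This is only a rough sketch: the URL below is a shortened stand-in with placeholder values for mas and as, and split_params is a hypothetical helper name, not something from the Douyin client.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened stand-in for a captured URL; mas/as values are placeholders, not real signatures.
captured = (
    "https://aweme.snssdk.com/aweme/v1/aweme/post/"
    "?count=21&max_cursor=0&user_id=58554069260"
    "&mas=PLACEHOLDER_MAS&as=PLACEHOLDER_AS&ts=1534297870"
)

# The two fields this article treats as encrypted.
SIGNATURE_FIELDS = {"mas", "as"}


def split_params(url):
    """Return (plain, signed) dicts of query parameters from a URL."""
    query = parse_qs(urlsplit(url).query)
    plain, signed = {}, {}
    for name, values in query.items():
        # Keep the first value of each parameter and sort it into one of the two dicts.
        target = signed if name in SIGNATURE_FIELDS else plain
        target[name] = values[0]
    return plain, signed


plain, signed = split_params(captured)
print("plain :", plain)
print("signed:", signed)
```

Everything in plain is easy to fill in by hand; the fields in signed are the ones that would have to be reproduced to build new requests.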

The main problem is that I do not know how to construct the mas and as parameters. Since I am only crawling one user's data, I copied the complete captured URLs into a text file (1.txt) and then requested the data from each of them.

import requests

# Request headers copied from the captured request of the Douyin iOS client
headers = {
    'Host': 'api.amemv.com',
    'Accept': '*/*',
    'Cookie': 'install_id=40337863888; login_flag=d6f29ec905af4bf1101199aa942c466f; odin_tt=a1e12dc3e4b92de77cccf6be1717377188f8aa7582f703c1391c8dc7d4a0df1b166119681af4277bd2cdc8aeb56000a7; sessionid=718df70f4e4964723cd1c8337c367b45; sid_guard=718df70f4e4964723cd1c8337c367b45%7C1534207148%7C5184000%7CSat%2C+13-Oct-2018+00%3A39%3A08+GMT; sid_tt=718df70f4e4964723cd1c8337c367b45; ttreq=1$ad10f98ec66ad6df5b86a7b1a613c77bb674236d; uid_tt=765536856bdc4f0f299b85dbc7338982',
    'User-Agent': 'Aweme/2.3.1 (iPhone; iOS 11.4.1; Scale/2.00)',
    'Accept-Language': 'zh-Hans-CN;q=1',
    # 'br' is left out: requests only decodes Brotli when an optional brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive'
}


def get_info(url, out):
    # Request one captured URL and record each video's description and first play address
    res = requests.get(url, headers=headers)
    json_data = res.json()
    datas = json_data['aweme_list']
    for data in datas:
        desc = data['desc']
        download_url = data['video']['play_addr']['url_list'][0]
        print(desc, download_url)
        out.write(desc + ',' + download_url + '\n')


if __name__ == '__main__':
    # 1.txt holds one captured request URL per line; the results are written to 2.txt
    with open('1.txt', 'r') as fp, open('2.txt', 'w', encoding='utf-8') as f:
        for line in fp:
            if line.strip():
                get_info(line.strip(), f)
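
The parsing step can be tried without touching the network. The sketch below assumes the same response shape the crawler relies on (aweme_list, desc, video, play_addr, url_list); the sample payload is invented for illustration, and extract_videos is a hypothetical helper. It skips entries without a play address instead of raising, and it writes rows with the csv module so that a comma inside a description does not split the row.

```python
import csv
import io


def extract_videos(json_data):
    """Return (desc, url) pairs from a decoded response dict."""
    pairs = []
    for item in json_data.get("aweme_list", []):
        url_list = item.get("video", {}).get("play_addr", {}).get("url_list", [])
        if not url_list:
            continue  # no playable address, nothing to download
        pairs.append((item.get("desc", ""), url_list[0]))
    return pairs


# Invented sample payload, for illustration only.
sample = {
    "aweme_list": [
        {"desc": "hello, world",
         "video": {"play_addr": {"url_list": ["https://example.com/a.mp4"]}}},
        {"desc": "", "video": {"play_addr": {"url_list": []}}},
    ]
}

videos = extract_videos(sample)

# Write the pairs as CSV into memory, then read them back to check the round trip.
buffer = io.StringIO()
csv.writer(buffer).writerows(videos)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

If the link file were written this way, the download step would need to read it back with csv.reader rather than splitting on commas.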

Video download

It is said online that these links stay valid for only a little over 10 minutes (I have not verified this), so the crawler saves the video links first and then downloads them all in one pass.

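import os
import requests


def download_url(desc, url, index):
    # Download one video; fall back to the running index when the description is empty
    res = requests.get(url)
    if len(desc) == 0:
        desc = str(index)
    with open('video/' + desc + '.mp4', 'wb') as f:
        f.write(res.content)


# Make sure the output directory exists before writing into it
os.makedirs('video', exist_ok=True)

with open('2.txt', 'r', encoding='utf-8') as fp:
    for i, line in enumerate(fp, start=1):
        if ',' not in line:
            continue  # skip blank or malformed lines
        # Split on the last comma so that commas inside the description are kept
        desc, url = line.rstrip('\n').rsplit(',', 1)
        print(url)
        download_url(desc, url, i)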
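
One thing to watch out for: the description is used directly as the file name, so a description containing a slash or a line break would make open() fail or write somewhere unexpected. A minimal sketch of a cleanup step follows; safe_name is a hypothetical helper, and the set of replaced characters and the 80-character limit are my own assumptions, not anything Douyin defines.

```python
import re


def safe_name(desc, index, max_len=80):
    """Turn a video description into a file name stem."""
    # Replace characters that are invalid or risky in file names on common systems.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", desc).strip(" .")
    # Fall back to the running index when nothing usable is left.
    if not cleaned:
        return str(index)
    # Keep the name reasonably short.
    return cleaned[:max_len]


print(safe_name("a/b: c?", 1))
print(safe_name("", 7))
```

The result could then be used in place of desc when building the path, for example 'video/' + safe_name(desc, i) + '.mp4'.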

Finally, I hope you all get what you wish for~

Reference: https://cloud.tencent.com/developer/article/1197121 Douyin Miss Sister Video Crawler-Cloud + Community-Tencent Cloud