Analyze 6000 apps with Python and draw these conclusions

Author: Suk, who switched careers into Python crawling and data analysis from scratch


Abstract: The mobile Internet is now highly developed, and new apps appear in an endless stream, of widely varying quality. We would much rather use the genuinely good ones, but finding them is not easy. This article uses the Scrapy framework to crawl more than 6,000 apps from the well-known app download market Kuan. Through analysis, we identify the best apps in each category; these can truly be called works of conscience, and using them will give you a brand-new mobile phone experience.

1. Analyze the background

1.1. Why choose Kuan

If GitHub is a programmer's paradise, then Kuan is a paradise for mobile app enthusiasts (aka "phone tinkering" lovers). Compared with traditional mobile app download markets, Kuan has three special features:

1. You can search for and download all kinds of niche gems and quality software that are almost impossible to find in other app markets. For example, the terminal desktop "Aris" mentioned in the previous article, the most powerful Android reader "Quiet Reading World", the RSS reader "Feedme", and so on.

2. You can find cracked versions of many apps. We advocate "paying for good things", but some apps are frustrating, such as "Baidu Netdisk"; here you can find cracked versions of many such apps.

3. You can find historical versions of apps. Many people like to use the latest version of an app and upgrade the moment an update appears, but nowadays many apps grow more and more utilitarian, more bloated with each update, and full of ads. We might as well go back to basics and use an earlier version that is compact, streamlined, and ad-free.

As an app lover, I have found many good apps on Kuan, yet the more I use it, the more I feel that what I know is just the tip of the iceberg. I wanted to see how many good things there are on this site, and finding them one by one by hand is clearly unrealistic, so I naturally thought of the best solution: a crawler. To achieve this, I recently learned the Scrapy crawler framework and crawled about 6,000 apps from the site. Through analysis, let's take a look at the boutique apps in each field.

1.2. Analysis content

  • Overall analysis of the ratings, downloads, volume, and other indicators of the 6,000 apps.
  • Based on everyday usage scenarios, divide the apps into 10 categories, such as system tools, information reading, and social entertainment, and filter out the boutique apps in each category.

1.3. Analysis Tools

  • Python
  • Scrapy
  • MongoDB
  • Pyecharts
  • Matplotlib

2. Data capture

Since Kuan's mobile app has anti-scraping measures, Charles could not capture its packets, so the next step is to use Scrapy to grab the app information from the web pages instead. Crawling ended on November 23, 2018, covering a total of 6,086 apps, with 8 fields captured for each: app name, downloads, rating, number of ratings, number of comments, number of followers, volume, and category tags.

2.1. Target website analysis

This is the landing page we want to crawl. Clicking through the pages reveals two useful pieces of information:

  • Each page displays 10 app entries, for a total of 610 pages, i.e., about 6,100 apps.
  • The page request is a GET, and the URL has only one parameter that increments the page number, so constructing the page-turning requests is very simple.
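Since each list-page URL differs only by one page-number parameter, the full set of page URLs can be generated with plain string formatting. A minimal sketch (the base URL and the parameter name `p` are placeholder assumptions, since the real URL is not shown here):

```python
# Placeholder pattern for the list pages; substitute the site's real URL.
BASE_URL = "https://example.com/apk/?p={page}"

def page_urls(total_pages):
    """Return one list-page URL per page, pages numbered from 1."""
    return [BASE_URL.format(page=n) for n in range(1, total_pages + 1)]

urls = page_urls(610)  # 610 pages, roughly 6,100 apps at 10 per page
```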

Next, let's see what information to grab. The app name, downloads, rating, and other information are displayed on the main page, but clicking an app icon opens its detail page, which provides more complete information, including the category tags, number of ratings, number of followers, and so on. Since we will later need to sort and filter the apps by category, the category tags are essential, so we choose to enter each app's detail page to grab the required indicators.

Through the above analysis, we can determine the crawling process: first traverse each main page and extract the URLs of its 10 app detail pages, then crawl each app's indicators from its detail page. Traversing this way means crawling around 6,000 web pages, which is no small workload, so next we will try the Scrapy framework.

2.2. Introduction to Scrapy Framework

Before introducing the Scrapy framework, let us first recall the Pyspider framework, which we previously used to crawl 50,000 articles from Huxiu. It is a crawler tool written by a well-known domestic developer, with more than 10K stars on GitHub, but its overall functionality is relatively thin. Is there a more powerful framework? Yes: the Scrapy framework discussed here, with more than 30K stars on GitHub. It is the most widely used crawler framework in the Python world, and a must-know for anyone serious about crawling.

There are many documents and tutorials about Scrapy online; here are a few:

  • Scrapy Chinese documentation
  • Cui Qingcai's Scrapy column
  • Scrapy crawling Lagou
  • Scrapy crawling Douban movies

The Scrapy framework is relatively more complex than Pyspider. It has distinct processing modules, and a project consists of several files, with different crawler responsibilities placed in different files, so at first the programs may feel scattered and confusing. The following approach is recommended to get started with Scrapy quickly:

  • First, skim the reference tutorials above to understand Scrapy's crawler logic and the purpose of, and cooperation between, each file.
  • Next, study the two practical cases above to get familiar with writing crawlers in Scrapy.
  • Finally, pick a website you are interested in as a practice project; when stuck, reread the tutorials or search online.

This learning path is fast and effective, far better than following tutorials without hands-on practice. Below, let's take Kuan as an example and crawl it with Scrapy.

2.3. Capture data

First, install the Scrapy framework. On a Windows system with Anaconda already installed, this is very simple: open the Anaconda Prompt command window and enter the following command, which installs Scrapy and its dependency libraries automatically.

conda install scrapy

2.3.1. Create Project

Next, we need to create a crawler project, so we first switch from the root directory to the working path where the project should live. For example, the storage path I set here is E:\my_Python\training\kuan. Then enter the following commands to create the kuan crawler project:

# Switch working path
cd E:\my_Python\training\kuan
# Generate project
scrapy startproject kuan

After executing the above command, a scrapy crawler project named kuan will be generated, which contains the following files:

scrapy.cfg            # Scrapy deployment configuration file
kuan/                 # The project's module; code is imported from here
    items.py          # Defines the data structure of the crawl
    middlewares.py    # Middlewares
    pipelines.py      # Data pipeline file, used for subsequent storage
    settings.py       # Configuration file
    spiders/          # Folder for the main crawling programs

Next, we need to create the main crawling program kuan.py in the spiders folder by running the following two commands:

cd kuan  # Enter the kuan project folder just generated
scrapy genspider kuan  # Generate the main crawler program file kuan.py

2.3.2. Declare item

After the project file is created, we can start writing the crawler program.

First, you need to pre-define the names of the fields to be crawled in items.py, as shown below:

class KuanItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    volume = scrapy.Field()
    download = scrapy.Field()
    follow = scrapy.Field()
    comment = scrapy.Field()
    tags = scrapy.Field()
    score = scrapy.Field()
    num_score = scrapy.Field()

These are the 8 fields we located on the web page: name is the app's name, volume its size, download its number of downloads, and so on. Once defined here, these fields will be used in the main crawling program that follows.

2.3.3. Crawling the main program

After the kuan spider is created, the Scrapy framework automatically generates part of the crawling code. Next, we need to add the parsing of the crawled fields to the parse() method.

class KuanspiderSpider(scrapy.Spider):
    name = 'kuan'
    allowed_domains = ['']
    start_urls = ['']

    def parse(self, response):
        pass

Open Dev Tools on the home page, find the node position of each indicator to crawl, and then extract them with CSS, XPath, regular expressions, or other methods. Scrapy supports them all, so choose freely. Here we use CSS selectors to locate the nodes, but note that Scrapy's CSS syntax differs slightly from the CSS syntax we used before with pyquery. A few examples for comparison:

First, locate the URL node of each app on the home page: it is an a node under the div node with class app_left_list, and its href attribute is the URL information we need. It is a relative address that becomes the full URL after joining.

Then, on the detail page, select the app name and locate it: the app name node is a p node with class detail_app_title.

After locating these two nodes, we can use CSS to extract field information. Here is a comparison between the conventional writing and the writing in Scrapy:

# Conventional (pyquery) writing
url = item('.app_left_list>a').attr('href')
name = item('.list_app_title').text()

# Scrapy writing
url = item.css('::attr("href")').extract_first()
name = item.css('.detail_app_title::text').extract_first()

As you can see, to get an href or text attribute you use ::, for example ::text for text. extract_first() extracts the first element; if there are multiple elements, use extract(). With this, we can write the parsing code for the 8 fields by analogy.
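As a rough stand-in for these semantics (this is an illustration of the contract, not Scrapy's actual implementation): extract() returns every match as a list, while extract_first() returns only the first match, or None when nothing matched.

```python
def extract(matches):
    # extract(): all matched values, always as a list
    return list(matches)

def extract_first(matches, default=None):
    # extract_first(): the first matched value, or `default` when empty
    matches = list(matches)
    return matches[0] if matches else default
```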

First, we extract the list of app URLs on the home page, and then enter each app's detail page to extract the 8 fields.

def parse(self, response):
    contents = response.css('.app_left_list>a')
    for content in contents:
        url = content.css('::attr("href")').extract_first()
        url = response.urljoin(url)  # join relative url into absolute url
        yield scrapy.Request(url, callback=self.parse_url)

Here, the response.urljoin() method joins each extracted relative URL into a complete URL, and the scrapy.Request() method constructs a request for each app detail page. We pass it two parameters: url, the detail-page URL, and callback, a callback function that hands the response returned by that request to the parse_url() method, which parses the field content, as shown below:

# note: this file also needs `import re` at the top
def parse_url(self, response):
    item = KuanItem()
    item['name'] = response.css('.detail_app_title::text').extract_first()
    results = self.get_comment(response)
    item['volume'] = results[0]
    item['download'] = results[1]
    item['follow'] = results[2]
    item['comment'] = results[3]
    item['tags'] = self.get_tags(response)
    item['score'] = response.css('.rank_num::text').extract_first()
    num_score = response.css('.apk_rank_p1::text').extract_first()
    item['num_score'] = re.search('Total (.*?) scores', num_score).group(1)
    yield item

def get_comment(self, response):
    messages = response.css('.apk_topba_message::text').extract_first()
    result = re.findall(r'\s+(.*?)\s+/\s+(.*?) download\s+/\s+(.*?) people follow\s+/\s+(.*?) Comment.*?', messages)  # \s+ matches one or more whitespace characters
    if result:  # not empty
        results = list(result[0])  # extract the first element of the list
        return results

def get_tags(self, response):
    data = response.css('.apk_left_span2')
    tags = [item.css('::text').extract_first() for item in data]
    return tags

Here, two extra methods are defined: get_comment() and get_tags().

The get_comment() method extracts the four fields volume, download, follow, and comment via regular-expression matching. The matching results look like this:

result = re.findall(r'\s+(.*?)\s+/\s+(.*?) download\s+/\s+(.*?) people follow\s+/\s+(.*?) Comment.*?', messages)
print(result)  # output the results for the first page
# The results are as follows:
[('21.74M', '52.18 million', '2.4 million', '5.4 million')]
[('75.53M', '27.68 million', '23 million', '3.0 million')]
[('46.21M', '16.86 million', '23 million', '3.4 million')]
[('54.77M', '16.03 million', '3.8 million', '4.9 million')]
[('3.32M', '15.3 million', '15,000', '3343')]
[('75.07M', '11.27 million', '16,000', '22,000')]
[('92.70M', '11.08 million', '9167', '1.3 million')]
[('68.94M', '10.72 million', '5718', '9869')]
[('61.45M', '9.35 million', '1.1 million', '1.6 million')]
[('23.96M', '9.25 million', '4157', '1956')]
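The pattern can be exercised on a self-contained sample string. The message text below is a constructed assumption mirroring the field layout of the page, not real page data:

```python
import re

# Constructed sample of the .apk_topba_message text (an assumption)
message = " 21.74M / 52.18 million download / 2.4 million people follow / 5.4 million Comment "
pattern = r'\s+(.*?)\s+/\s+(.*?) download\s+/\s+(.*?) people follow\s+/\s+(.*?) Comment'
volume, download, follow, comment = re.findall(pattern, message)[0]
```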

Then result[0], result[1], etc. extract the four pieces of information. Taking volume as an example:

item['volume'] = results[0]

In this way, all the field information for the 10 apps on the first page is extracted and returned via the yield item generator. Let's look at the output:

{'name': '酷安', 'volume': '21.74M', 'download': '52.18 million', 'follow': '2.4 million', 'comment': '5.4 million', 'tags': "['酷市场', '酷安', 'MARKET', 'coolapk', 'Installed essential']", 'score': '4.4', 'num_score': '1.4 million'},
{'name': 'WeChat', 'volume': '75.53M', 'download': '27.68 million', 'follow': '23 million', 'comment': '3.0 million', 'tags': "['WeChat', 'qq', 'Tencent', 'tencent', 'Live chat', 'Required for installation']", 'score': '2.3', 'num_score': '1.1 million'},

2.3.4. Paged crawling

Above, we crawled the content of the first page. Next we need to crawl all 610 pages. There are two approaches:

  • The first extracts the page-turning node, constructs a request for the next page, and recursively calls the parse() method, repeating until the last page is parsed.
  • The second directly constructs the URLs of all 610 pages and then calls the parse() method on each of them for parsing.

Here, we write the parsing code for both methods. The first is very simple: just continue the parse() method with the following lines:

def parse(self, response):
    contents = response.css('.app_left_list>a')
    for content in contents:
        ...

    next_page = response.css('.pagination li:nth-child(8) a::attr(href)').extract_first()
    url = response.urljoin(next_page)
    yield scrapy.Request(url, callback=self.parse)
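The response.urljoin() call follows the same resolution rules as the standard library's urllib.parse.urljoin, which is easy to check in isolation (the URLs here are placeholder assumptions):

```python
from urllib.parse import urljoin

page_url = "https://example.com/apk/?p=1"  # placeholder list-page URL
next_href = "/apk/?p=2"                    # placeholder relative href
absolute = urljoin(page_url, next_href)    # resolved absolute next-page URL
```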

In the second method, we define a start_requests() method before parse() to generate the 610 page URLs in batches, and pass each one to the parse() method below via the callback parameter of scrapy.Request():

def start_requests(self):
    pages = []
    for page in range(1, 611):  # a total of 610 pages
        url = '' % page
        page = scrapy.Request(url, callback=self.parse)
        pages.append(page)
    return pages

That is the approach for crawling all pages. Once crawled, the results need to be stored. Here I chose MongoDB; compared with MySQL, it is more convenient and less troublesome.

2.3.5. Store results

We define the data storage method in pipelines.py. MongoDB parameters, such as the address and database name, are stored separately in the settings.py file and then referenced in the pipelines program.

import pymongo

class MongoPipeline(object):
    def __init__(self, mongo_url, mongo_db):
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_url=crawler.settings.get('MONGO_URL'),
            mongo_db=crawler.settings.get('MONGO_DB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_url)
        self.db = self.client[self.mongo_db]

    def process_item(self, item, spider):
        name = item.__class__.__name__
        self.db[name].insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()

Above, we define a MongoPipeline() storage class with several methods. Briefly:

from_crawler() is a class method, marked by the @classmethod decorator. Its role is to fetch the parameters we set in settings.py:

MONGO_URL = 'localhost'
MONGO_DB = 'KuAn'
ITEM_PIPELINES = {
    'kuan.pipelines.MongoPipeline': 300,
}

The open_spider() method mainly performs some initialization operations. This method will be called when the Spider is turned on.

The process_item() method is the most important method for inserting data into MongoDB.

After completing the code above, enter the following command to start the crawling and storage process. Running on a single machine, 6,000 web pages take quite a long time to complete, so be patient.

scrapy crawl kuan

Here, there are two additional points:

First, to reduce pressure on the website, it is best to set a delay of a few seconds between requests. Add the following lines at the top of the KuanSpider class:

custom_settings = {
    "DOWNLOAD_DELAY": 3,  # delay of 3 s; the default is 0, i.e., no delay
    "CONCURRENT_REQUESTS_PER_DOMAIN": 8  # the default is 8 concurrent requests; can be reduced appropriately
}

Second, to better monitor the crawler as it runs, set up an output log file, which Python's built-in logging package can provide:

import logging

logging.basicConfig(filename='kuan.log', filemode='w', level=logging.WARNING, format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p')
logging.warning("warn message")
logging.error("error message")

The level parameter sets the severity threshold. From low to high, the levels are: DEBUG < INFO < WARNING < ERROR < CRITICAL. To avoid recording too much in the log file, set a higher level; here it is set to WARNING, meaning only messages at WARNING level and above are written to the log.
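The level ordering and the filtering effect can be verified with an in-memory handler instead of kuan.log (a minimal sketch; the logger name is arbitrary):

```python
import io
import logging

# Severity levels are backed by increasing integers
levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

buffer = io.StringIO()
logger = logging.getLogger("kuan_demo")   # arbitrary demo logger name
logger.setLevel(logging.WARNING)          # same threshold as in the snippet above
logger.addHandler(logging.StreamHandler(buffer))
logger.propagate = False                  # keep the demo self-contained

logger.info("info message")               # below WARNING: dropped
logger.warning("warn message")            # at WARNING: recorded
output = buffer.getvalue()
```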

The datefmt parameter prefixes each log entry with a timestamp, which is very useful.

Above, the entire data capture is complete. With the data in hand we can start the analysis, but first the data needs some simple cleaning and preprocessing.

3. Data cleaning processing

First, read the data from MongoDB, convert it into a DataFrame, and look at the basic state of the data.

def parse_kuan():
    client = pymongo.MongoClient(host='localhost', port=27017)
    db = client['KuAn']
    collection = db['KuAnItem']
    # Convert database data to DataFrame
    df = pd.DataFrame(list(collection.find()))
    print(df.head())
    print(df.shape)
    print(df.info())
    print(df.describe())

From the first 5 rows output by df.head(), you can see that except for the score column, which is a float, all the other columns are of object (text) type.

Some rows in the comment, download, follow, and num_score columns carry the suffix "万" (ten thousand), which must be stripped and the value converted to a number; the volume column carries the suffixes "M" or "K", and for uniformity the "K" values need to be divided by 1024 to convert them to "M".

The entire data has a total of 6086 rows x 8 columns, and each column has no missing values.

The df.describe() method gives basic statistics on the score column: the average score of all apps is 3.9 (out of 5), the lowest 1.6, and the highest 4.8.
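The suffix handling for the volume column can be sketched first as a plain-Python helper (the sample strings are assumptions; the cleaning code below applies the same logic column-wise with pandas):

```python
def volume_to_mb(text):
    """Normalize a volume string to a float in MB: strip 'M'; divide 'K' values by 1024."""
    if text.endswith('K'):
        return float(text[:-1]) / 1024
    if text.endswith('M'):
        return float(text[:-1])
    return float(text)

small = volume_to_mb('512K')    # 0.5 MB
large = volume_to_mb('21.74M')  # 21.74 MB
```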

Next, we will convert the above several columns of text data into numeric data, the code implementation is as follows:

def data_processing(df):
    # Process the 'comment', 'download', 'follow', 'num_score', 'volume' columns:
    # convert the unit 万 (10,000) to 1, then convert to numeric
    suffix = '_ori'
    cols = ['comment', 'download', 'follow', 'num_score', 'volume']
    for col in cols:
        col_ori = col + suffix
        df[col_ori] = df[col]  # copy to keep the original column
        if col != 'volume':
            df[col] = clean_symbol(df, col)   # process the original column to generate a new column
        else:
            df[col] = clean_symbol2(df, col)  # process the original column to generate a new column

    # Convert download separately back to units of 10,000
    df['download'] = df['download'].apply(lambda x: x / 10000)
    # Batch convert to numeric
    df = df.apply(pd.to_numeric, errors='ignore')

def clean_symbol(df, col):
    # Strip the character "万" and multiply by 10,000
    con = df[col].str.contains('万$')
    df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('万', '')) * 10000
    df[col] = pd.to_numeric(df[col])
    return df[col]

def clean_symbol2(df, col):
    # Strip the trailing "M"
    df[col] = df[col].str.replace('M$', '', regex=True)
    # Volumes in K are divided by 1024 to convert to M
    con = df[col].str.contains('K$')
    df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('K$', '', regex=True)) / 1024
    df[col] = pd.to_numeric(df[col])
    return df[col]

Above, the conversion of several columns of text data is completed, let's check the basic situation again:

The download column is the number of app downloads: the most-downloaded app has 51.9 million downloads, the least has 0 (very rare), and the average is 140,000. The following can also be seen:

  • The volume column is the app size: the largest app is nearly 300M, the smallest almost 0, and the average volume is around 18M.
  • The comment column is the number of comments: the most-commented app has more than 50,000, and the average is just over 200.

Above, the basic data cleaning process has been completed, and the exploratory analysis of the data will be carried out below.

4. Data Analysis

We mainly analyze the app downloads, ratings, volume and other indicators from two dimensions: overall and classification.

4.1. Overall situation

4.1.1. Download ranking

First, let's look at app downloads. Downloads are a very important reference when choosing an app. Since most apps have relatively few downloads and a plain histogram would not show the trend, we segment the data into discrete intervals and draw a bar chart of the intervals, using Pyecharts as the drawing tool.

As many as 5,517 apps (84% of the total) have fewer than 100,000 downloads, while only 20 have more than 5 million. For developing a profitable app, user downloads matter enormously; seen from this angle, most apps are in an awkward position, at least on the Kuan platform.

The code is implemented as follows:

from pyecharts import Bar

# Download distribution
bins = [0, 10, 100, 500, 10000]  # in units of 10,000 downloads
group_names = ['<=100,000', '100,000-1,000,000', '1,000,000-5,000,000', '>5,000,000']
cats = pd.cut(df['download'], bins, labels=group_names)  # use pd.cut() to segment
cats = pd.value_counts(cats)
bar = Bar('Interval distribution of app downloads', 'most apps have fewer than 100,000 downloads')
# bar.use_theme('macarons')
bar.add(
    'Number of apps',
    list(cats.index),
    list(cats.values),
    is_label_show=True,
    is_splitline_show=False,
)
bar.render()
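The segmentation that pd.cut performs can be cross-checked with the standard library's bisect module; since pd.cut's bins are right-closed by default, bisect_left on the interior edges yields the same bucket (download values are in units of 10,000, matching the cleaned column):

```python
from bisect import bisect_left

edges = [10, 100, 500]  # interior edges of bins = [0, 10, 100, 500, 10000]
labels = ['<=100,000', '100,000-1,000,000', '1,000,000-5,000,000', '>5,000,000']

def bucket(downloads_wan):
    """Map a download count (in units of 10,000) to its interval label."""
    return labels[bisect_left(edges, downloads_wan)]
```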

Next, let's see which 20 apps have the most downloads:

As you can see, the "Kuan" app itself is far in the lead with 50 million+ downloads, nearly twice the 27 million downloads of the runner-up, WeChat. Such a huge advantage is easy to understand: after all, it is the platform's own app, and if you don't have "Kuan" on your phone, you are arguably not a true "phone fan". The chart also shows the following:

  • Many of the TOP 20 apps are install-essential, relatively mainstream apps.
  • In the app score chart on the right, only 5 apps scored above 4 points (5-point scale); most scored below 3, some even below 2. Is it that these app developers can't make a good app, or simply don't want to?
  • The RE Manager and Green Guardian stand out from the rest. RE Manager in particular still earns 4.8 points (the highest score) despite its huge download count, and is only a few megabytes in size. That is truly rare; this is what a "conscience app" looks like.

For comparison, let's take a look at the 20 apps with the least downloads.

As you can see, these apps pale in comparison with the heavily downloaded ones above; the least-downloaded, "Guangzhou Limited Pass", has only 63 downloads.

This is not surprising: the app may not have been promoted, or may be newly developed. With such a small download count, it is commendable that the developers keep updating. I admire them.

In fact, this type of app is not the embarrassing kind. The truly embarrassing ones are the apps with huge download counts but rock-bottom scores, which give the impression: "I am this bad; take it or leave it."

4.1.2. Rating ranking

Next, let's look at the overall app scores. Here, the scores are divided into the following 4 intervals, each assigned a corresponding level.

Some interesting phenomena can be found:

  • Very few apps score below 3 points, less than 10% of the total; yet among the 20 most-downloaded apps, most of the big names such as WeChat, QQ, Taobao, and Alipay score under 3 points, which is a bit embarrassing.
  • The mid-range, i.e., apps with moderate scores, is the most numerous.
  • Apps scoring above 4 points account for nearly half (46%). Some of them may be genuinely good; others may score high only because very few people rated them. To select the best apps later, it is necessary to set a certain threshold.

Next, let's look at the 20 highest-rated apps. Often, when we download apps, we simply follow the rule "whichever scores highest, download that one".

As you can see, the 20 highest-rated apps all scored 4.8 points, including RE Manager (appearing again), the Pure Light Rain icon pack, and several less common ones. They may well be good apps, but we also need to check the download counts: each has more than 10,000 downloads, and with a certain volume of downloads the scores can be considered more reliable, so we can download and try them with confidence.

After this overall analysis we have roughly found some good apps, but not enough, so below we subdivide by category and set certain filtering conditions.

4.2. Classification

Based on app functionality and everyday usage scenarios, the apps are divided into nine major categories, and the 20 best apps are then selected from each category.

To find the best apps as reliably as possible, three conditions are set here:

  • The score is not less than 4 points.
  • Downloads are not less than 10,000.
  • A composite score index is set (total score = downloads × score), then normalized to a full score of 1000, serving as the reference index for ranking apps.
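The third condition can be sketched as follows; the two apps and their numbers are made-up illustrations, not data from the crawl:

```python
# total score = downloads * score, then scaled so the top app gets 1000
apps = [
    {'name': 'app_a', 'download': 500.0, 'score': 4.8},  # downloads in units of 10,000 (assumed)
    {'name': 'app_b', 'download': 120.0, 'score': 4.5},
]

for app in apps:
    app['total'] = app['download'] * app['score']

best = max(app['total'] for app in apps)
for app in apps:
    app['index'] = round(app['total'] / best * 1000, 1)  # normalize to a 1000-point scale
```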

After filtering, we obtain the 20 highest-ranked apps in each category in turn. Most of these apps are indeed conscience software.

4.2.1. System Tools

System tools include: input method, file management, system cleaning, desktop, plug-in, lock screen, etc.

As you can see, first place goes to the famous veteran file manager "RE Manager". At only 5 MB, besides the functions of an ordinary file manager, its biggest feature is the ability to uninstall apps preinstalled on the phone, though it requires Root.

The file analyzer in "ES File Explorer" is very powerful and can effectively clean up bloated phone storage.

The app "A Wooden Letter" is quite impressive. As its own introduction says, "rather than having many, better to have me": open it and you will find it provides dozens of useful functions, such as translation, image search, express-delivery tracking, and making emoticons.

The following "Super SU", "Storage Cleanup", "Lanthanum", "MT Manager", and "My Android Tools" are all highly recommended. In short, every app on this list deserves a place on your phone.

4.2.2. Social chat

In the social chat category, "Share Weibo Client" ranks first. As a third-party client, it naturally has advantages over the official app: compared with the official version's 70M size, it is only about a tenth as large, has almost no ads, and offers many extra powerful functions. If you like browsing Weibo, give "Share" a try.

The "Immediate" app is also quite good. Further down, you can see "Bullet SMS", popular a while ago, which claimed it would replace WeChat; clearly that will not happen in the short term.

You may notice that common apps such as "Zhihu", "Douban", and "Jianshu" do not appear on this social list, because their scores are low: only 2.9, 3.5, and 2.9 points respectively. If you want to use them, consider their third-party clients or historical versions.

4.2.3. Information Reading

As you can see, in the information reading category, "Quiet Reading World" firmly occupies first place.

In the same category, "Read More", "Book Chasing Artifact", and "WeChat Reading" have also entered the list.

In addition, if you often have a headache because you don’t know where to download e-books, you might as well try " Book Search Master " and " Lao Zi Search Book ".

4.2.4. Audiovisual Entertainment

Next is the audio-visual entertainment section, where NetEase's "NetEase Cloud Music" takes the top spot without difficulty, a rare masterpiece from a big company.

If you love to play games, then give "Adobe AIR" a try.

If you are the artistic type, you will probably like "VUE", a short-video shooting app; after creating a video you can post it to your WeChat Moments.

The last one, "Haiby Music", is excellent. I recently discovered a powerful feature it offers in combination with Baidu Netdisk: it automatically recognizes audio files and plays them.

4.2.5. Communication network

Next comes the communication network category. This category mainly includes: browser, address book, notification, email and other small categories.

Every one of us has a browser on our phone, used in all sorts of ways: some people use the browser that comes with the phone, others use big-name browsers such as Chrome or Firefox.

However, you will find that the top three on this list may be ones you have never heard of, yet they are really excellent, best described as "minimalist, efficient, refreshing, and fast". Among them, "Via" and "X Browser" are each under 1M in size, truly "small as a sparrow, yet complete in every organ". Highly recommended.

4.2.6. Photographic pictures

Taking and retouching photos is also a commonly used function. You may have your own picture-management software, but here I strongly recommend the first-place app, "Quick Picture Viewer". At only 3M, it can instantly find and load tens of thousands of pictures. If you are a photo fanatic, it opens many photos in seconds; it also offers features such as hiding private photos and automatic backup to Baidu Netdisk. It is one of the apps I have used the longest.

4.2.7. Document writing

We often need to write and make memos on mobile phones, so we naturally need good document writing apps.

Needless to say " Evernote ", I think the most useful note summary app.

If you like to write in Markdown, then " Pure Writing ", a finely crafted app, should suit you very well.

It is under 3 MB yet packs dozens of features such as cloud backup, long-image generation, and automatic spacing between Chinese and English text, all while keeping its simple design style. This is probably why its downloads have soared roughly tenfold over the past two or three months. Behind the app is a heavyweight developer who has sacrificed years of spare time to keep developing and updating it, which is worthy of admiration.

4.2.8. Travel, transportation, shopping

In this category, the app ranked number one is actually 12306. Mention it and you will think of its infamous verification codes; but the app here is not the official one, it was developed by a third party. Its most powerful feature is ticket grabbing. If you are still trying to grab tickets by sharing links to your Moments, you might as well give it a try.

4.2.9. Xposed plugin

The last category is Xposed, which many people may not be familiar with; but mention WeChat's red-envelope grabbing and message-recall-defeating features and many people will know what it is. These impressive, out-of-the-ordinary functions come from the various modules of the Xposed framework. The framework comes from the famous XDA mobile forum abroad, and much of the so-called software cracked by "XDA gods" that you often hear about originates from that forum.

Simply put, after installing the Xposed framework you can install all kinds of fun and useful plug-ins, and with them your phone can do far more: for example, removing advertisements, unlocking paid features in apps, killing power-hungry self-starting processes, and spoofing the phone's location.

However, using this framework and these plug-ins requires flashing and ROOT access, so the bar to entry is a bit high.

5. Summary

  • This article used the Scrapy framework to crawl and analyze 6000 apps from Ku'an. When first learning Scrapy, you may feel the program structure is messy to write; try writing the whole program first with ordinary functions, and then splitting it block by block into a Scrapy project. This also helps shift your thinking from single scripts to frameworks; I will write a separate article about this later.
  • Since the web version lists fewer apps than the mobile app does, many useful apps are still not included, such as Chrome, MX Player, and Snapseed. I recommend using the Ku'an app itself; there are more fun things there.
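The "write it as ordinary functions first, then move it into Scrapy" approach mentioned above can be sketched with plain standard-library Python. The HTML snippet, class names, and field names below are illustrative assumptions, not Coolapk's real markup; the point is that a parsing function like this can later be lifted almost unchanged into a Scrapy spider's `parse()` callback.

```python
import re

# Illustrative sample of a listing page; the markup and class names are
# assumptions for demonstration, not Coolapk's actual HTML.
SAMPLE_PAGE = """
<div class="app_left_list">
  <p class="list_app_title">Pure Writing</p>
  <p class="list_app_count">2.9M / 300k downloads</p>
</div>
<div class="app_left_list">
  <p class="list_app_title">Via</p>
  <p class="list_app_count">0.9M / 500k downloads</p>
</div>
"""

def parse_apps(html: str) -> list:
    """Extract (title, size, downloads) records from one listing page."""
    pattern = re.compile(
        r'class="list_app_title">([^<]+)</p>\s*'
        r'<p class="list_app_count">([\d.]+)M / (\d+)k downloads'
    )
    return [
        {"title": t, "size_mb": float(s), "downloads_k": int(d)}
        for t, s, d in pattern.findall(html)
    ]

apps = parse_apps(SAMPLE_PAGE)
print(apps)
```

Once a plain function like `parse_apps` works on saved pages, migrating to Scrapy is mostly a matter of moving its body into the spider's parse callback and yielding items instead of returning a list.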

That is the whole crawling and analysis process. The article mentions a lot of high-quality software; if you are interested, try downloading and experiencing it. For your convenience, I have also collected 24 of these boutique apps here.



Reference: "After analyzing 6000 apps with Python, these conclusions are drawn", Cloud+ Community, Tencent Cloud.