The decision tree algorithm is closer to everyday life than it might seem. The following example shows that we all use something like a decision tree every day, whether we realize it or not (which is kind of amazing).
Xiao Ming has to get to school every morning. He can walk, take the bus, or ride with Uncle Wang next door. So Xiao Ming makes a decision: first he looks at the weather and chooses to walk when it is not raining; when it rains, he checks whether Uncle Wang is free, rides with him if he is, and otherwise takes the bus. As the figure shows.
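In code form, Xiao Ming's decision is just a pair of nested conditionals. This is a tiny sketch of my own, not from the original article; the function name and arguments are made up for illustration:

```python
def how_to_get_to_school(raining, uncle_wang_free):
    # Decision process: check the weather first, then Uncle Wang's availability.
    if not raining:
        return "walk"
    if uncle_wang_free:
        return "ride with Uncle Wang"
    return "take the bus"
```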
From this example we can define a decision tree (yes, the picture above is a decision tree... a bit perfunctory, I know). A decision tree consists of nodes and directed edges. There are two types of nodes: internal nodes and leaf nodes. An internal node represents a feature or attribute (the weather, whether Uncle Wang is free), and a leaf node represents a class (walk, take the bus, ride with Uncle Wang).
So how does the decision tree algorithm construct this tree? (Is it the hand of God?) The example above is quite simple, so let's turn to a classic one. As shown in the figure, should we split first on weather, humidity, or wind level? This is where entropy and information gain come in.
1. Let's look at what entropy is. Simply put, entropy describes how disordered (or uncertain) something is. For example, whether the Chinese football team will reach the World Cup is practically certain, so its uncertainty is about 0 and its entropy is about 0; a 6-sided die has less uncertainty than a 12-sided die, so its entropy is lower. Now let's look at the formula for entropy:

$H = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)$
The entropy of a 6-sided die: -(1/6)·log₂(1/6) summed 6 times, which is log₂ 6 ≈ 2.585.
By the same reasoning, the entropy of a 12-sided die is log₂ 12 ≈ 3.585.
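These values are easy to verify with a few lines of Python. This is a quick sketch I'm adding for illustration; the helper name `die_entropy` is not from the article:

```python
from math import log2

def die_entropy(n):
    """Entropy of a fair n-sided die: n outcomes, each with probability 1/n."""
    return -sum((1 / n) * log2(1 / n) for _ in range(n))

print(die_entropy(6))   # ~2.585 (= log2(6))
print(die_entropy(12))  # ~3.585 (= log2(12))
```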
Finally, we calculate the entropy of the dataset in this case: 5 of the 14 samples are "don't play" and 9 are "play", so the entropy is:
-(5/14 * log(5/14, 2) + 9/14 * log(9/14, 2)) ≈ 0.940
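The code later in this article relies on a Shannon-entropy helper called `calcshannon`, which is not listed; here is a minimal sketch of what such a function might look like, checked against the 0.940 value above (the toy data below is my own placeholder):

```python
from math import log2

def calcshannon(dataSet):
    """Shannon entropy of the class labels stored in the last column of each sample."""
    labelCounts = {}
    for featVec in dataSet:
        label = featVec[-1]
        labelCounts[label] = labelCounts.get(label, 0) + 1
    entropy = 0.0
    for count in labelCounts.values():
        prob = count / len(dataSet)
        entropy -= prob * log2(prob)
    return entropy

# 9 "play" samples and 5 "don't play" samples give ~0.940, matching the calculation above.
toy = [["x", "play"]] * 9 + [["x", "no"]] * 5
print(calcshannon(toy))  # ~0.940
```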
Which feature should we split the dataset on first? The guiding principle is to turn disordered data into ordered data, in other words, to make the entropy after the split as small as possible. Information gain is the change in entropy before and after splitting the dataset, so here the larger the information gain, the better.
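Written out as a formula (my addition, stated in the usual ID3 form rather than quoted from the article), the information gain of splitting a dataset $D$ on a feature $A$ is the entropy before the split minus the weighted entropy of the subsets after the split:

$\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v)$

where $D_v$ is the subset of samples for which feature $A$ takes the value $v$.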
Let's take the weather feature as an example and compute the information gain after splitting on it:
The data used in the code below is for judging whether a creature is a fish, and it has two features:
```python
def choose(dataSet):
    """Choose the feature with the largest information gain."""
    numfeature = len(dataSet[0]) - 1        # the last column is the class label
    baseEntropy = calcshannon(dataSet)      # entropy before any split
    bestinfogain = 0.0
    bestfeature = -1
    for i in range(numfeature):
        featlist = [example[i] for example in dataSet]
        vals = set(featlist)                # unique values of feature i
        newEntropy = 0.0
        for value in vals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcshannon(subDataSet)  # weighted entropy after the split
        infoGain = baseEntropy - newEntropy
        if infoGain > bestinfogain:
            bestinfogain = infoGain
            bestfeature = i
    return bestfeature
```
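The function above also calls `splitDataSet`, which the article does not list. Below is a minimal sketch of it, together with a guessed-at version of the fish dataset; the exact rows and the feature names ("no surfacing", "flippers") are my assumption, borrowed from the classic textbook example, not taken from this article:

```python
def createDataSet():
    # Hypothetical fish dataset: two binary features plus a class label.
    dataSet = [
        [1, 1, 'yes'],
        [1, 1, 'yes'],
        [1, 0, 'no'],
        [0, 1, 'no'],
        [0, 1, 'no'],
    ]
    labels = ['no surfacing', 'flippers']
    return dataSet, labels

def splitDataSet(dataSet, axis, value):
    """Return the samples whose feature `axis` equals `value`, with that feature column removed."""
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            retDataSet.append(featVec[:axis] + featVec[axis + 1:])
    return retDataSet
```

With these in place, `choose(createDataSet()[0])` returns 0 for this toy data, i.e. the first feature gives the largest information gain.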
When all the features have been used up and the data still cannot be separated into pure classes, majority voting is used to decide the class of the leaf node. The code is as follows, similar to the vote counting in the kNN code from the previous article.
```python
import operator

def majority(classList):
    """Return the class label that appears most often in classList."""
    classCount = {}
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort (label, count) pairs by count, descending, and return the top label.
    sortedcount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedcount[0][0]
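```

A quick illustration of what it does (the example input is my own):

```python
print(majority(['yes', 'no', 'yes', 'yes', 'no']))  # -> 'yes'
```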
Finally, the code to create the tree:
```python
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    # Stop condition 1: all samples in this branch have the same class.
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop condition 2: only the class column remains (all features used), so take a majority vote.
    if len(dataSet[0]) == 1:
        return majority(classList)
    bestFeat = choose(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]
    Vals = [example[bestFeat] for example in dataSet]
    uvals = set(Vals)
    for value in uvals:
        sublabels = labels[:]   # copy so the recursion does not mutate the caller's label list
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), sublabels)
    return myTree
```
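Putting it all together with the sketched fish dataset from above (the dataset, and therefore the exact printed tree, are my assumption):

```python
dataSet, labels = createDataSet()
tree = createTree(dataSet, labels[:])  # pass a copy, since createTree deletes used labels
print(tree)
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```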
There are two conditions for terminating the recursion: either all samples in a branch belong to the same class, or all features have been used up.