Logistic regression involves linear algebra, probability theory, and optimization. This article tries to explain it in the simplest and most accessible way possible, using fewer formulas and more visual examples. If mathematical formulas make you uncomfortable, read on at your own risk.
Although the name contains the word "regression", Logistic regression is a classification algorithm. As shown in the figure, there are two classes of data (red and green points). To separate them, we can use a straight line z = w0*x0 + w1*x1 + w2*x2 (with x0 fixed to 1 as the intercept term). When a new sample (x1, x2) needs to be predicted, we substitute it into this linear function: if the value is greater than 0, it is classified as a green sample (positive sample), otherwise as a red sample (negative sample).
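To make this concrete, here is a minimal sketch of that decision rule in Python; the weight values used below are made up purely for illustration:

def predict(weights, x1, x2):
    # z = w0*x0 + w1*x1 + w2*x2, with x0 fixed to 1 (intercept term)
    z = weights[0] * 1.0 + weights[1] * x1 + weights[2] * x2
    return 1 if z > 0 else 0   # 1 = green/positive, 0 = red/negative

w = [4.0, 0.5, -0.6]           # hypothetical weights, for illustration only
print(predict(w, 1.0, 2.0))    # prints 1, since z = 4 + 0.5 - 1.2 > 0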
Extending to higher-dimensional spaces, we need a hyperplane (a straight line in two dimensions, a plane in three dimensions, and an (n-1)-dimensional hyperplane in n dimensions) to separate the sample data. Finding that hyperplane really means finding its weight parameters W, which looks very much like fitting a regression, and this is where the name Logistic regression comes from.
Of course, we do not use the value of z directly; we first map z into the interval (0, 1), and the mapped value is interpreted as the probability that the new sample is a positive sample.
We use the sigmoid function for this mapping; the formula is given below. Observing the sigmoid curve in the figure: when z is greater than 0, σ is greater than 0.5, and when z is less than 0, σ is less than 0.5. Built on the sigmoid function, Logistic regression is essentially a discriminative model based on conditional probability.
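For reference, the sigmoid function is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$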
What we are really looking for is W. How do we find it? Look at the picture below: the second plot has the best separating line. In other words, we want the sample points to lie as far away from the line as possible, so that new samples are classified well. How do we turn this into an objective function we can compute?
We apply the sigmoid function to z:
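Written out (using w for the weights and x for a sample with x0 = 1), this gives the hypothesis:

$$h_w(x) = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}$$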
From this, the conditional probabilities of the two classes can be written down, and the two formulas can be merged into a single one, see below.
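For reference, the two conditional probabilities and their merged form are:

$$P(y=1 \mid x; w) = h_w(x), \qquad P(y=0 \mid x; w) = 1 - h_w(x)$$

$$P(y \mid x; w) = h_w(x)^{y}\,\bigl(1 - h_w(x)\bigr)^{1-y}$$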
Assuming the samples are independent of each other, the probability of observing the entire sample set is the product of the probabilities of the individual samples:
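That product is the likelihood of the m training samples:

$$L(w) = \prod_{i=1}^{m} h_w(x_i)^{y_i}\,\bigl(1 - h_w(x_i)\bigr)^{1 - y_i}$$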
This product is too complicated to differentiate directly, so we take the logarithm:
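Taking the log turns the product into a sum, the log-likelihood:

$$\ell(w) = \sum_{i=1}^{m} \Bigl[\, y_i \log h_w(x_i) + (1 - y_i) \log\bigl(1 - h_w(x_i)\bigr) \Bigr]$$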
We now want this objective function to be as large as possible; the θ (that is, the weights W) that maximizes it is what we are looking for.
Before introducing the gradient ascent method, let's review a piece of middle-school knowledge: find the value of x at which the following function takes its maximum (judging from the derivative used in the code below, the function is f(x) = -x²).
Function diagram:
Solution: take the derivative of f(x), which is -2x, and set it to 0; the maximum value 0 is attained at x = 0. When the function is more complicated, however, it is hard to find the extremum by solving for where the derivative is zero. In that case we use the gradient ascent method to approach the extremum step by step through iteration. The formula is given below: at each step we move a little in the direction of the derivative (the gradient).
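For reference, the iterative update is:

$$x_{new} = x_{old} + \alpha\, f'(x_{old})$$

where α is the step size (learning rate).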
Use the gradient ascent algorithm to compute the x value that maximizes the function:
def f(x_old):
    # derivative of f(x) = -x^2
    return -2 * x_old

def cal():
    x_old = 0
    x_new = -6            # starting point of the iteration
    eps = 0.01            # step size (learning rate)
    precision = 0.00001   # stop when successive x values barely change
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)   # step in the direction of the gradient
    return x_new

cal()   # -0.0004892181072978443, i.e. x converges towards 0
Here, we take the partial derivatives of the log-likelihood with respect to the weights and obtain the following iterative update formula:
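In matrix form (which is what the gradAscent code below implements), the update is:

$$w := w + \alpha\, X^T \bigl(y - \sigma(Xw)\bigr)$$

where X is the m×n matrix of samples, y is the m×1 vector of labels, and α is the learning rate.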
Read in the data and display it in a graph:
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('Data/Logistic/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # x0 is fixed to 1.0 (intercept term), followed by the two features
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))   # class label: 0 or 1
    return dataMat, labelMat
Using the gradient iteration formula, calculate W:
import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)               # m x n matrix of samples
    labelMat = np.mat(labelMatIn).transpose()    # m x 1 column vector of labels
    m, n = np.shape(dataMatrix)
    alpha = 0.001        # step size (learning rate)
    maxCycles = 500      # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)        # predicted probabilities
        error = labelMat - h                     # y - sigmoid(Xw)
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
View the classification results by plotting the calculated weights:
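The original plotting code is not reproduced here; the following is a minimal sketch of what it could look like, assuming matplotlib plus the loadDataSet and gradAscent functions above (the name plotBestFit is my own choice):

import numpy as np
import matplotlib.pyplot as plt

def plotBestFit(weights):
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    labels = np.array(labelMat)
    # draw the two classes in different colors
    plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
    plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')
    # decision boundary: w0 + w1*x1 + w2*x2 = 0  ->  x2 = -(w0 + w1*x1) / w2
    x1 = np.arange(-3.0, 3.0, 0.1)
    x2 = (-weights[0] - weights[1] * x1) / weights[2]
    plt.plot(x1, x2)
    plt.xlabel('x1'); plt.ylabel('x2'); plt.legend()
    plt.show()

dataMat, labelMat = loadDataSet()
weights = np.asarray(gradAscent(dataMat, labelMat)).flatten()   # matrix -> flat array
plotBestFit(weights)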
I have recently started my own official account, where future articles will be published first. I hope readers will follow and support it.
Across ten thousand rivers and a thousand mountains, the affection always remains.