### Preface

Logistic regression touches on advanced mathematics, linear algebra, probability theory, and optimization. This article tries to explain logistic regression in the simplest, easiest-to-understand way possible, favoring visual examples over formula derivations. If mathematical formulas make you break out in hives, you proceed at your own risk.

### Logistic regression principle and derivation

Although logistic regression has the word "regression" in its name, it is a classification algorithm. As shown in the figure, suppose two classes of data (red and green points) are distributed as follows. To separate the two classes, we can use a straight line z = w0·x0 + w1·x1 + w2·x2 (with x0 fixed at 1 as the intercept term). When a new sample (x1, x2) needs to be predicted, we plug it into this linear function: if the value is greater than 0, it is a green sample (positive sample); otherwise it is a red sample (negative sample).
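As a minimal sketch of this decision rule (the weights `w` below are made up for illustration, since the figure's actual line is not given in the text):

```python
# Hypothetical weights for a separating line w0*x0 + w1*x1 + w2*x2 = 0,
# with x0 fixed at 1.0 as the intercept term.
w = [4.0, 0.5, -0.6]

def classify(x1, x2):
    z = w[0] * 1.0 + w[1] * x1 + w[2] * x2
    return 1 if z > 0 else 0  # 1: green (positive), 0: red (negative)

print(classify(1.0, 2.0))    # z = 4 + 0.5 - 1.2 = 3.3 > 0, so class 1
print(classify(-3.0, 10.0))  # z = 4 - 1.5 - 6.0 = -3.5 < 0, so class 0
```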

Extending to higher-dimensional space, we need a hyperplane (a straight line in two dimensions, a plane in three dimensions, and an (n-1)-dimensional hyperplane in n dimensions) that separates the sample data. Finding this hyperplane really means finding its parameter vector W, which closely resembles fitting a regression, hence the name logistic regression.

##### sigmoid function

Of course, we do not use the value of z directly. We need to map z into the interval (0, 1); the mapped value is then interpreted as the probability that the new sample is a positive sample.

We use the sigmoid function to perform this mapping:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Observing the graph of the sigmoid function, as shown in the figure, when z is greater than 0 the σ value is greater than 0.5, and when z is less than 0 the σ value is less than 0.5. With the sigmoid function, logistic regression is essentially a discriminative model based on conditional probability.
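A quick numerical check of these properties (a small self-contained sketch using the standard library):

```python
import math

def sigmoid(z):
    return 1.0 / (1 + math.exp(-z))

print(sigmoid(0))    # exactly 0.5 at z = 0
print(sigmoid(5))    # close to 1 for large positive z
print(sigmoid(-5))   # close to 0 for large negative z
```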

##### Objective function

In fact, what we are looking for is W. How do we find it? Look at the pictures below: the straight line in the second picture gives the best split. In other words, we want the sample points to be as far away from the line as possible, so that new samples are classified cleanly when they arrive. How do we formulate this as an objective function and compute it?

We apply the sigmoid formula to the z function:

$$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Through conditional probability we can write the following formulas:

$$P(y=1 \mid x; \theta) = h_\theta(x), \qquad P(y=0 \mid x; \theta) = 1 - h_\theta(x)$$

and the two can be merged into one:

$$P(y \mid x; \theta) = h_\theta(x)^{y} \, \bigl(1 - h_\theta(x)\bigr)^{1-y}$$

Assuming the samples are independent of one another, the probability of generating the entire sample set is the product of the generation probabilities of the individual samples:

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1 - y^{(i)}}$$

This product is too unwieldy to differentiate directly, so we take the logarithm:

$$l(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log \bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]$$

We now want the θ that makes the value of this objective function as large as possible.
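To build intuition for why maximizing this log-likelihood is the right goal, here is a small sketch (with made-up predicted probabilities) showing that confident, correct predictions score near the maximum of 0, while confident, wrong predictions score far below it:

```python
import math

def log_likelihood(h_list, y_list):
    # sum of y*log(h) + (1-y)*log(1-h) over all samples
    return sum(y * math.log(h) + (1 - y) * math.log(1 - h)
               for h, y in zip(h_list, y_list))

good = log_likelihood([0.9, 0.1], [1, 0])  # confident and correct
bad  = log_likelihood([0.1, 0.9], [1, 0])  # confident and wrong
print(good, bad)  # good is near 0, bad is a large negative number
```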

Before introducing the gradient ascent method, let's look at a piece of middle-school knowledge: find the maximum value of the function f(x) = -x².

Function diagram:

Solution: the derivative of f(x) is -2x. Setting it to 0 gives x = 0, where the maximum value 0 is attained. However, when a function is complex, it is difficult to find its extremum by solving for where the derivative vanishes. In that case we use the gradient ascent method to approach the extremum step by step through iteration:

$$x := x + \alpha \, f'(x)$$

that is, we move step by step in the direction of the derivative (gradient).

Using gradient ascent to find the x that maximizes the function:

```python
def f(x_old):
    # derivative of f(x) = -x^2
    return -2 * x_old

def cal():
    x_old = 0            # previous iterate
    x_new = -6           # starting point
    eps = 0.01           # step size
    precision = 0.00001  # stop when successive iterates are this close
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)  # step in the direction of the gradient
    return x_new
```

Running `cal()` returns a value very close to the true maximizer x = 0:

```
-0.0004892181072978443
```
##### Objective function solving

Here, we take the partial derivative of the objective function with respect to θ and obtain the iterative formula:

$$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \bigl( y^{(i)} - h_\theta(x^{(i)}) \bigr) x_j^{(i)}$$

### Logistic regression practice

##### Data situation

Read in the data and display it in a graph:

```python
def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('Data/Logistic/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # x0 is fixed at 1.0 as the intercept term
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
```
##### Training algorithm

Using the gradient iteration formula, calculate W:

```python
import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)             # m x n sample matrix
    labelMat = np.mat(labelMatIn).transpose()  # m x 1 label column
    m, n = np.shape(dataMatrix)
    alpha = 0.001        # step size
    maxCycles = 500      # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)      # m x 1 column of predictions
        error = labelMat - h                   # y - h from the update formula
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
```

View the classification results by plotting the calculated weights:
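The key step when plotting is recovering the decision boundary from the learned weights: on the boundary z = 0, so x2 = -(w0 + w1·x1) / w2. A small sketch of this computation (the weight values below are made up for illustration; they stand in for whatever `gradAscent` returns):

```python
# Hypothetical weights standing in for the output of gradAscent.
# On the decision boundary, w0 + w1*x1 + w2*x2 = 0, so we can solve for x2.
w = [4.12, 0.48, -0.62]

def boundary_x2(x1):
    return -(w[0] + w[1] * x1) / w[2]

# Two endpoints of the boundary segment, e.g. for x1 in [-3, 3];
# these are the points you would pass to a line-drawing call.
pts = [(x1, boundary_x2(x1)) for x1 in (-3.0, 3.0)]
print(pts)
```

Every point produced this way satisfies the line equation exactly, which is what makes it the boundary between the two predicted classes.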