Linear regression in actual machine learning

The machine learning algorithms we covered previously are all classification algorithms; that is, their predicted values are discrete. When the predicted value is continuous, a regression algorithm is needed. This article introduces the principle and code implementation of linear regression.

Principle and Derivation of Linear Regression

As shown in the figure, we have a set of two-dimensional data. Let's first think about how to fit these scattered points well with a straight line. Put simply, the fitted line should pass as close to the scattered points as possible.

Objective function

To make the points lie close to the fitted line, we need to state the goal mathematically. First, the line we are looking for is $y = x^T w$, where $x$ is the feature vector of one sample. What we need to find is the weight vector $w$ (as in logistic regression). The best $w$ makes the error smallest, that is, the difference between the predicted value $x_i^T w$ and the true value $y_i$ is small for every sample. We use the squared error over all $m$ samples here:

$$\sum_{i=1}^{m} \left( y_i - x_i^T w \right)^2$$
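
As a minimal sketch of what this objective measures, the snippet below evaluates the squared error for two candidate weight vectors. The function name `squared_error` and the toy data are assumptions made for illustration; they do not come from the article's data set. The first column of each sample is assumed to be a constant 1.0 so that the first weight acts as the intercept.

```python
def squared_error(X, y, w):
    """Sum of squared differences between true values and predictions x_i^T w."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # prediction for one sample: dot product of its features with w
        y_hat = sum(x_ij * w_j for x_ij, w_j in zip(x_i, w))
        total += (y_i - y_hat) ** 2
    return total

# Toy data (made up): constant 1.0 column, then one feature x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

good_w = [1.0, 2.0]   # the line y = 1 + 2x passes through all three points
poor_w = [0.0, 1.0]   # the line y = x misses every point

print(squared_error(X, y, good_w))  # 0.0
print(squared_error(X, y, poor_w))  # 14.0
```

The weight vector with the smaller squared error corresponds to the line that lies closer to the points, which is exactly what the fit tries to achieve.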

Solve

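What we need to do is minimize the squared error: take the derivative with respect to $w$, set it to zero, and solve. In matrix form, with the samples stacked as the rows of $X$ and the true values collected in the vector $y$, the resulting formula for $w$ is:

$$\hat{w} = (X^T X)^{-1} X^T y$$

The formula only works when $X^T X$ is invertible.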

This method is called OLS, short for "Ordinary Least Squares".
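
The differentiation step can be written out as follows. This is a sketch in matrix notation, assuming $X$ is the $m \times n$ matrix whose rows are the samples, $y$ is the vector of $m$ true values, and $E(w)$ is a name introduced here for the squared error:

```latex
\begin{aligned}
E(w) &= (y - Xw)^T (y - Xw) \\
\frac{\partial E}{\partial w} &= -2 X^T (y - Xw) = 0 \\
X^T X w &= X^T y \\
\hat{w} &= (X^T X)^{-1} X^T y
\end{aligned}
```

The last step multiplies both sides by $(X^T X)^{-1}$, so it is valid only when $X^T X$ is invertible.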

Linear regression practice

Data situation

We first read in the data and use the matplotlib library to display the data.

def loadDataSet(filename):
    """Read a tab-separated file; the last column is the target value."""
    dataMat = []; labelMat = []
    # the with-block closes the file even if parsing fails
    with open(filename) as fr:
        for line in fr:
            curLine = line.strip().split('\t')
            # every column except the last is a feature
            dataMat.append([float(v) for v in curLine[:-1]])
            labelMat.append(float(curLine[-1]))
    return dataMat, labelMat
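
To make the expected file layout concrete, here is a small self-contained sketch that writes two made-up rows to a temporary file and parses them the same way. The helper name `load_tab_separated` and the sample values are hypothetical; the layout (a constant 1.0 column, one feature, then the target) is an assumption about the data file, which the article does not show.

```python
import os
import tempfile

def load_tab_separated(path):
    """Hypothetical loader: every column but the last is a feature, the last is the target."""
    features, targets = [], []
    with open(path) as fh:
        for line in fh:
            fields = line.strip().split('\t')
            if fields == ['']:
                continue  # skip blank lines
            features.append([float(v) for v in fields[:-1]])
            targets.append(float(fields[-1]))
    return features, targets

# Made-up sample rows: constant 1.0, one feature x, then the target y.
sample = "1.0\t0.5\t3.2\n1.0\t0.8\t4.1\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

features, targets = load_tab_separated(path)
os.remove(path)  # clean up the temporary file

print(features)  # [[1.0, 0.5], [1.0, 0.8]]
print(targets)   # [3.2, 4.1]
```

The constant column matters because the regression code has no separate intercept term: the weight attached to that column plays the role of the intercept.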

Regression algorithm

Here we compute w directly with the formula for w given earlier; the fitted straight line can then be visualized.

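# asmatrix is imported as mat because newer NumPy releases no longer provide numpy.mat
from numpy import asmatrix as mat, linalg

def standRegres(Xarr, yarr):
    X = mat(Xarr); y = mat(yarr).T   # X is m x n, y is m x 1
    XTX = X.T * X
    # a zero determinant means XTX is singular and has no inverse
    if linalg.det(XTX) == 0:
        print('cannot be inverted')
        return
    w = XTX.I * (X.T * y)            # w = (X^T X)^-1 X^T y
    return w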
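
As a hedged usage sketch, the snippet below fits the same kind of model on synthetic data and cross-checks the result against NumPy's least-squares solver. The helper name `fit_ols`, the random data, and the true coefficients 3.0 and 1.7 are assumptions made for illustration. It uses plain arrays and `np.linalg.solve` rather than the matrix type and explicit inverse used in the article.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: solve the normal equations X^T X w = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # solving the linear system avoids forming the explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (made up): y = 3 + 1.7 x plus small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
X = np.column_stack([np.ones_like(x), x])   # constant column gives the intercept
y = 3.0 + 1.7 * x + rng.normal(0.0, 0.05, size=50)

w = fit_ols(X, y)
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)  # NumPy's own least-squares solution

print(w)                      # intercept and slope, close to 3.0 and 1.7
print(np.allclose(w, w_ref))  # whether the two solutions agree

# Two points on the fitted line, ready to pass to a plotting call
x_line = np.linspace(0.0, 1.0, 2)
y_line = w[0] + w[1] * x_line
```

Solving the system instead of inverting the matrix is a common numerical choice; both routes give the same weights when $X^T X$ is well conditioned.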

Algorithm advantages and disadvantages

  • Advantages: easy to understand and cheap to compute
  • Disadvantages: low accuracy when the relationship in the data is not linear, because a straight line underfits such data
Reference: https://cloud.tencent.com/developer/article/1155548 Linear regression in actual machine learning-Cloud + Community-Tencent Cloud