The machine learning algorithms covered so far have all been classification algorithms, meaning the predicted values are discrete. When the value to be predicted is continuous, we need a regression algorithm instead. This article introduces the principle of linear regression and its code implementation.
As shown in the figure, we have a set of two-dimensional data points. How can we best fit these scattered points with a straight line? Put plainly: we want the fitted line to pass as close as possible to all of the points.
To express "the points are close to the fitted line" mathematically, we first write the line we are looking for as $\hat{y} = x^T w$. What we need to find is the weight vector $w$ (similar to logistic regression). The best $w$ is the one with the smallest error, i.e. the smallest difference between the predicted value $\hat{y}$ and the true value $y$. Here we use the squared error:

$$\sum_{i=1}^{m} \left( y_i - x_i^T w \right)^2$$
What we need to do is minimize this squared error: take the derivative with respect to $w$, set it to zero, and solve. The resulting formula for $w$ is:

$$\hat{w} = (X^T X)^{-1} X^T y$$
This method is called OLS, which stands for "Ordinary Least Squares."
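For readers who want the intermediate step, here is a minimal sketch of the derivation in matrix form (standard matrix calculus, assuming $X^T X$ is invertible):

$$E(w) = (y - Xw)^T (y - Xw)$$

$$\frac{\partial E}{\partial w} = -2 X^T (y - Xw) = 0 \;\Rightarrow\; X^T X \, w = X^T y \;\Rightarrow\; \hat{w} = (X^T X)^{-1} X^T y$$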
We first read in the data and use the matplotlib library to display it.
```python
def loadDataSet(filename):
    # The last column is the target value; everything before it is a feature
    numFeat = len(open(filename).readline().split('\t')) - 1
    dataMat = []; labelMat = []
    fr = open(filename)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)              # feature row
        labelMat.append(float(curLine[-1]))  # target value
    return dataMat, labelMat
```
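A minimal plotting sketch follows, assuming a tab-delimited file named 'ex0.txt' (a hypothetical filename) in which the first feature column is a constant 1.0 bias term and the second column holds the actual x-values:

```python
import matplotlib.pyplot as plt
from numpy import mat

Xarr, yarr = loadDataSet('ex0.txt')  # 'ex0.txt' is a placeholder filename
Xmat = mat(Xarr)

# Plot the second feature (column 1) against the target value;
# column 0 is assumed to be a constant 1.0 bias term
plt.scatter(Xmat[:, 1].flatten().A[0], yarr, s=10)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```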
With the data loaded, we can compute $w$ directly from the formula above and then visualize the fitted line.
```python
from numpy import mat, linalg

def standRegres(Xarr, yarr):
    X = mat(Xarr); y = mat(yarr).T
    XTX = X.T * X
    # If the determinant is zero, X^T X is singular and has no inverse
    if linalg.det(XTX) == 0.0:
        print('This matrix is singular and cannot be inverted')
        return
    w = XTX.I * (X.T * y)  # w = (X^T X)^{-1} X^T y
    return w
```
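A quick usage sketch, under the same 'ex0.txt' assumptions as above: compute $w$, then overlay the fitted line on the scatter plot.

```python
import matplotlib.pyplot as plt
from numpy import mat

Xarr, yarr = loadDataSet('ex0.txt')  # placeholder filename
w = standRegres(Xarr, yarr)

Xmat = mat(Xarr)
# Sort the points by x so the fitted line draws cleanly left to right;
# sorting each column is safe here because column 0 is the constant bias term
Xcopy = Xmat.copy()
Xcopy.sort(0)
yhat = Xcopy * w

plt.scatter(Xmat[:, 1].flatten().A[0], yarr, s=10)
plt.plot(Xcopy[:, 1], yhat, 'r')
plt.show()
```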