Previous articles:

1. KNN implemented from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/117911220

2. Decision trees implemented from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118389088

3. Naive Bayes implemented from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118411638

4. Linear regression in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118486796


Although logistic regression has "regression" in its name, it is a classification model. The key to turning regression into classification is the sigmoid function.
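As a reminder of what makes this work: the sigmoid squashes any real number into the interval (0, 1), so the linear model's output can be read as a probability and thresholded at 0.5:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \theta^{\top} x, \qquad \sigma(z) \in (0, 1)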


1. Read the data

import numpy as np
import pandas as pd

def read_xlsx(path):
    # Load the dataset from an Excel file
    data = pd.read_excel(path)
    print(data)
    return data
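A quick usage sketch (data.xlsx is a hypothetical file name, not from the original post):

data = read_xlsx("data.xlsx")  # hypothetical path; any Excel file whose last column is the label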

2. Normalization

Because logistic regression is also trained with gradient descent, the features need to be rescaled to remove their units: gradient descent converges faster when all features are on a comparable scale.
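One common way to do this is min-max scaling, which is what the function below implements; each feature column is mapped into [0, 1]:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}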

def MinMaxScaler(data):
    # Min-max scale every feature column (the last column is the label)
    col = data.shape[1]
    for i in range(0, col - 1):
        arr = np.array(data.iloc[:, i])
        min_val = np.min(arr)
        max_val = np.max(arr)
        data.iloc[:, i] = (arr - min_val) / (max_val - min_val)
    return data
3. Split into training and test sets

def train_test_split(data, test_size=0.2, random_state=None):
    col = data.shape[1]
    x = np.array(data.iloc[:, 0:col - 1])  # feature columns
    y = np.array(data.iloc[:, -1])         # label column
    # When a random seed is given, fix it so the shuffle is reproducible
    if random_state:
        np.random.seed(random_state)
    # permutation generates a random ordering of the indices 0..len(x)-1
    shuffle_indexs = np.random.permutation(len(x))
    # Number of samples that go into the test set (20% by default)
    test_size = int(len(x) * test_size)
    # The first 20% of the shuffled indices become the test indices
    test_indexs = shuffle_indexs[:test_size]
    # The remaining 80% become the training indices
    train_indexs = shuffle_indexs[test_size:]
    # Use the indices to slice out the training and test sets
    x_train = x[train_indexs]
    y_train = y[train_indexs]
    x_test = x[test_indexs]
    y_test = y[test_indexs]
    # Return the split datasets
    return x_train, x_test, y_train, y_test
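A quick sanity check of the split (assuming data has already been loaded and normalized with the functions above):

x_train, x_test, y_train, y_test = train_test_split(data, test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)  # roughly an 80/20 split of the rows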
4. The sigmoid function

def sigmoid(x, theta):
    # Linear regression model as the intermediate step; np.dot is the dot product
    z = np.dot(x, theta)
    # Squash z into (0, 1)
    h = 1 / (1 + np.exp(-z))
    return h

5. The loss function

This step cleverly exploits the properties of the sigmoid function to define the loss.
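Concretely, the loss below is the average cross-entropy over the m samples, where h is the sigmoid output for each sample: it approaches 0 when the prediction matches the label and blows up when the model is confidently wrong:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - h^{(i)}\right) \right]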

def costFunction(h, y):
    # Cross-entropy loss averaged over the m samples
    m = len(h)
    J = -1 / m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return J
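Differentiating this cost with respect to theta gives a gradient with the same simple form as in linear regression, which is exactly what the update in the next step implements (the code folds the 1/m factor into the learning rate alpha, a common simplification):

\frac{\partial J}{\partial \theta} = \frac{1}{m} X^{\top} (h - y), \qquad \theta \leftarrow \theta + \alpha\, X^{\top} (y - h)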
6. Gradient descent

def gradeDesc(x, y, alpha=0.01, iter_num=2000):
    m = x.shape[0]                  # number of samples
    n = x.shape[1]                  # number of features
    y = np.array(y).reshape(-1, 1)  # column vector so shapes line up with h
    J_history = np.zeros(iter_num)  # initialize J_history; np.zeros creates iter_num zeros
    theta = np.ones((n, 1))         # initialize theta as an n-by-1 column of ones
    # Run gradient descent
    for i in range(iter_num):
        h = sigmoid(x, theta)                        # sigmoid of the linear model
        J_history[i] = costFunction(h, y)            # record the loss at this step
        theta = theta + alpha * np.dot(x.T, y - h)   # gradient step
    return J_history, theta
7. Accuracy

def score(h, y):
    m = len(h)
    # Counter for correct predictions
    count = 0
    for i in range(m):
        # Threshold the predicted probability at 0.5 to get the predicted class
        if np.where(h[i] >= 0.5, 1, 0) == y[i]:
            count += 1
    accuracy = count / m
    print("Accuracy:", accuracy)
    return accuracy

The output of the linear regression model serves as the input to logistic regression's sigmoid function.
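Putting the pieces together, a minimal end-to-end run might look like this (data.xlsx is a hypothetical file name; the last column is assumed to be a 0/1 label, as the functions above expect):

data = read_xlsx("data.xlsx")  # hypothetical path
data = MinMaxScaler(data)
x_train, x_test, y_train, y_test = train_test_split(data, test_size=0.2, random_state=42)
J_history, theta = gradeDesc(x_train, y_train)  # train on the training set
h_test = sigmoid(x_test, theta)                 # predicted probabilities on the test set
score(h_test, y_test)                           # thresholds at 0.5 and prints the accuracy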

Complete code: https://github.com/chenyi369/LogisticRegression