
Hands-on Machine Learning with Python, Part 1: KNN


This is my first post on entering the field of machine learning. As a programmer, there are plenty of things one could choose to build: many people still make a living with traditional methods that a program could replace, and there is real market demand for using technology to improve efficiency. I have always believed that technology only needs to be learned well enough to be useful, preferring to distill business logic from real needs and then learn and apply the corresponding techniques. But with so many demands it is easy to chase everything blindly without focus, ending up deeply proficient in nothing, and gradually forgetting the satisfaction that comes from sharpening pure technical skill. Deep learning and machine learning have been hot for a long time now, and as a graduate student in this field, whether out of interest or to keep up with the times, learning them seems well worth it. So I plan to keep writing posts recording my learning journey, and I hope they can be of some help to anyone who happens upon them.

A typical machine learning model involves two steps:

1. Feed training data to an algorithm and train it to obtain a model.
2. Use the trained model to make predictions on new input data.

The principle behind kNN, by contrast, is very simple: the training data itself serves directly as the model for predicting new inputs, so the training phase is skipped entirely. That simplicity makes it an ideal algorithm to start with.

kNN stands for k-nearest neighbors, where k is a user-specified "hyperparameter". Given training data M and an input sample m, we sort the training points by their distance to m, take the k points closest to m, and count which class each of those k points belongs to; the class that appears most often is our prediction for m. Here is an informal example.

Suppose we have a dictionary M that maps BMI values to categories. Given a new BMI value, we use k-nearest neighbors to predict which category it belongs to:

    M = {25: 'fit', 24: 'fit', 23: 'fit', 30: 'overweight', 20: 'thin'}
    m = 22
    k = 4

The 4 entries closest to m are {23: 'fit', 24: 'fit', 20: 'thin', 25: 'fit'}.

Counting the categories among them, 'fit' appears 3 times and 'thin' appears once,
so we predict that the BMI value m = 22 belongs to the category 'fit'.
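The vote above can be sketched in a few lines of plain Python (stdlib only, no NumPy needed for this toy case):

```python
from collections import Counter

# Toy "training data": BMI value -> category
M = {25: 'fit', 24: 'fit', 23: 'fit', 30: 'overweight', 20: 'thin'}
m = 22
k = 4

# Sort the training BMI values by their distance to m and keep the k nearest.
nearest = sorted(M, key=lambda bmi: abs(bmi - m))[:k]

# Majority vote among the k nearest neighbors decides the prediction.
votes = Counter(M[bmi] for bmi in nearest)
prediction = votes.most_common(1)[0][0]
print(nearest)     # [23, 24, 20, 25]
print(prediction)  # fit
```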

Next, let's write the actual code, wrapping kNN in a class with a scikit-learn-style interface.

    import numpy as np
    from math import sqrt
    from collections import Counter

    class KNNClassifier:

        def __init__(self, k):
            assert k >= 1, "k must be valid"
            self.k = k
            self._X_train = None
            self._Y_train = None

        def fit(self, X_train, Y_train):
            # fit returns self, so calls can be chained or assigned.
            assert X_train.shape[0] == Y_train.shape[0], \
                "the size of X_train must be equal to the size of Y_train"
            assert self.k <= X_train.shape[0], \
                "the size of X_train must be at least k."
            self._X_train = X_train
            self._Y_train = Y_train
            return self

        def predict(self, X_predict):
            assert self._X_train is not None and self._Y_train is not None, \
                "must fit before predict"
            assert X_predict.shape[1] == self._X_train.shape[1], \
                "the feature number of X_predict must be equal to X_train"
            Y_predict = [self._predict(x) for x in X_predict]
            return np.array(Y_predict)

        # The leading underscore marks this method as private.
        def _predict(self, x):
            assert self._X_train.shape[1] == x.shape[0]
            # Euclidean distance from x to every training sample.
            distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
            # Indices of the training samples, sorted by distance; vote among the top k.
            nearest = np.argsort(distances)
            topK_y = [self._Y_train[i] for i in nearest[:self.k]]
            votes = Counter(topK_y)
            return votes.most_common(1)[0][0]

        def __repr__(self):
            return "KNN(k=%d)" % self.k


Now let's use the k-nearest neighbors classifier we just wrote.

    import numpy as np
    import matplotlib.pyplot as plt
    from kNN import *  # the KNNClassifier above, saved as kNN.py

    raw_data_X = [
        [3.39, 2.33],
        [3.11, 1.78],
        [1.34, 3.36],
        [3.58, 4.68],
        [2.28, 2.87],
        [7.42, 4.69],
        [5.75, 3.53],
        [9.17, 2.51],
        [7.79, 3.42],
        [7.94, 0.79],
    ]
    raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    X_train = np.array(raw_data_X)
    Y_train = np.array(raw_data_y)

    # A single new sample; reshape it into a (1, 2) matrix, since
    # predict() expects one sample per row.
    x = np.array([8.09, 3.37])
    X_predict = x.reshape(1, -1)

    knn_clf = KNNClassifier(k=6)
    knn_clf.fit(X_train, Y_train)
    y_predict = knn_clf.predict(X_predict)
    print(y_predict)  # [1]

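As a sanity check, the same prediction can be reproduced without the class in a few lines, using NumPy broadcasting to compute all the distances at once (a minimal sketch of the same distance-then-vote logic):

```python
import numpy as np
from collections import Counter

X_train = np.array([
    [3.39, 2.33], [3.11, 1.78], [1.34, 3.36], [3.58, 4.68], [2.28, 2.87],
    [7.42, 4.69], [5.75, 3.53], [9.17, 2.51], [7.79, 3.42], [7.94, 0.79],
])
Y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([8.09, 3.37])
k = 6

# Broadcasting: subtracting a (2,) vector from a (10, 2) matrix
# subtracts x from every row at once.
distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
nearest = np.argsort(distances)[:k]
votes = Counter(Y_train[i] for i in nearest)
print(votes.most_common(1)[0][0])  # 1
```

The sample sits among the class-1 points on the right of the plane, so five of its six nearest neighbors vote for class 1.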

Questions are welcome in the comments. (Reposted from: https://winjourn.cn/?p=781 )



Author: steam
Original URL: https://wanlimm.com/77201808246680.html
All rights reserved © When reposting, you must credit the author and link to the original source!
