
Python Machine Learning in Practice 1: KNN


This is the first post recording my entry into the field of machine learning. As a programmer, there are many things one could choose to build: plenty of people still make a living with traditional methods that could be replaced by software, and there are many marketable needs where technology could improve efficiency. I have always believed in learning a technology only as far as it is needed, preferring to distill business logic from real requirements and then learn and apply the corresponding techniques. Working that way, it is easy to chase too many requirements without grasping the key points, to study and apply things blindly, and to end up not particularly proficient in any one area, gradually forgetting the satisfaction that comes from sharpening a skill for its own sake. Deep learning and machine learning have been hot fields for a long time, and as a graduate student in this area I think that, whether out of interest or to keep up with the times, learning them is always worthwhile. So I plan to keep writing posts to record my learning journey, and I hope they may be of some help to anyone who happens to come across them.

A typical machine learning model goes through the following two steps:

First, feed in training data and run a training algorithm on it to obtain a trained model; then use the trained model to make predictions on new input data.
The kNN algorithm, however, is based on a very simple principle: the training data itself is used directly as the model for predicting on new inputs, so there is no separate training phase. That makes it very simple and a good algorithm to start with.

kNN stands for the k-nearest-neighbors algorithm, where k is a manually specified "hyperparameter". We sort the training data M by distance to the input sample m, take the k samples closest to m, and count how many of those k samples belong to each class; we then predict that m belongs to the class that appears most often. Here is an informal example.

Suppose we have a dictionary M that maps BMI values to categories. Given a new BMI value, we use the k-nearest-neighbors algorithm to predict which category it belongs to.

      M ={ 25:'fit',24:'fit',23:'fit',30:'overweight',20:'thin' }
      m = 22
      k = 4

The 4 samples closest to m are {23:'fit', 24:'fit', 20:'thin', 25:'fit'}.

Counting the categories among them, 'fit' appears 3 times and 'thin' appears once,
so we predict that the BMI value m = 22 belongs to the category 'fit'.
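To make this concrete, here is a minimal Python sketch of the toy calculation; the variable names M, m and k simply mirror the description above:

from collections import Counter

# Toy "model": BMI value -> category
M = {25: 'fit', 24: 'fit', 23: 'fit', 30: 'overweight', 20: 'thin'}
m = 22   # new BMI value to classify
k = 4    # number of neighbors to consider

# Sort training points by distance to m and keep the k nearest
nearest = sorted(M, key=lambda bmi: abs(bmi - m))[:k]
# Majority vote: the most common category among the k nearest wins
votes = Counter(M[bmi] for bmi in nearest)
print(nearest, votes.most_common(1)[0][0])   # [23, 24, 20, 25] fit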

Next, here is the actual code, wrapping KNN in a class with an interface similar to scikit-learn's.

import numpy as np
from math import sqrt
from collections import Counter


class KNNClassifier:

    def __init__(self, k):
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None
        self._Y_train = None

    def fit(self, X_train, Y_train):
        # fit returns self, so the call can be chained and assigned.
        assert X_train.shape[0] == Y_train.shape[0], \
            "the size of X_train must be equal to the size of Y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k"
        self._X_train = X_train
        self._Y_train = Y_train
        return self

    def predict(self, X_predict):
        assert self._X_train is not None and self._Y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"

        Y_predict = [self._predict(x) for x in X_predict]
        return np.array(Y_predict)

    def _predict(self, x):
        # Leading underscore marks this as a private helper:
        # predict the label for a single sample x.
        assert self._X_train.shape[1] == x.shape[0]

        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
        # Indices of training samples sorted by distance, nearest first
        nearest = np.argsort(distances)
        # Labels of the k nearest neighbors
        topK_y = [self._Y_train[i] for i in nearest[:self.k]]
        # Majority vote among the k labels
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]

    def __repr__(self):
        return "KNN(k=%d)" % self.k

Using the k-nearest-neighbors class implemented above:

import numpy as np
import matplotlib.pyplot as plt
from kNN import *

# Toy two-class training set: two features per sample
row_data_X = [
    [3.39, 2.33],
    [3.11, 1.78],
    [1.34, 3.36],
    [3.58, 4.68],
    [2.28, 2.87],
    [7.42, 4.69],
    [5.75, 3.53],
    [9.17, 2.51],
    [7.79, 3.42],
    [7.94, 0.79]
]
row_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_train = np.array(row_data_X)
Y_train = np.array(row_data_y)

# New sample to classify; predict() expects a 2D array,
# so reshape the single sample into one row
x = np.array([8.09, 3.37])
X_predict = x.reshape(1, -1)

knn_clf = KNNClassifier(k=6)
knn_clf.fit(X_train, Y_train)
y_predict = knn_clf.predict(X_predict)
print(y_predict)  # expected output: [1], the sample lies near the class-1 points

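For comparison, assuming scikit-learn is installed, the same toy data can be run through its built-in KNeighborsClassifier as a sanity check; this is not part of the original code:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([
    [3.39, 2.33], [3.11, 1.78], [1.34, 3.36], [3.58, 4.68], [2.28, 2.87],
    [7.42, 4.69], [5.75, 3.53], [9.17, 2.51], [7.79, 3.42], [7.94, 0.79]
])
Y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

sk_knn = KNeighborsClassifier(n_neighbors=6)
sk_knn.fit(X_train, Y_train)
print(sk_knn.predict(np.array([[8.09, 3.37]])))  # should also print [1]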

Questions and comments are welcome. (Reposted from: https://winjourn.cn/?p=781)


