Implementing the k-means algorithm in Python
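Clustering is a form of unsupervised learning, and K-means is a classic distance-based clustering algorithm. It uses distance as its measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm treats a cluster as a group of objects that lie close together, so its goal is to produce clusters that are compact and well separated from one another.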
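To make the distance-as-similarity idea concrete, here is a minimal sketch, assuming two-dimensional points and Euclidean distance. The helper name `nearest_center` and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math


def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))


# Hypothetical centers and points, chosen only for illustration
centers = [(1.0, 1.0), (8.0, 9.0)]
points = [(1.5, 1.8), (9.0, 11.0), (5.0, 8.0)]

# Each point is labeled with the index of its nearest center
labels = [nearest_center(p, centers) for p in points]
print(labels)
```

Each point ends up in the group whose center it is closest to, which is the assignment step that k-means repeats on every iteration.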
Now let's look at the full code for a Python implementation of k-means:
# -*- coding: utf-8 -*-
import random

import numpy as np
from matplotlib import pyplot


class K_Means(object):
    # k: number of clusters; tolerance: how far a center may move and still
    # count as converged; max_iter: maximum number of iterations
    def __init__(self, k=2, tolerance=0.0001, max_iter=300):
        self.k_ = k
        self.tolerance_ = tolerance
        self.max_iter_ = max_iter

    def fit(self, data):
        # Use k distinct data points as the initial centers.
        # random.sample keeps every index within range and avoids duplicates.
        self.centers_ = {}
        for i, idx in enumerate(random.sample(range(len(data)), self.k_)):
            self.centers_[i] = data[idx]

        for _ in range(self.max_iter_):
            # Points assigned to each cluster: {cluster index: [points]}
            self.clf_ = {}
            for i in range(self.k_):
                self.clf_[i] = []

            for feature in data:
                # Euclidean distance from this point to each center
                distances = []
                for center in self.centers_:
                    distances.append(np.linalg.norm(feature - self.centers_[center]))
                classification = distances.index(min(distances))
                self.clf_[classification].append(feature)

            prev_centers = dict(self.centers_)
            for c in self.clf_:
                # An empty cluster keeps its previous center (avoids a NaN mean)
                if self.clf_[c]:
                    self.centers_[c] = np.average(self.clf_[c], axis=0)

            # Converged once no center has moved farther than the tolerance.
            # Movement is measured as a Euclidean distance, so shifts in
            # opposite directions cannot cancel each other out.
            optimized = True
            for center in self.centers_:
                org_centers = prev_centers[center]
                cur_centers = self.centers_[center]
                if np.linalg.norm(cur_centers - org_centers) > self.tolerance_:
                    optimized = False
            if optimized:
                break

    def predict(self, p_data):
        # Index of the center nearest to the given point
        distances = [np.linalg.norm(p_data - self.centers_[center]) for center in self.centers_]
        index = distances.index(min(distances))
        return index


if __name__ == '__main__':
    x = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
    k_means = K_Means(k=2)
    k_means.fit(x)

    # Cluster centers, drawn as stars
    for center in k_means.centers_:
        pyplot.scatter(k_means.centers_[center][0], k_means.centers_[center][1], marker='*', s=150)

    # Training points, colored by cluster
    for cat in k_means.clf_:
        for point in k_means.clf_[cat]:
            pyplot.scatter(point[0], point[1], c=('r' if cat == 0 else 'b'))

    # New points, classified with predict() and drawn as crosses
    predict = [[2, 1], [6, 9]]
    for feature in predict:
        cat = k_means.predict(feature)
        pyplot.scatter(feature[0], feature[1], c=('r' if cat == 0 else 'b'), marker='x')

    pyplot.show()
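One way to check how compact the resulting clusters are is the within-cluster sum of squared distances. The sketch below is an illustrative addition, assuming the centers and the grouped points are passed as dictionaries keyed by cluster index, in the same shape as `centers_` and `clf_`. The function name `within_cluster_ss` and the sample values are hypothetical.

```python
def within_cluster_ss(centers, clusters):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    total = 0.0
    for label, points in clusters.items():
        center = centers[label]
        for point in points:
            total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Hypothetical grouping, chosen only for illustration
centers = {0: (1.0, 1.0), 1: (8.0, 9.0)}
clusters = {0: [(1.0, 2.0), (1.0, 0.0)], 1: [(8.0, 8.0), (9.0, 11.0)]}

print(within_cluster_ss(centers, clusters))
```

Lower values mean tighter clusters. Because the starting centers are random, comparing this value across several runs is a common way to keep the better result.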