ZHOU Jun,JIANG Yu,MA Zhenming,et al.Density Peak Clustering Algorithm Combining Simulated Annealing and Multiple Allocation Strategies[J].Journal of Chengdu University of Information Technology,2022,37(04):396-400.[doi:10.16836/j.cnki.jcuit.2022.04.006]
结合模拟退火和多分配策略的密度峰值聚类算法
- Title:
- Density Peak Clustering Algorithm Combining Simulated Annealing and Multiple Allocation Strategies
- 文章编号:
- 2096-1618(2022)04-0396-05
- Keywords:
- density peak; heuristic optimization; breadth first search; density expansion; cluster nearest neighbor
- 分类号:
- TP391
- 文献标志码:
- A
- 摘要:
- 针对密度峰值聚类算法在截断距离选取存在主观性依赖和非簇中心点的分配策略易出错的问题,提出一种结合模拟退火和多分配策略的密度峰值聚类算法(SA-DPC)。首先,利用模拟退火的启发式搜索找到全局最优的截断距离,设计以标准互信息(NMI)为目标函数的参数寻优模型; 然后,从簇中心点开始以广度优先搜索的方式进行密度拓展; 最后,找出雏形簇最近邻点依次分配。8个人工合成数据集的实验结果表明,改进的算法降低了聚类效果对截断距离的敏感性,且改进算法的ACC、ARI和AMI与原算法相比,分别最高提升了约35%、90%、80%。
- Abstract:
- In view of the subjective dependence of the truncation distance selection in the density peak clustering algorithm and the error-prone problem of the allocation strategy of non-cluster center points, a density peak clustering algorithm combining simulated annealing and multiple allocation strategies(SA-DPC)is proposed. Firstly, the heuristic search of simulated annealing is used to find the global optimal truncation distance, and a parameter optimization model with standard mutual information(NMI)as the objective function is designed. Then, starting from the center of the cluster, the density is expanded in a breadth-first search method. Finally, find out the nearest neighbors of the prototype cluster and assign them sequentially. Experimental results on 8 artificial synthetic data sets show that the improved algorithm reduces the sensitivity of the clustering effect to the cutoff distance, and the improved algorithm’s ACC, ARI and AMI have increased by about 35%, 90% and 80% respectively compared with the original algorithm.
参考文献/References:
[1] 孙吉贵,刘杰,赵连宇.聚类算法研究[J].软件学报,2008(1):48-61.
[2] 纪守领,李进锋,杜天宇,等.机器学习模型可解释性方法、应用与安全研究综述[J].计算机研究与发展,2019,56(10):2071-2096.
[3] RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[4] SHI H,YAN J,DING M,et al.An improved fuzzy c-means soft clustering based on density peak for wind power forecasting data processing[C].Asia Energy and Electrical Engineering Symposium(AEEES),2020.
[5] 刘继新,董欣放,徐晨,等.基于密度峰值的终端区航迹聚类与异常识别[J].交通运输工程学报,2021,21(5):214-226.
[6] Vijay R K,Nanda S J.Seismicity analysis using space-time density peak clustering method[J].Pattern Analysis and Applications,2021,24:181-201.
[7] Chen J,Yu P S.A Domain Adaptive Density Clustering Algorithm for Data With Varying Density Distribution[J].IEEE Transactions on Knowledge and Data Engineering,2021,(33)6:2310-2321.
[8] Wang Y,Yang Y.Relative density-based clustering algorithm for identifying diverse density clusters effectively[J].Neural Computing and Applications,2021,33:10141-10157.
[9] 赵嘉,姚占峰,吕莉,等.基于相互邻近度的密度峰值聚类算法[J].控制与决策,2021,36(3):543-552.
[10] Fränti P,Sieranoja S.K-means properties on six clustering benchmark datasets[J].Applied Intelligence,2018,48(12),4743-4759.
[11] Gelatt M P,Vecchi S,Kirkpatrick C D.Optimization by Simulated Annealing[J].Science,1983,220(4598):671-680.
[12] McQueen J.Some methods for classification and analysis of multivariate observations[R].Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability.Los Angeles:University of California,1967:281-297.
[13] Ester M,Kriegel H P,Sander J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[R].Proceedings of ACM SIGKDD’96,Portland,1996:226-231.
[14] Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,315:972-976.
[15] 杨燕,靳蕃,MOHAMED K.聚类有效性评价综述[J].计算机应用研究,2008(6):1630-1632.
备注/Memo
收稿日期:2021-12-15