基于自適應(yīng)松弛的魯棒模糊C均值聚類算法
doi: 10.11999/JEIT190556 cstr: 32379.14.JEIT190556
-
1.
廈門(mén)大學(xué)航空航天學(xué)院 廈門(mén) 361102
-
2.
集美大學(xué)信息工程學(xué)院 廈門(mén) 361021
Robust Fuzzy C-Means Based on Adaptive Relaxation
-
1.
College of Aeronautics and Astronautics, Xiamen University, Xiamen 361102, China
-
2.
College of Information Engineering, Jimei University, Xiamen 361021, China
-
摘要:
噪聲是影響聚類結(jié)果的最重要的因素之一,現(xiàn)有的模糊聚類算法主要通過(guò)對(duì)隸屬度約束進(jìn)行松弛的方式來(lái)降低噪聲樣本的影響。這種方式仍然存在兩個(gè)基本問(wèn)題需要解決:第一,如何評(píng)估一個(gè)樣本是噪聲的可能性;第二,如何在抑制噪聲樣本影響力的同時(shí),保留正常樣本的作用力。針對(duì)這兩問(wèn)題,該文提出了基于自適應(yīng)松弛的魯棒模糊C均值聚類算法(AR-RFCM)。新模型基于K最近鄰的方式(KNN)來(lái)估計(jì)樣本的可靠性,自適應(yīng)地調(diào)整松弛參數(shù),從而實(shí)現(xiàn)在降低噪聲樣本影響力的同時(shí),保留可靠樣本的作用力。此外,AR-RFCM利用了C均值聚類模型中隸屬度的稀疏性來(lái)提高可靠樣本的作用力,從而提高數(shù)據(jù)簇的內(nèi)聚程度,進(jìn)而降低噪聲樣本的影響。實(shí)驗(yàn)表明,AR-RFCM不僅在處理噪聲樣本時(shí)具有良好的魯棒性,同時(shí)在25個(gè)UCI 數(shù)據(jù)集實(shí)驗(yàn)中,分類正確率(蘭德指數(shù))平均高于FCM算法7.7864%。
Abstract:Noise is one of the most important influences for clustering. Existing fuzzy clustering methods try to reduce the impact of noise by relaxing the constraint condition of membership. But there are still two basic problems to be solved. The first is how to evaluate the probability that a sample point is a noise. The second is how to retain the effect of normal points while suppressing the impact of noise. To solve these two problems, Robust Fuzzy C-Means based on Adaptive Relaxation (AR-RFCM) is proposed. The new model estimates the reliability of sample points by the method of the K-Nearest Neighbor (KNN). It adjusts adaptively the relaxation parameters to reduce the impact of noise, and keeps the effect of reliable sample points at the same time. In addition, AR-RFCM utilizes the sparsity of membership in K-means to improve the effect of reliable sample points. Therefore, the compactness of clusters is improved and the impact of noise is suppressed. Experiments demonstrate that AR-RFCM has a good robustness for noise, and also achieves higher rand index in all 25 UCI data sets, even averagely higher than FCM 7.7864%.
-
Key words:
- Noise /
- Clustering /
- Fuzzy C-Means (FCM) /
- Adaptive /
- Relaxation
-
表 1 各算法求得的隸屬度值
樣本編號(hào) FCM PCM NC REFCM FDCM_SSR AR-RFCM C#1 C#2 C#1 C#2 C#1 C#2 C#1 C#2 C#1 C#2 C#1 C#2 1 0.054 0.946 0.015 0.539 0.004 0.332 0 0 0.095 0.905 0 0.984 2 0.008 0.992 0.018 0.546 0.005 0.332 0 0 0.060 0.940 0 0.985 3 0.040 0.960 0.019 0.999 0 1.000 0 0.992 0.090 0.910 0 1.000 4 0.093 0.907 0.018 0.545 0.005 0.332 0 0 0.137 0.863 0 0.985 5 0.055 0.945 0.024 0.552 0.007 0.331 0 0 0.114 0.886 0 0.985 6 0.500 0.500 0.003 0.003 0.001 0.001 0 0 0.500 0.500 0 0 7 0.500 0.500 0.010 0.010 0.004 0.004 0 0 0.500 0.500 0 0 8 0.945 0.055 0.552 0.024 0.331 0.007 0 0 0.886 0.114 0.985 0 9 0.992 0.008 0.546 0.018 0.332 0.005 0 0 0.941 0.059 0.985 0 10 0.960 0.040 0.999 0.019 1.000 0 0.992 0 0.910 0.090 1.000 0 11 0.907 0.093 0.545 0.018 0.332 0.005 0 0 0.863 0.137 0.985 0 12 0.946 0.054 0.539 0.015 0.332 0.004 0 0 0.905 0.095 0.984 0 下載: 導(dǎo)出CSV
表 2 各算法所得的聚類簇中心
簇中心1 簇中心2 FCM $\left[ \begin{aligned} & { {\rm{3} }{\rm{.9870} } } \\ & { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9870} } } \\ &\ \ \, { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$ PCM $\left[ \begin{aligned}& { {\rm{3} }{\rm{.9870} } } \\ & { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9870} } } \\ & \ \ \, { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$ NC $\left[ \begin{aligned}& { {\rm{3} }{\rm{.9996} } } \\ & { {\rm{0} }{\rm{.0002} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9996} } } \\ & \ \ \, { {\rm{0} }{\rm{.0002} } } \end{aligned} \right]$ REFCM $\left[ \begin{aligned} & { {\rm{4} }{\rm{.000} }0} \\ & { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 4} }{\rm{.000} }0} \\ &\ \ \, { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$ FDCM_SSR $\left[ \begin{aligned}& { {\rm{3} }{\rm{.4833} } } \\ & { {\rm{1} }{\rm{.6530} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.4833} } } \\ &\ \ \, { {\rm{1} }{\rm{.6530} } } \end{aligned} \right]$ AR-RFCM $\left[ \begin{aligned}& { {\rm{3} }{\rm{.9999} } } \\ & { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$ $\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9999} } } \\ &\ \ \, { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$ 下載: 導(dǎo)出CSV
表 3 噪聲數(shù)據(jù)集信息
均值 協(xié)方差 簇1 [1 2] $\left[ {\begin{array}{*{20}{c}}2&{0.2}\\{0.2}&2\end{array}} \right]$ 簇2 [–1 –2] $\left[ {\begin{array}{*{20}{c}}2&0\\0&1\end{array}} \right]$ 簇3 [–3 –5] $\left[ {\begin{array}{*{20}{c}}3&0\\0&1\end{array}} \right]$ 噪聲 [–3 5] $\left[ {\begin{array}{*{20}{c}}{10}&0\\0&2\end{array}} \right]$ 下載: 導(dǎo)出CSV
表 4 噪聲數(shù)據(jù)集實(shí)驗(yàn)結(jié)果
FCM PCM NC REFCM FDCM_SSR AR-RFCM 準(zhǔn)確率 0.8683 0.8683 0.8767 0.8583 0.8667 0.8833 精確度 0.6775 0.6775 0.6900 0.6625 0.6750 0.7000 靈敏度 0.9033 0.9033 0.9200 0.8833 0.9000 0.9333 特異度 0.8567 0.8567 0.8622 0.8500 0.8556 0.8667 下載: 導(dǎo)出CSV
表 5 不同聚類算法在UCI數(shù)據(jù)集的聚類結(jié)果的RI(%)
FCM PCM NC REFCM FDCM_SSR AR-RFCM Ecoli 79.66±0.47 78.60±3.28 78.25±1.11 80.12±1.13 79.83±0.45 88.40±1.62 Auto-mpg 76.23±0 75.21±4.02 77.27±0.46 75.64±0 85.62±0.30 77.64±2.09 Dermatology 83.49±1.18 81.17±3.26 69.82±6.61 81.33±5.49 84.88±1.59 92.55±0 Iris 83.68±0 79.33±7.90 82.27±4.23 85.68±0 87.37±0 85.68±0 Zoo 87.91±1.76 87.56±6.15 88.90±5.82 85.76±5.22 89.78±2.33 96.65±0 Transfusion 53.10±0 55.47±5.59 56.30±3.75 53.16±0 63.90±0.09 63.68±0 Parkinsons 52.19±0 58.86±5.86 64.40±5.31 60.27±7.74 63.12±0.57 73.83±0 Banknote 51.52±0 59.55±9.75 61.32±6.89 59.68±12.12 52.44±1.73 66.32±2.79 Creadit-approval 67.57±0.37 55.78±5.82 64.89±0 53.96±6.53 67.51±0 68.06±0 Breast-cancer 90.53±0 78.42±14.25 86.24±16.97 91.59±0 94.31±0 94.86±0 Wine 95.43±0 73.35±4.57 91.85±6.62 90.36±0 95.43±0 96.40±0.73 Automobile 71.83±0.44 70.81±1.78 72.08±1.62 71.79±0.38 71.99±0.29 74.24±0.29 Messidor Features 50.49±0 50.62±0.35 50.63±0.01 50.86±0 50.65±0.53 51.80±0.61 Fertility 50.00±0 65.74±12.93 50.08±0.18 57.99±6.55 79.13±0.71 78.85±1.75 Seeds 89.92±0 78.70±5.28 88.53±0.52 87.06±0 89.92±0 91.26±0.52 Blance 58.82±4.58 58.49±5.70 62.66±4.69 60.55±5.09 60.56±4.07 65.97±3.54 House Votes 77.52±0 71.46±8.61 75.49±6.07 72.41±6.76 77.86±0 78.90±0 Vowel 67.15±2.47 82.68±1.61 53.52±4.44 82.91±0.94 66.81±1.81 85.47±0.20 Glass 71.31±0.45 69.29±1.19 71.58±0.98 71.07±0.86 71.59±0.39 72.45±0.77 Mammographic 67.96±0 64.00±7.05 67.98±0.04 62.35±6.02 68.55±0 68.11±0 Pima Indians Diabetes 59.07±0 56.23±3.27 52.41±0.13 58.09±0 59.06±0.03 59.18±0.29 Qualitative Bankruptcy 94.53±0 74.58±16.72 97.62±0 94.53±0 94.53±0 97.62±0 Seismic Bumps 51.88±0 71.15±12.53 56.19±10.29 87.70±0 87.69±0.10 87.70±0 Phishing Data 68.72±0.29 60.73±5.18 69.13±0.28 62.71±0 68.93±0.39 69.67±0.09 Yeast 65.83±1.61 72.95±1.21 63.44±2.90 72.95±2.06 66.58±1.58 75.71±0.03 下載: 導(dǎo)出CSV
-
JAIN A K. Data clustering: 50 years beyond k-means[J]. Pattern Recognition Letters, 2010, 31(8): 651–666. doi: 10.1016/j.patrec.2009.09.011 DENG Zhaohong, JIANG Yizhang, CHUNG Fulai, et al. Transfer prototype-based fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 2016, 24(5): 1210–1232. doi: 10.1109/TFUZZ.2015.2505330 張潔玉, 李佐勇. 基于核空間的加權(quán)鄰域約束直覺(jué)模糊聚類算法[J]. 電子與信息學(xué)報(bào), 2017, 39(9): 2162–2168. doi: 10.11999/JEIT161317ZHANG Jieyu and LI Zuoyong. Kernel-based algorithm with weighted spatial information intuitionistic fuzzy c-means[J]. Journal of Electronics &Information Technology, 2017, 39(9): 2162–2168. doi: 10.11999/JEIT161317 LV Yinghua, MA Tinghuai, TANG Meili, et al. An efficient and scalable density-based clustering algorithm for datasets with complex structures[J]. Neurocomputing, 2016, 171: 9–22. doi: 10.1016/j.neucom.2015.05.109 QIN Xiaoyu, TING Kaiming, ZHU Ye, et al. Nearest-neighbour-induced isolation similarity and its impact on density-based clustering[C]. The 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 4755–4762. doi: 10.1609/aaai.v33i01.33014755. 趙小強(qiáng), 劉曉麗. 基于公理化模糊子集的改進(jìn)譜聚類算法[J]. 電子與信息學(xué)報(bào), 2018, 40(8): 1904–1910. doi: 10.11999/IEIT170904ZHAO Xiaoqiang and LIU Xiaoli. An improved spectral clustering algorithm based on axiomatic fuzzy set[J]. Journal of Electronics &Information Technology, 2018, 40(8): 1904–1910. doi: 10.11999/IEIT170904 YIN Hao, BENSON A R, LESKOVEC J, et al. Local higher-order graph clustering[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 555–564. doi: 10.1145/3097983.3098069. MOSLEHI Z, TAHERI M, MIRZAEI A, et al. Discriminative fuzzy c-means as a large margin unsupervised metric learning algorithm[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(6): 3534–3544. doi: 10.1109/TFUZZ.2018.2836338 劉解放, 王士同, 王駿, 等. 一種具有最優(yōu)保證特性的貝葉斯可能性聚類方法[J]. 電子與信息學(xué)報(bào), 2017, 39(7): 1554–1562. doi: 10.11999/JEIT160908LIU Jiefang, WANG Shitong, WANG Jun, et al. Bayesian possibilistic clustering method with optimality guarantees[J]. Journal of Electronics &Information Technology, 2017, 39(7): 1554–1562. doi: 10.11999/JEIT160908 MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]. The 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, USA, 1965: 281–297. BEZDEK J C, EHRLICH R, and FULL W. FCM: The fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2/3): 191–203. DE OLIVEIRA J V and PEDRYCZ W. Advances in Fuzzy Clustering and its Applications[M]. Chichester: John Wiley & Sons, Ltd., 2007. doi: 10.1002/9780470061190. DAVE R N. Characterization and detection of noise in clustering[J]. Pattern Recognition, 1991, 12(11): 657–664. doi: 10.1016/0167-8655(91)90002-4 KRISHNAPURAM R and KELLER J M. A possibilistic approach to clustering[J]. IEEE Transactions on Fuzzy Systems, 1993, 1(2): 98–110. doi: 10.1109/91.227387 ZARINBAL M, ZARANDI M H F, and TURKSEN I B. Relative entropy fuzzy c-means clustering[J]. Information Sciences, 2014, 260: 74–97. doi: 10.1016/j.ins.2013.11.004 GU Jing, JIAO Licheng, YANG Shuyuan, et al. Fuzzy double c-means clustering based on sparse self-representation[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 612–626. doi: 10.1109/TFUZZ.2017.2686804 -