Knowledge Transfer Clustering Algorithm with Privacy Protection
doi: 10.11999/JEIT150645 cstr: 32379.14.JEIT150645
Funds:
The National Natural Science Foundation of China (61272210), Jiangsu Province Outstanding Youth Fund (BK20140001), Natural Science Foundation of Jiangsu Province (BK20130155)
Abstract: Traditional clustering algorithms perform poorly when the available data are insufficient or contaminated by noise. To address this problem, a Knowledge Transfer Clustering Algorithm with Privacy Protection (KTCAPP) is proposed. It builds on the classical Fuzzy C-Means (FCM) technique and combines two knowledge transfer mechanisms: historical class centers and historical class memberships. KTCAPP improves clustering performance by using auxiliary knowledge summarized from historical data to guide the current clustering task when the data are insufficient or noisy. In addition, because the algorithm uses only the historical class centers and memberships, it does not expose the raw historical data and therefore protects their privacy. Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm.
Key words:
- Knowledge transfer /
- Privacy protection /
- Clustering algorithm /
- Fuzzy C-Means (FCM)
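The idea of guiding an FCM run with historical class centers can be illustrated with a minimal sketch. It is not the KTCAPP objective, which this page does not state. It assumes a simple penalty form, sum_ij u_ij^m ||x_j - v_i||^2 + lam * sum_i ||v_i - vh_i||^2, where `vh_i` are the historical centers. The function name `transfer_fcm` and the parameters `hist_centers` and `lam` are hypothetical, and the historical-membership mechanism is left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transfer_fcm(data, hist_centers, lam=1.0, m=2.0, iters=50, eps=1e-12):
    """FCM with a penalty pulling each center toward a historical center.

    Assumed objective (not taken from the paper):
        sum_ij u_ij^m ||x_j - v_i||^2 + lam * sum_i ||v_i - vh_i||^2
    Only hist_centers comes from the historical domain; the historical
    samples themselves are never needed.
    """
    c = len(hist_centers)
    n = len(data)
    dim = len(data[0])
    centers = [list(v) for v in hist_centers]  # warm start from history
    u = []
    for _ in range(iters):
        # Membership update: the standard FCM rule, since the penalty
        # term does not depend on the memberships.
        u = []
        for x in data:
            d = [max(sq_dist(x, v), eps) for v in centers]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Center update: weighted mean of the data blended with the
        # historical center; lam = 0 gives the plain FCM center update.
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            denom = sum(w) + lam
            centers[i] = [
                (sum(w[j] * data[j][k] for j in range(n))
                 + lam * hist_centers[i][k]) / denom
                for k in range(dim)
            ]
    return centers, u


if __name__ == "__main__":
    # A tiny current dataset, too small to cluster reliably on its own.
    data = [[0.1, 0.0], [0.0, 0.2], [5.1, 4.9], [4.8, 5.2]]
    hist = [[0.0, 0.0], [5.0, 5.0]]  # centers summarized from past data
    centers, u = transfer_fcm(data, hist, lam=1.0)
    print(centers)
```

In this sketch, a larger `lam` places more trust in the historical centers and a smaller one places more trust in the current data. Only the centers cross from the historical task to the current one, which matches the privacy argument made in the abstract.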