一種基于嵌入技術(shù)的異構(gòu)信息網(wǎng)絡(luò)的快速聚類算法
doi: 10.11999/JEIT150106 cstr: 32379.14.JEIT150106
-
2.
(哈爾濱工程大學(xué)計(jì)算機(jī)科學(xué)與技術(shù)學(xué)院 哈爾濱 150001) ②(牡丹江師范學(xué)院計(jì)算機(jī)系 牡丹江 157012)
國家自然科學(xué)基金(61370083, 61073043, 61073041)和高等學(xué)校博士學(xué)科點(diǎn)專項(xiàng)科研基金(20112304110011, 20122304110012)
A Fast Clustering Algorithm Based on Embedding Technology for Heterogeneous Information Networks
-
2.
(Institute of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China)
The National Natural Science Foundation of China (61370083, 61073043, 61073041)
-
摘要: 異構(gòu)信息網(wǎng)絡(luò)聚類分析是當(dāng)前的熱點(diǎn)研究問題之一。利用異構(gòu)信息網(wǎng)絡(luò)的稀疏性,該文提出一種基于嵌入技術(shù)的星型模式的異構(gòu)信息網(wǎng)絡(luò)的快速聚類算法。首先從相容的角度將異構(gòu)信息網(wǎng)絡(luò)轉(zhuǎn)化為若干個(gè)相容的二部圖,使用隨機(jī)映射和一種線性時(shí)間求解程序快速計(jì)算出每個(gè)二部圖的近似通勤距離嵌入,每個(gè)嵌入都存在一個(gè)子集指示目標(biāo)數(shù)據(jù)集;然后,使用這些指示子集構(gòu)建一個(gè)通用的聚類模型;最后,將所有指示子集的類設(shè)置標(biāo)號(hào),通過計(jì)算指示同一目標(biāo)對(duì)象的指示數(shù)據(jù)與標(biāo)號(hào)相同類的中心點(diǎn)的加權(quán)距離總和,同時(shí)劃分所有的指示子集,從而快速獲得通用模型的極小值。通過理論分析及實(shí)驗(yàn)驗(yàn)證,該文算法聚類速度快,聚類準(zhǔn)確率高。
-
關(guān)鍵詞:
- 異構(gòu)信息網(wǎng)絡(luò) /
- 聚類 /
- 通勤距離 /
- 嵌入 /
- 加權(quán)距離總和
Abstract: Research on clustering heterogeneous information networks is one of the current hotspots. Taking advantages of the sparsity of heterogeneous information networks, a fast clustering algorithm based on embedding technology for heterogeneous information networks of star network schema is proposed in this paper. First, the heterogeneous information network is transformed into some compatible bipartite graphs from the point of compatible view. Then, the approximate commute distance embedding of each bipartite graph is computed via random mapping and a linear time solver, and an indicator subset in each embedding indicates the target dataset. At last, a general model is formulated via all the indicator subsets, and a minimum value of the model is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances for all indicators for an identical target object. This proposed algorithm is effective by theory analysis and experimental verification. -
肖杰斌, 張紹武.基于隨機(jī)游走和增量相關(guān)節(jié)點(diǎn)的動(dòng)態(tài)網(wǎng)絡(luò)社團(tuán)挖掘算法[J]. 電子與信息學(xué)報(bào). 2013, 35(4): 977-981. Xiao Jie-bin and Zhang Shao-wu. An algorithm of integrating random walk and increment correlative vertexes for mining community of dynamic networks[J]. Journal of Electronics Information Technology, 2013, 35(4): 977-981. 陳季夢, 陳家俊, 劉杰, 等. 基于結(jié)構(gòu)相似度的大規(guī)模社交網(wǎng)絡(luò)聚類算法[J]. 電子與信息學(xué)報(bào). 2015, 37(2): 449-454. Chen Ji-meng, Chen Jia-jun, Liu Jie, et al.. Clustering algorithms for large-scale social networks based on structural similarity[J]. Journal of Electronics Information Technology, 2015, 37(2): 449-454. Sun Y and Han J. Mining heterogeneous information networks: principles and methodologies[J]. Proceedings of Mining Heterogeneous Information Networks: Principles and Methodologies, 2012, 3(2): 1-159. Huang Y and Gao X. Clustering on heterogeneous networks [J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2014, 4(3): 213-233. Gao B, Liu T Y, Zheng X, et al.. Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering[C]. Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, 2005: 41-50. Gao B, Liu T, and Ma W-Y. Star-structured high-order heterogeneous data co-clustering based on consistent information theory[C]. Proceedings of the 6th International Conference on Data Mining (ICDM 2006), Hong Kong, 2006: 880-884. Long B, Zhang Z M, Wu X, et al.. Spectral clustering for multi-type relational data[C]. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, 2006: 585-592. Sun Y, Yu Y, and Han J. Ranking-based clustering of heterogeneous information networks with star network schema[C]. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 2009: 797-806. Li P, Wen J, and Li X. SNTClus: a novel service clustering algorithm based on network analysis and service tags[J]. Przeglad Elektrotechniczny, 2013, 89(1): 208-210. Li P, Chen L, Li X, et al.. RNRank: Network-Based Ranking on Relational Tuples[M]. Boston: Behavior and Social Computing, Springer International Publishing, 2013: 139-150. Wang R, Shi C, Philip S Y, et al.. Integrating Clustering and Ranking on Hybrid Heterogeneous Information Network[M]. Berlin: Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, 2013: 583-594. Boden B, Ester M, and Seidl T. Density-Based Subspace Clustering in Heterogeneous Networks[M]. Berlin: Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg, 2014: 149-164. Meng Q, Tafavogh S, and Kennedy P J. Community detection on heterogeneous networks by multiple semantic- path clustering[C]. 2014 6th IEEE International Conference on Computational Aspects of Social Networks (CASoN), Porto, 2014: 7-12. Meng X, Shi C, Li Y, et al.. Relevance Measure in Large-scale Heterogeneous Networks[M]. Boston: Web Technologies and Applications, Springer International Publishing, 2014: 636-643. Aggarwal C C, Xie Y, and Philip S Y. On dynamic link inference in heterogeneous networks[C]. SIAM International Conference on Data?Mining, Anaheim, 2012: 415-426. Khoa N L D and Chawla S. Large Scale Spectral Clustering Using Resistance Distance and Spielman-teng Solvers[M]. Berlin: Discovery Science, Springer Berlin Heidelberg, 2012: 7-21. Spielman D A and Teng S H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems[C]. Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, 2004: 81-90. Spielman D A and Teng S H. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems[J]. SIAM Journal on Matrix Analysis and Applications, 2014, 35(3): 835-885. Fouss F, Pirotte A, Renders J M, et al.. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 355-369. Spielman D A and Srivastava N. Graph sparsification by effective resistances[J]. SIAM Journal on Computing, 2011, 40(6): 1913-1926. Achlioptas D. Database-friendly random projections[C]. Proceedings of the 20th ACM Sigmod-Sigact-Sigart Symposium on Principles of Database Systems, New York, 2001: 274-281. Koutis I, Miller G L, and Tolliver D. Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing[J]. Computer Vision and Image Understanding, 2011, 115(12): 1638-1646. -
計(jì)量
- 文章訪問數(shù): 1329
- HTML全文瀏覽量: 133
- PDF下載量: 596
- 被引次數(shù): 0