Clustering Algorithms Used in Data Mining

Jiang Yuan (姜園); Zhang Zhaoyang (張朝陽); Qiu Peiliang (仇佩亮); Zhou Dongfang (周東方)

Publication history
- Received: 2003-12-22
- Revised: 2004-04-26
- Published: 2005-04-19

Clustering Algorithms Used in Data Mining

Abstract

Data mining extracts information of interest from very large databases (VLDB). Clustering is an important data mining tool: it partitions a database into multiple clusters according to the similarity between data items, so that the items within each cluster are as similar as possible. From a machine learning perspective, clusters correspond to hidden patterns, and the search for clusters is an unsupervised learning process. Dozens of clustering algorithms are now in use across fields such as statistics, pattern recognition, and machine learning. This paper surveys and classifies the clustering algorithms used in data mining, summarizing seven classes of algorithms and analyzing their performance characteristics.

Keywords: Data mining; Clustering; Hierarchical clustering; Partitional clustering; K-Means
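
To make concrete the idea of partitioning data by similarity, the following is a minimal sketch of K-Means (one of the listed keywords) in the common Lloyd-style form. It is an illustration only and is not taken from the paper: the function names, the use of squared Euclidean distance as the similarity measure, the hand-picked initial centroids, and the toy data are all assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed dissimilarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    """Lloyd-style K-Means; returns (centroids, cluster label per point)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(v) / len(members) for v in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
# Initial centroids picked by hand, one from each group (an assumption).
centroids, labels = k_means(data, initial_centroids=[data[0], data[3]])
print(labels)
```

In this sketch the result depends on the initial centroids, which is why they are passed in explicitly rather than hidden inside the function.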

