A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values

Li Jie; Gao Xin-bo; Jiao Li-cheng



Citation:


Li Jie, Gao Xin-bo, Jiao Li-cheng. A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values[J]. Journal of Electronics & Information Technology, 2004, 26(8): 1203-1209.


Publication history
- Received: 2003-03-27
- Revised: 2003-07-08
- Published: 2004-08-19


Abstract

Data mining often involves analyzing large data sets that contain both numerical and categorical features. However, most existing clustering algorithms handle only numerical data or only categorical data, and cannot analyze data with mixed attributes. To address this, this paper proposes a new GA-based fuzzy clustering algorithm. It modifies the common clustering cost function, the trace of the within-cluster dispersion matrix, so that numerical and categorical features are combined in a single objective, which makes cluster analysis of mixed-attribute data possible. A Genetic Algorithm (GA) is then used to optimize the new cost function, so that the algorithm reaches the global optimum quickly and does not depend on prototype initialization. Experimental results show that the new GA-based clustering algorithm is effective for clustering large data sets with mixed numerical and categorical values.

Keywords: cluster analysis; numerical features; categorical features; genetic algorithm
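The abstract describes the approach only in outline, so the following Python sketch illustrates the general idea rather than the authors' method. It assumes a k-prototypes-style cost (squared Euclidean distance on numerical features plus a weighted count of categorical mismatches) with hard assignments instead of the paper's fuzzy memberships. The weight `gamma`, the function names, and the genetic operators (truncation selection, prototype-level uniform crossover, Gaussian and resampling mutation) are all hypothetical choices made for illustration.

```python
import random

def mixed_distance(x_num, x_cat, c_num, c_cat, gamma):
    """Squared Euclidean distance on the numerical part plus gamma times the
    number of mismatched categorical values (an assumed dissimilarity)."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches

def clustering_cost(data, prototypes, gamma):
    """Sum over all points of the distance to the nearest prototype."""
    return sum(
        min(mixed_distance(xn, xc, pn, pc, gamma) for pn, pc in prototypes)
        for xn, xc in data
    )

def random_prototypes(data, k, rng):
    """Build one chromosome by copying k distinct data points."""
    return [(list(xn), list(xc)) for xn, xc in rng.sample(data, k)]

def crossover(a, b, rng):
    """Uniform crossover: each prototype is inherited whole from one parent."""
    child = []
    for pa, pb in zip(a, b):
        src = pa if rng.random() < 0.5 else pb
        child.append((list(src[0]), list(src[1])))
    return child

def mutate(prototypes, data, rng, rate=0.2, sigma=0.5):
    """Add Gaussian noise to numerical genes; resample categorical genes
    from values that actually occur in the data."""
    child = []
    for pn, pc in prototypes:
        new_num = [v + rng.gauss(0, sigma) if rng.random() < rate else v
                   for v in pn]
        new_cat = [rng.choice(data)[1][j] if rng.random() < rate else v
                   for j, v in enumerate(pc)]
        child.append((new_num, new_cat))
    return child

def ga_cluster(data, k, gamma=1.0, pop_size=20, generations=50, seed=0):
    """Search for k prototypes that minimise clustering_cost with a simple GA."""
    rng = random.Random(seed)
    population = [random_prototypes(data, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: clustering_cost(data, p, gamma))
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), data, rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=lambda p: clustering_cost(data, p, gamma))

if __name__ == "__main__":
    # Each record is (numerical features, categorical features).
    data = [([0.0, 0.1], ["a"]), ([0.2, 0.0], ["a"]),
            ([5.0, 5.1], ["b"]), ([5.2, 4.9], ["b"])]
    best = ga_cluster(data, k=2)
    print(best, clustering_cost(data, best, 1.0))
```

Because the best half of each generation is carried over unchanged, the best cost in the population never increases from one generation to the next. That property alone does not guarantee a global optimum. The value of `gamma` sets how much a categorical mismatch counts relative to numerical distance, so it would need tuning for real data.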


