Intraclass-Distance-Sum-Minimization Based Classification Algorithm
doi: 10.11999/JEIT150633 cstr: 32379.14.JEIT150633
①(School of Digital Media, Jiangnan University, Wuxi 214122, China) ②(Jiangsu Provincial Engineering Technology Research and Development Center for Information Fusion Software, Jiangyin 214405, China)
Funds:
The National Natural Science Foundation of China (61170122, 61272210)
Abstract: The Support Vector Machine (SVM) classification algorithm introduces a penalty factor to cope with overfitting and with the infeasibility that arises when the data are not linearly separable. Its advantage is that an optimal solution can be obtained by tuning this parameter, but the cost is that some samples are allowed to be misclassified. Misclassified samples lose all constraint inside the classification margin, so samples near the boundary between the two classes become disorderly distributed and the training burden increases. To solve these problems, following the large-margin classification idea and the principle that classes should be compact internally and well separated from each other, this paper proposes a new classification algorithm, called the Intraclass-Distance-Sum-Minimization (IDSM) based classification algorithm. The algorithm builds its training model on the criterion of minimizing the sum of intraclass distances and solves it analytically to obtain the optimal projection rule; samples are then projected by this rule so that intraclass intervals become small while interclass intervals become large. Correspondingly, a kernelized version of the algorithm is further proposed to handle high-dimensional classification problems. Experimental results on a large number of UCI datasets and on the Yale face database demonstrate the superiority of the proposed algorithm.
Keywords:
- Support Vector Machine (SVM) /
- penalty factor /
- large margin classification /
- intraclass distance sum /
- projection rule
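To make the procedure described in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' exact IDSM model (whose objective and solver are not reproduced on this page). It assumes the intraclass distance sum is rendered as a Fisher-style within-class scatter that is minimized, relative to between-class scatter, through an analytic generalized-eigenproblem solution; the names idsm_like_projection and nearest_mean_predict are illustrative.

```python
# Hypothetical illustration of an IDSM-style classifier, under the assumption
# that "minimize the intraclass distance sum under a projection, solved
# analytically" takes a Fisher-style form: find W minimizing within-class
# scatter W^T Sw W relative to between-class scatter W^T Sb W.
import numpy as np
from scipy.linalg import eigh

def idsm_like_projection(X, y, dim=1, reg=1e-6):
    """Learn a projection W (d x dim): classes compact, class means spread."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))   # intraclass (within-class) scatter
    Sb = np.zeros((d, d))   # interclass (between-class) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Analytic solution: generalized eigenvectors of Sb w = lambda (Sw + reg*I) w.
    # Directions with large lambda give small intraclass spread relative to
    # interclass spread; reg keeps the within-class matrix positive definite.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(d))
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:dim]]

def nearest_mean_predict(W, X_train, y_train, X_test):
    """Classify projected test points by the nearest projected class mean."""
    Z_train, Z_test = X_train @ W, X_test @ W
    classes = np.unique(y_train)
    means = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Usage would follow the two-stage scheme the abstract describes: W = idsm_like_projection(X_train, y_train, dim=2), then y_pred = nearest_mean_predict(W, X_train, y_train, X_test). For the kernelized version mentioned in the abstract, the same construction would be carried out on kernel-mapped data, with the scatter matrices expressed through the Gram matrix as in kernel Fisher discriminant analysis; that variant is only named here, not sketched.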