Word Similarity Measurement Based on Concept Primitives
doi: 10.11999/JEIT160176 cstr: 32379.14.JEIT160176
Funds:
The Twelfth Five-Year Project of the National 863 Program of China (2012AA011102), The State Language Commission Twelfth Five-Year Research Project (YB125-53)
Keywords:
- Word similarity
- Semantic distance
- Hierarchical network of concepts
- Concept primitive
Abstract: Word similarity measurement plays an important role in machine translation, information retrieval, and many other fields. Using the concept primitive symbol system of the Hierarchical Network of Concepts (HNC) theory as the semantic resource, and following the idea of contrasting commonality with difference, this paper proposes a multi-dimensional word similarity method that covers the system's hierarchy, network structure, contrast and duality properties, attachment property, and quintuple information. Weights are introduced into the node depth and node distance measures to increase the discrimination between levels. Experiments on a manually scored test set show that the computed similarities agree well with human judgments: the compatibility degree, correlation coefficient, and ordinal pair conformity reach 0.812, 0.786, and 0.775, respectively. A correlation test further shows that the computed similarities and the human scores are significantly correlated.
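The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of weighting node depth when comparing two nodes in a concept hierarchy. Everything in it is an assumption made for illustration: the toy tree `PARENT`, the geometric `decay` weighting, and the Wu-Palmer-style ratio are not taken from the paper, and the paper's other dimensions (network structure, contrast and duality, attachment, quintuple information) are not modeled.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "b1": "b",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def weighted_depth(node: str, decay: float = 0.5) -> float:
    """Sum of edge weights from the root down to `node`.

    Assumed weighting: the k-th edge below the root weighs decay**k,
    so edges near the root count more than edges deep in the tree.
    """
    n_edges = len(path_to_root(node)) - 1
    return sum(decay ** k for k in range(n_edges))


def lowest_common_ancestor(x: str, y: str) -> str:
    """Return the deepest node that is an ancestor of both x and y."""
    ancestors_of_x = set(path_to_root(x))
    for node in path_to_root(y):
        if node in ancestors_of_x:
            return node
    raise ValueError("nodes do not share a root")


def similarity(x: str, y: str, decay: float = 0.5) -> float:
    """Wu-Palmer-style ratio computed on weighted depths (illustrative only)."""
    denom = weighted_depth(x, decay) + weighted_depth(y, decay)
    if denom == 0:  # both nodes are the root
        return 1.0
    shared = weighted_depth(lowest_common_ancestor(x, y), decay)
    return 2 * shared / denom


if __name__ == "__main__":
    for pair in [("a1", "a2"), ("a", "a1"), ("a1", "b1")]:
        print(pair, round(similarity(*pair), 3))
```

In this toy setup, siblings under the same parent score higher than nodes whose only shared ancestor is the root, and changing `decay` changes how strongly each level contributes to the score.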