CLC number: TP311  Document code: A  DOI: 10.16157/j.issn.0258-7998.211437
Citation: Yang Zheng, Yin Chunlin, Cai Di, et al. A power text domain word discovery method based on word formation rate and spectral clustering[J]. Application of Electronic Technique, 2021, 47(10): 29-32, 37.
A power text domain word discovery method based on word formation rate and spectral clustering
Yang Zheng1, Yin Chunlin1, Cai Di2, Li Huibin2
1. Electric Power Research Institute of Yunnan Power Grid Co., Ltd., Kunming 650217, China; 2. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
Abstract: The power industry still lacks effective methods for discovering domain words. Taking the texts of power-industry science and technology projects as the raw corpus, this paper combines statistical features (mutual information, left entropy and right entropy) with the features of traditional linguistic word-formation rules and proposes a new concept, the power text word formation rate. The proposed method first uses the word formation rate to obtain an initial candidate word set through unsupervised filtering, then applies a text slicing algorithm and common-word filtering to that candidate set, and finally applies word embedding and spectral clustering to obtain the final power-domain words. Experimental results show that the proposed method is accurate and effective, and it offers a new approach to domain word discovery in power texts.
Key words: word formation rate; spectral clustering; domain word discovery; power text