Application of Electronic Technique
Chinese named entity recognition based on lexicon enhancement and table filling
Chu Tianshu1, Tang Qiu1, Liang Junxue2, Xu Rui1, Wang Mingyang2, Liu Tao2
1. National Computer System Engineering Research Institute of China, Beijing 100083, China; 2. Unit 93216 of the Chinese People's Liberation Army, Beijing 100085, China
Abstract: Chinese named entity recognition comprises two tasks, Chinese flat named entity recognition and Chinese nested named entity recognition, of which the nested task is the more difficult. This paper proposes TLEXNER, a unified model based on lexicon enhancement and table filling that handles both tasks. To address the difficulty of Chinese word segmentation, the model first uses a lexicon adapter to fuse lexicon information into the BERT pre-trained model, and integrates the relative position information of characters and lexical groups into the BERT embedding layer. It then uses conditional layer normalization and a biaffine model to construct and predict a character-pair table. The table models the relationships between characters and gives a unified representation of flat and nested entities. Finally, entity categories are determined from the values in the upper triangular region of the character-pair table. The proposed model achieves F1 scores of 97.35% on the public flat-entity dataset Resume and 91.96% on a self-annotated nested-entity dataset from the military domain, which demonstrates the effectiveness of TLEXNER.
CLC number: TP391  Document code: A  DOI: 10.16157/j.issn.0258-7998.233939
Citation format: Chu Tianshu, Tang Qiu, Liang Junxue, et al. Chinese named entity recognition based on lexicon enhancement and table filling[J]. Application of Electronic Technique, 2024, 50(2): 23-29.
Key words: lexicon enhancement; Chinese named entity recognition; table filling
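
The final step named in the abstract, reading entity categories from the upper triangular region of the character-pair table, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `decode_entities`, the label set, the convention that cell (i, j) labels the span from character i to character j, and the use of 0 for "no entity" are all assumptions made for illustration, and the table is filled by hand rather than predicted by a model.

```python
from typing import Dict, List, Tuple


def decode_entities(
    table: List[List[int]], id2label: Dict[int, str]
) -> List[Tuple[int, int, str]]:
    """Read entity spans from the upper triangle of a character-pair table.

    Assumption (not from the paper): table[i][j] with i <= j holds the label
    id of the span that starts at character i and ends at character j, and
    the id 0 means "no entity".
    """
    entities = []
    n = len(table)
    for i in range(n):
        for j in range(i, n):  # upper triangle only: start <= end
            label_id = table[i][j]
            if label_id != 0:
                entities.append((i, j, id2label[label_id]))
    return entities


# Hypothetical example: two nested entities in a 7-character sentence.
text = "北京大学图书馆"
id2label = {1: "ORG", 2: "LOC"}  # hypothetical label set
n = len(text)
table = [[0] * n for _ in range(n)]
table[0][1] = 2  # characters 0..1 form a LOC span
table[0][3] = 1  # characters 0..3 form an ORG span that contains the LOC span

for start, end, label in decode_entities(table, id2label):
    print(text[start : end + 1], label)
```

Because every (start, end) pair has its own cell, a flat entity and an entity nested inside it occupy different cells and can both be read out, which is why one table can represent both kinds of entity.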

Introduction

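In the era of big data, massive amounts of text are generated every day, and extracting genuinely valuable knowledge from this highly redundant data is increasingly important. Knowledge extraction methods can automatically identify and extract the required knowledge elements, providing data support for subsequent knowledge fusion, knowledge processing, and knowledge application. Named entity recognition is a key task in knowledge extraction and the foundation of downstream tasks such as knowledge graphs, data mining, intelligent retrieval, and question answering systems, so research on named entity recognition has both theoretical and practical significance.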

By granularity, Chinese named entity recognition can be divided into word-based, character-based, and hybrid character-word approaches. Unlike English, Chinese has no explicit word delimiters, so Chinese named entity recognition faces the difficulty of word segmentation.
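
The segmentation difficulty can be made concrete with a small sketch that enumerates every way to split a sentence into words from a lexicon. The sentence and the toy lexicon below are illustrative assumptions, not data from the paper, and the function `segmentations` is a hypothetical helper, not part of TLEXNER.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Enumerate every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江大桥"}

for seg in segmentations("南京市长江大桥", toy_lexicon):
    print(" / ".join(seg))
```

With this toy lexicon the same seven characters admit three segmentations, one of which reads "市长" (mayor) across what another treats as a word boundary. A word-based recognizer that commits to the wrong split propagates the error to entity boundaries, which is the motivation for working at the character level and adding lexicon information on top.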


The full text of this article can be downloaded at:

http://forexkbc.com/resource/share/2000005850





This content is original to the AET website; reproduction without authorization is prohibited.