《電子技術(shù)應(yīng)用》
您所在的位置:首頁(yè) > 其他 > 設(shè)計(jì)應(yīng)用 > 面向缺失數(shù)據(jù)的布魯姆近似成員查詢算法
面向缺失數(shù)據(jù)的布魯姆近似成員查詢算法
2022年電子技術(shù)應(yīng)用第3期
吳佳雯1,王宇科2,裴書(shū)玉1,,謝 鯤1,劉楚達(dá)3
1.湖南大學(xué) 信息科學(xué)與工程學(xué)院,,湖南 長(zhǎng)沙410082; 2.湖南大學(xué) 校園信息化建設(shè)與管理辦公室,,湖南 長(zhǎng)沙410082; 3.長(zhǎng)沙航空職業(yè)技術(shù)學(xué)院,湖南 長(zhǎng)沙410082
摘要: 隨著網(wǎng)絡(luò)的發(fā)展,,越來(lái)越多的場(chǎng)景需要在不完整數(shù)據(jù)下進(jìn)行近似成員查詢,,傳統(tǒng)成員查詢的布魯姆過(guò)濾器不能滿足上述要求。提出面向缺失數(shù)據(jù)的布魯姆近似查詢算法,,先對(duì)高維不完整數(shù)據(jù)的缺失部分進(jìn)行預(yù)填充,,通過(guò)PCA算法,將高維數(shù)據(jù)轉(zhuǎn)換到低維數(shù)據(jù),,使用局部敏感哈希函數(shù)與標(biāo)準(zhǔn)哈希函數(shù)結(jié)合的方式將低維數(shù)據(jù)存儲(chǔ)到布魯姆過(guò)濾器中,。使用兩個(gè)真實(shí)數(shù)據(jù)集驗(yàn)證了所提算法的功能,所提面向缺失數(shù)據(jù)的布魯姆近似查詢算法,,能有效地解決存在缺失數(shù)據(jù)的近似成員查詢問(wèn)題,。
中圖分類號(hào): TP393.0
文獻(xiàn)標(biāo)識(shí)碼: A
DOI:10.16157/j.issn.0258-7998.212468
中文引用格式: 吳佳雯,王宇科,,裴書(shū)玉,,等. 面向缺失數(shù)據(jù)的布魯姆近似成員查詢算法[J].電子技術(shù)應(yīng)用,2022,,48(3):78-82,,87.
英文引用格式: Wu Jiawen,Wang Yuke,,Pei Shuyu,,et al. Approximate membership query algorithm for incomplete data based on Bloom filter[J]. Application of Electronic Technique,2022,,48(3):78-82,,87.
Approximate membership query algorithm for incomplete data based on Bloom filter
Wu Jiawen1,Wang Yuke2,,Pei Shuyu1,,Xie Kun1,Liu Chuda3
1.College of Computer Science and Electronic Engineering,,Hunan University,,Changsha 410082,China,; 2.Office of Information,,Hunan University,Changsha 410082,,China,; 3.Changsha Aeronautical Vocational and Technical College,Changsha 410082,,China
Abstract: More and more scenarios require approximate membership queries for incomplete query data, but traditional Bloom filters for membership queries cannot meet these requirements. An approximate membership query algorithm for incomplete data based on Bloom filter is proposed. It first preprocesses the missing parts of the high-dimensional incomplete data, then converts the high-dimensional data to the low-dimensional data based on PCA technique, and the low-dimensional data is stored in a Bloom filter by combining local sensitive hash functions with standard hash functions. Extensive experiments are conducted using two publicly real-world network performance datasets, and it shows that the proposed algorithm efficiently solves the approximate membership query problem for data with incomplete data. It is also necessary to enrich the means of filling in the missing parts in the data pre-processing. The proposed solution can effectively solve the approximate membership query problem for data with missingness.
Key words : Bloom filter,;approximate membership query,;query algorithm

0 引言

    標(biāo)準(zhǔn)的布魯姆過(guò)濾器(Bloom Filter,BF)[1]是一個(gè)空間效率很高的數(shù)據(jù)結(jié)構(gòu),,它可以表示集合并支持集合的成員查詢,,快速判斷查詢?cè)厥欠裨诩现小.?dāng)給定一個(gè)查詢?cè)豦時(shí),,它被用來(lái)回答查詢?cè)厥欠裨谶@個(gè)集合,。一個(gè)標(biāo)準(zhǔn)的布魯姆構(gòu)造一個(gè)長(zhǎng)度為m的比特位數(shù)組,初始化為0,。在插入階段,,它使用k個(gè)獨(dú)立的哈希函數(shù)h1(·),…,,hk(·)來(lái)計(jì)算插入元素在數(shù)組中對(duì)應(yīng)的k個(gè)哈希位置h1(e)%m,,…,hk(e)%m,,并將這k個(gè)哈希位置置位為“1”,。在查詢階段,通過(guò)檢查是否所有的k個(gè)哈希位置都置位為“1”,,來(lái)判斷元素是否在集合中,。如果它們都置位為“1”,則認(rèn)為查詢?cè)豦在集合S中,;否則,,則認(rèn)為查詢?cè)豦不在集合S中。

    現(xiàn)有標(biāo)準(zhǔn)布魯姆過(guò)濾器通常用于常規(guī)的精確匹配的成員集合查詢(Exact-matching Membership Query,,EMQ),,即:檢查查詢數(shù)據(jù)本身是否存儲(chǔ)在布魯姆過(guò)濾器,它是否是集合的一個(gè)成員,。布魯姆過(guò)濾器作為一種空間精簡(jiǎn),、查詢高效的支持成員集合查詢結(jié)構(gòu),一直被廣泛用于各種實(shí)際應(yīng)用中[2-3],。在網(wǎng)絡(luò)領(lǐng)域應(yīng)用中,,布魯姆過(guò)濾器可以用來(lái)存儲(chǔ)防火墻海量的黑名單數(shù)據(jù)[4],以及在網(wǎng)站中進(jìn)行內(nèi)容去重等[5],。在大數(shù)據(jù)應(yīng)用中,,例如HBase中使用布魯姆過(guò)濾器來(lái)減少代價(jià)高昂的I/O次數(shù),提升數(shù)據(jù)庫(kù)查詢效率[6],。




本文詳細(xì)內(nèi)容請(qǐng)下載:http://forexkbc.com/resource/share/2000004008,。




作者信息:

吳佳雯1,王宇科2,,裴書(shū)玉1,,謝  鯤1,,劉楚達(dá)3

(1.湖南大學(xué) 信息科學(xué)與工程學(xué)院,湖南 長(zhǎng)沙410082,;

2.湖南大學(xué) 校園信息化建設(shè)與管理辦公室,,湖南 長(zhǎng)沙410082;

3.長(zhǎng)沙航空職業(yè)技術(shù)學(xué)院,,湖南 長(zhǎng)沙410082)




wd.jpg

此內(nèi)容為AET網(wǎng)站原創(chuàng),未經(jīng)授權(quán)禁止轉(zhuǎn)載,。