[1] HERNANDEZ M, STOLFO S. The merge/purge problem for large databases[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. San Jose, California, 1995: 127-138.
[2] HERNANDEZ M, STOLFO S. Real-world data is dirty: data cleansing and the merge/purge problem[J]. Data Mining and Knowledge Discovery, 1998, 2(1): 9-37.
[3] 叶焕倬, 吴迪. 相似重复记录清理方法研究综述[J]. 现代图书情报技术, 2010, 26(9): 56-66. YE H Z, WU D. A survey of approximately duplicate data cleaning method[J]. New Technology of Library and Information Service, 2010, 26(9): 56-66.
[4] 陈爽, 宋金玉, 刁兴春, 等. 基于伸缩窗口和等级调整的SNM改进方法[J]. 计算机应用研究, 2013, 30(9): 2736-2739. CHEN S, SONG J Y, DIAO X C, et al. Amelioration method of SNM based on flexible window and ranking adjusting[J]. Application Research of Computers, 2013, 30(9): 2736-2739.
[5] 殷秀叶. 大数据环境下的相似重复记录检测方法[J]. 武汉工程大学学报, 2014, 36(9): 66-69. YIN X Y. Method for detecting approximately duplicate database records in big data environment[J]. Journal of Wuhan Institute of Technology, 2014, 36(9): 66-69.
[6] 陈芬. 改进量子粒子群算法优化神经网络的数据库重复记录检测[J]. 计算机应用与软件, 2014, 31(3): 20-21, 115. CHEN F. Database duplicate records detection using neural network optimized by IQPSO[J]. Computer Applications and Software, 2014, 31(3): 20-21, 115.
[7] 李鑫, 李军, 丰继林, 等. 面向相似重复记录检测的特征优选方法[J]. 传感器与微系统, 2011, 30(2): 37-40. LI X, LI J, FENG J L, et al. An optimal feature selection method for approximately duplicate records detecting[J]. Transducer and Microsystem Technologies, 2011, 30(2): 37-40.
[8] 周典瑞, 周莲英. 海量数据的相似重复记录检测算法[J]. 计算机应用, 2013, 33(8): 2208-2211. ZHOU D R, ZHOU L Y. Algorithm for detecting approximate duplicate records in massive data[J]. Journal of Computer Applications, 2013, 33(8): 2208-2211.
[9] 周丽娟, 肖满生. 基于数据分组匹配的相似重复记录检测[J]. 计算机工程, 2010, 36(12): 104-106. ZHOU L J, XIAO M S. Detection of approximately duplicated records based on data grouping matching[J]. Computer Engineering, 2010, 36(12): 104-106.
[10] 肖满生, 周浩慧, 王宏. 基于模糊综合评判的相似重复记录识别方法[J]. 计算机工程, 2010, 36(13): 51-53. XIAO M S, ZHOU H H, WANG H. Identification method of approximately duplicate records based on fuzzy integrated estimation[J]. Computer Engineering, 2010, 36(13): 51-53.
[11] 郭文龙. 基于长度过滤和有效权值的SNM改进算法[J]. 计算机工程与应用, 2014, 50(19): 123-127. GUO W L. Improved SNM algorithm based on length filtering and effective weights[J]. Computer Engineering and Applications, 2014, 50(19): 123-127.
[12] 刘雅思, 程力, 李晓. 基于长度过滤和动态容错的SNM改进算法[J]. 计算机应用研究, 2017, 34(1): 147-150. LIU Y S, CHENG L, LI X. Improved SNM algorithm based on length filtering and dynamic fault-tolerance[J]. Application Research of Computers, 2017, 34(1): 147-150.
[13] 刘河香. 模糊数学理论及其应用[M]. 北京: 科学出版社, 2012. LIU H X. Fuzzy mathematics theory and its applications[M]. Beijing: Science Press, 2012.
[14] 张胜礼, 李永明. 广义模糊集GFScom在模糊综合评判中的应用[J]. 计算机科学, 2015, 42(7): 125-128, 161. ZHANG S L, LI Y M. Application of generalized fuzzy sets GFScom to fuzzy comprehensive evaluation[J]. Computer Science, 2015, 42(7): 125-128, 161.
[15] 余肖生, 胡孙枝. 基于SNM改进算法的相似重复记录消除[J]. 重庆理工大学学报(自然科学版), 2016, 30(4): 91-96. YU X S, HU S Z. Research on eliminating duplicate records based on SNM improved algorithm[J]. Journal of Chongqing University of Technology (Natural Science), 2016, 30(4): 91-96.