[1] LI Jianzhong, LIU Xianmin. An important aspect of big data: data usability[J]. Journal of Computer Research and Development, 2013, 50(6): 1147-1162. (in Chinese)
[2] LI Xingyi, BAO Congjian, SHI Huaji. A method for detecting approximately duplicate database records in data warehouse[J]. Journal of University of Electronic Science and Technology of China, 2007, 36(6): 1273-1277. (in Chinese)
[3] PANG Xiongwen, YAO Zhanlin, LI Yongjun. Efficient duplicate records detection method for massive data[J]. Journal of Huazhong University of Science and Technology, 2010, 38(2): 8-11. (in Chinese)
[4] ZHOU Dianrui, ZHOU Lianying. Algorithm for detecting approximate duplicate records in massive data[J]. Journal of Computer Applications, 2013, 33(8): 2208-2211. (in Chinese)
[5] AO Li, SHU Jiwu, LI Mingqiang. Data deduplication techniques[J]. Journal of Software, 2010, 21(5): 916-929. (in Chinese)
[6] HAN Jingyu, XU Lizhen, DONG Yisheng. An approach for detecting similar duplicate records of massive data[J]. Journal of Computer Research and Development, 2005, 42(12): 2206-2212. (in Chinese)
[7] QIU Yuefeng. An efficient approach for detecting approximately duplicate database records[J]. Chinese Journal of Computers, 2001, 24(1): 69-77. (in Chinese)
[8] DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[C]//Proceedings of the 6th Symposium on Operating Systems Design and Implementation. New York, NY, 2004.