
LIU Lizhi, PENG Bei. Logistic Regression Model for Loan Repayment in Spark Cluster[J]. Journal of Wuhan Institute of Technology, 2020, 42(01): 113-118. [doi:10.19843/j.cnki.CN42-1779/TQ.201907020]

Logistic Regression Model for Loan Repayment in Spark Cluster

Journal of Wuhan Institute of Technology [ISSN: 1674-2869 / CN: 42-1779/TQ]

Volume:
42
Issue:
2020, Issue 01
Pages:
113-118
Section:
Mechanical, Electrical and Information Engineering
Publication date:
2021-01-25

Article Info

Title:
Logistic Regression Model for Loan Repayment in Spark Cluster
Article ID:
1674-2869(2020)01-0113-06
Author(s):
LIU Lizhi (1, 2), PENG Bei (1, 2)
1. Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China; 2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Keywords:
Spark cluster; logistic regression; R language; big data
CLC number:
TP311
DOI:
10.19843/j.cnki.CN42-1779/TQ.201907020
Document code:
A
Abstract:
A logistic regression classification model that predicts whether customers will repay their loans on time was built in R. In a Spark cluster environment, the parallel logistic regression algorithm of MLlib was invoked through sparklyr to perform supervised learning on the training set of a large-scale mortgage dataset. To study the dependability of the model and the efficiency of obtaining it, the pseudo coefficient of determination, classification evaluation metrics, and test-set performance were used as additional criteria for assessing dependability, and the model was judged to be dependable. With this dependable model, experimental results show that once the amount of mortgage data exceeds a certain threshold, the parallel algorithm on the cluster obtains the logistic regression classification model faster than the corresponding serial algorithm.
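The abstract names three dependability checks for the fitted model: a pseudo coefficient of determination, classification evaluation metrics, and test-set performance. The sketch below illustrates those checks on a single machine in plain Python; it is not the authors' sparklyr/MLlib pipeline. It assumes batch gradient descent as the optimizer (the solver MLlib uses internally is not reproduced here), assumes McFadden's pseudo R² is the intended pseudo coefficient of determination (the paper may use another variant), and uses synthetic data with hypothetical features standing in for the mortgage dataset.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, X):
    # P(y = 1 | x) for every row of X; model = (bias, weights).
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def fit_logistic(X, y, lr=0.5, epochs=500):
    # Serial batch gradient descent on the mean log-loss (assumed optimizer).
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        errs = [pi - yi for pi, yi in zip(predict_proba((b, w), X), y)]
        b -= lr * sum(errs) / n
        w = [w[j] - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n for j in range(d)]
    return b, w


def log_likelihood(y, p, eps=1e-12):
    # Bernoulli log-likelihood; eps guards against log(0).
    return sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    # McFadden pseudo R^2 = 1 - lnL(model) / lnL(intercept-only model).
    ybar = sum(y) / len(y)
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, [ybar] * len(y))


def classification_metrics(y, p, threshold=0.5):
    # Confusion-matrix based metrics at a fixed probability threshold.
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p):
        pred = 1 if pi >= threshold else 0
        if pred == 1 and yi == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y), "precision": precision,
            "recall": recall, "f1": f1}


def make_synthetic_loans(n, seed=0):
    # Hypothetical standardized features: x1 ~ credit score, x2 ~ debt-to-income.
    # Label 1 means "repaid on time"; the generating model is an assumption.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(1 if rng.random() < sigmoid(0.5 + 2.0 * x1 - 1.5 * x2) else 0)
    return X, y


X, y = make_synthetic_loans(1000)
X_train, y_train = X[:800], y[:800]   # 80/20 split, an arbitrary choice
X_test, y_test = X[800:], y[800:]

model = fit_logistic(X_train, y_train)
r2 = mcfadden_r2(y_train, predict_proba(model, X_train))
metrics = classification_metrics(y_test, predict_proba(model, X_test))
print("model (bias, weights):", model)
print("McFadden pseudo R^2 (train):", round(r2, 3))
print("test-set metrics:", metrics)
```

A pseudo R² close to 0 means the model barely improves on the intercept-only baseline, while the test-set metrics check that the fit generalizes beyond the training data. In a Spark setting, the same quantities could presumably be computed from the cluster-fitted model's predicted probabilities.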

Memo:
Received: 2019-07-24. About the author: LIU Lizhi, M.S., associate professor.
Last Update: 2020-06-09