[Abstract]: Credit risk is a key area of banking and a common concern of many stakeholders, including financial institutions, consumers, and regulators. It is an active research topic in finance and, in recent years, has also attracted the attention of statisticians. Credit risk is the risk of loss on a loan or other line of credit caused by the debtor's failure to repay (Wikipedia, 2017). The central event in credit risk is default: a default occurs when a debtor cannot repay a debt as the debt contract requires or cannot fulfil the associated legal obligations. When studying the credit risk of bank customers, it is not accurate to judge credit quality solely by whether a customer defaults, because most customers do not default during the study period, so the time to default (the survival time) is not observed for most individuals. In recent years, some studies have therefore applied survival analysis to credit risk modelling. Survival analysis is a dynamic method: it predicts not only the probability of the event but also when the event occurs. It handles censored data well, its estimated survival probabilities show the relationship between risk and customer characteristics more intuitively, and it brings time explicitly into the model. This paper uses desensitized microcredit data on 60,508 bank customers observed over the study period, described by 420 high-dimensional characteristic variables. Because traditional variable selection methods struggle in this high-dimensional setting, we first compare currently popular regularization methods and try their algorithms. We incorporate the timing of default into the credit analysis model by introducing the period in which each customer first defaults, and we process the data into a standard survival-data format.
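The conversion of repayment records into survival data can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual preprocessing: each customer's history is assumed to be a sequence of per-period default flags, the survival time is the 1-based index of the first default period, and customers who never default are right-censored at their last observed period. The names `SurvivalRecord` and `to_survival`, and the monthly 36-period window in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SurvivalRecord:
    customer_id: str
    time: int   # period of first default, or last observed period if censored
    event: int  # 1 = default observed, 0 = right-censored (no default in window)


def to_survival(customer_id: str, defaulted: Sequence[bool]) -> SurvivalRecord:
    """Turn per-period default flags (period 1 first) into a (time, event) pair."""
    for period, flag in enumerate(defaulted, start=1):
        if flag:
            # Only the first default matters; later periods are ignored.
            return SurvivalRecord(customer_id, period, 1)
    # No default observed during the study window: censor at the last period.
    return SurvivalRecord(customer_id, len(defaulted), 0)


# Hypothetical example: a 36-period window (three years of monthly records).
history = {
    "c001": [False] * 10 + [True] + [False] * 25,
    "c002": [False] * 36,
}
records = [to_survival(cid, flags) for cid, flags in history.items()]
for r in records:
    print(r.customer_id, r.time, r.event)
```

The first customer becomes an observed event at period 11; the second contributes only the information that no default occurred in 36 periods, which is exactly what a binary default label would discard.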
We then build a Cox multiplicative (proportional) hazards model with LASSO-MCP regularization and an additive hazards model with LASSO-SCAD regularization. Each customer's credit score is the product of the estimated coefficients of the important variables and the customer's values of the corresponding characteristic variables, and classification rules based on this score give a comprehensive evaluation of each customer's credit risk. By comparing the results with the bank's practical experience, we explain the economic meaning of several important characteristic variables identified by the survival models. We also compare the two survival models in terms of the important variables they select and their predictive performance, and find that the proportional hazards model with LASSO-MCP regularization uses fewer features. Finally, we validate and compare credit risk models built with different methods from several angles. On the same empirical data, we implement a traditional binary logistic regression model and a modern decision tree model, and compare them, both theoretically and on model results, with the multiplicative and additive survival models from the previous chapters. The four models are compared on two criteria: the ROC curve, which reflects accuracy, and the KS statistic, which reflects discriminatory power. The Cox survival model outperforms the other three models on both criteria, which confirms the good empirical performance of the regularized survival models introduced in this paper. We conclude that, for three years of microfinance data, the Cox proportional hazards model based on LASSO-MCP regularization has the highest accuracy and the greatest discriminatory power.
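The regularization methods named above differ mainly in their penalty functions. As a reference sketch, the standard LASSO, SCAD, and MCP penalties on a single coefficient are written out below. The defaults a = 3.7 and gamma = 3 are conventional choices assumed here, not values reported in the paper, and the code does not reproduce how the paper combines LASSO with MCP or SCAD.

```python
def lasso_penalty(beta: float, lam: float) -> float:
    """L1 penalty: lam * |beta|; grows linearly without bound."""
    return lam * abs(beta)


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty (Fan and Li, 2001); requires a > 2."""
    b = abs(beta)
    if b <= lam:
        return lam * b  # behaves like LASSO near zero
    if b <= a * lam:
        # quadratic transition that joins the linear and constant pieces smoothly
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant: large coefficients are not shrunk further


def mcp_penalty(beta: float, lam: float, gamma: float = 3.0) -> float:
    """Minimax concave penalty (Zhang, 2010); requires gamma > 1."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)  # penalty rate decreases as |beta| grows
    return gamma * lam * lam / 2  # constant beyond gamma * lam


for b in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"{b:4.1f}  lasso={lasso_penalty(b, 1.0):.3f}  "
          f"scad={scad_penalty(b, 1.0):.3f}  mcp={mcp_penalty(b, 1.0):.3f}")
```

Because SCAD and MCP level off for large coefficients, they shrink strong signals less than LASSO does; this is the usual motivation for these nonconvex penalties in high-dimensional variable selection.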

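The credit score described above can be read as the usual linear predictor, the sum of beta_j * x_j over the selected variables. The sketch below assumes that reading; the variable names, coefficients, and grade cutoffs are hypothetical and do not come from the paper. It also assumes that a larger score means higher risk, as in a Cox model where the hazard increases with the linear predictor.

```python
from typing import Dict, List, Tuple


def credit_score(coefs: Dict[str, float], features: Dict[str, float]) -> float:
    """Linear predictor: sum of beta_j * x_j over variables with nonzero coefficients."""
    return sum(beta * features[name] for name, beta in coefs.items() if beta != 0.0)


def risk_grade(score: float, cutoffs: List[Tuple[float, str]], top_grade: str) -> str:
    """Map a score to a grade; cutoffs are (upper_bound, grade) pairs in ascending order."""
    for upper, grade in cutoffs:
        if score <= upper:
            return grade
    return top_grade


# Hypothetical coefficients and cutoffs, for illustration only.
coefs = {"overdue_count": 0.8, "income_level": -0.5, "age": 0.0}
customer = {"overdue_count": 1.0, "income_level": 2.0, "age": 35.0}
cutoffs = [(-0.5, "low risk"), (0.5, "medium risk")]

score = credit_score(coefs, customer)
print(round(score, 3), risk_grade(score, cutoffs, "high risk"))
```

Variables whose coefficients were shrunk exactly to zero by the penalty drop out of the score, so the classification rule only depends on the selected features.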

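The two evaluation criteria, the area under the ROC curve and the KS statistic, can be computed as in the sketch below. It assumes one risk score per customer (higher means riskier) and a binary label with 1 for default; the abstract does not state how the survival-model outputs were turned into such scores, so that step is left out. AUC uses the pairwise Mann-Whitney form, and KS is the maximum gap between the empirical score distributions of defaulters and non-defaulters.

```python
from bisect import bisect_right
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney form: P(score of a defaulter > score of a non-defaulter)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of defaulters and non-defaulters."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)  # share of defaulters with score <= t
        cdf_neg = bisect_right(neg, t) / len(neg)  # share of non-defaulters with score <= t
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data: label 1 = default; higher score = higher predicted risk.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels), ks_statistic(scores, labels))
```

The pairwise AUC loop is quadratic and is written for clarity; for tens of thousands of customers a rank-based computation would be the practical choice.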
