## An optimization problem has to be solved by adjusting the threshold to find the optimum that balances the trade-off between the decrease in revenue and the decrease in cost.

Then, using the layout of the confusion matrix plotted in Figure 6, the four regions are divided into True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN), if "Settled" is defined as positive and "Past Due" is defined as negative. Aligned with the confusion matrices plotted in Figure 5, TP is the good loans hit, and FP is the defaults missed. We are most interested in those two regions. To normalize the values, two commonly used mathematical terms are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
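As a minimal sketch of how these two rates follow from the four regions of the confusion matrix, the snippet below counts TP, FP, FN and TN from lists of actual and predicted labels and then forms the ratios. The function names and the toy labels are illustrative assumptions, not taken from the original analysis.

```python
def confusion_counts(y_true, y_pred, positive="Settled"):
    """Count the four confusion-matrix regions, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # good loan hit
            else:
                fp += 1  # default missed
        else:
            if actual == positive:
                fn += 1  # good loan rejected
            else:
                tn += 1  # default caught
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (TPR, FPR); a rate is reported as 0.0 when its denominator is zero."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels, made up for illustration only.
y_true = ["Settled", "Settled", "Settled", "Past Due", "Past Due", "Settled"]
y_pred = ["Settled", "Past Due", "Settled", "Settled", "Past Due", "Settled"]

tpr, fpr = rates(*confusion_counts(y_true, y_pred))
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```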

## In this application, TPR is the hit rate of good loans, and it represents the capability of making money from loan interest; FPR is the missing rate of defaults, and it represents the chance of losing money.

The Receiver Operating Characteristic (ROC) curve is one of the most widely used plots for visualizing the performance of a classification model at all thresholds. In Figure 7 left, the ROC curve of the Random Forest model is plotted. This plot essentially shows the relationship between TPR and FPR, where one always moves in the same direction as the other, from 0 to 1. A good classification model always has its ROC curve above the red baseline, which represents the "random classifier". The Area Under Curve (AUC) is also a metric for evaluating the classification model besides accuracy. The AUC of the Random Forest model is 0.82 out of 1, which is decent.
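To make the construction of the ROC curve and its AUC concrete, here is a small pure-Python sketch. It sweeps the threshold over the distinct predicted scores, records one (FPR, TPR) point per threshold, and integrates with the trapezoidal rule. The scores are toy values, not the Random Forest outputs from this analysis, and the function names are assumptions; in practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return ROC points as (FPR, TPR) pairs, starting at (0, 0).

    y_true: 1 for the positive class ("Settled"), 0 for the negative class.
    scores: predicted probability of the positive class.
    Assumes both classes are present in y_true.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more predicted positives.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, made up for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
print(f"AUC = {auc(curve):.3f}")
```

A random classifier traces the diagonal from (0, 0) to (1, 1), which gives an area of 0.5; the closer the AUC is to 1, the better the model separates the two classes.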

Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is an implicit variable, so the optimization task cannot be done with the ROC curve alone. Therefore, another dimension is introduced to include the threshold variable, as plotted in Figure 7 right. Since the orange TPR curve represents the capability of making money and the FPR curve represents the chance of losing it, the intuition is to find the threshold that widens the gap between the two curves as much as possible. In this case, the sweet spot is around 0.7.
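The gap-maximizing idea can be sketched as a simple sweep: for each candidate threshold, compute TPR and FPR and keep the threshold where TPR minus FPR is largest. The data below is a toy example and the function name is hypothetical; the 0.7 sweet spot reported above comes from the actual model, not from this snippet.

```python
def best_gap_threshold(y_true, scores, thresholds):
    """Return (threshold, gap) maximizing TPR - FPR over the given thresholds.

    y_true: 1 for "Settled" (positive), 0 for "Past Due" (negative).
    scores: predicted probability of the positive class.
    Assumes both classes are present; ties keep the lowest threshold.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_thr, best_gap = None, float("-inf")
    for thr in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        gap = tp / pos - fp / neg
        if gap > best_gap:
            best_thr, best_gap = thr, gap
    return best_thr, best_gap


# Toy data, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.35, 0.72, 0.15]
thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

thr, gap = best_gap_threshold(y_true, scores, thresholds)
print(f"best threshold = {thr}, TPR - FPR = {gap:.2f}")
```

The quantity TPR minus FPR is also known as Youden's J statistic, so this sweep is one common way to pick an operating point when both kinds of error are weighted equally.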

There are limitations to this approach: the FPR and TPR are ratios. Even though they are good at visualizing the effect of the classification threshold on the predictions, we still cannot infer the actual values of the profit that different thresholds lead to. The FPR, TPR vs Threshold approach also assumes that all loans are equal (loan amount, interest due, etc.), but in reality they are not. People who default on loans might have a higher loan amount and more interest to be paid back, which adds uncertainty to the modeling results.

## Fortunately, the detailed loan amount and interest due are available from the dataset itself.

The only thing remaining is to find a way to connect them with the threshold and the model predictions. It is not difficult to define an expression for profit in terms of revenue and cost. By assuming that the revenue comes solely from the interest collected on the settled loans and the cost comes solely from the total loan amount that customers default on, these two terms can be calculated using 5 known variables, as shown below in Table 2.
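The revenue and cost assumptions above can be sketched in a few lines. This is an illustration under stated assumptions, not the exact formulation of Table 2: the `Loan` fields, the function name and the toy figures are all hypothetical. A loan is treated as approved when its predicted probability of being settled reaches the threshold; revenue is the interest on approved loans that were actually settled, and cost is the loan amount of approved loans that actually defaulted.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    amount: float    # principal lent (hypothetical field name)
    interest: float  # interest due if the loan is settled (hypothetical field name)
    settled: bool    # actual outcome: True = "Settled", False = "Past Due"
    score: float     # model's predicted probability of "Settled"


def profit_at(loans, threshold):
    """Profit = interest from approved settled loans - principal of approved defaults."""
    approved = [loan for loan in loans if loan.score >= threshold]
    revenue = sum(loan.interest for loan in approved if loan.settled)
    cost = sum(loan.amount for loan in approved if not loan.settled)
    return revenue - cost


# Toy portfolio, made up for illustration only.
loans = [
    Loan(amount=1000, interest=150, settled=True, score=0.9),
    Loan(amount=500, interest=80, settled=True, score=0.6),
    Loan(amount=2000, interest=300, settled=False, score=0.7),
    Loan(amount=800, interest=120, settled=False, score=0.3),
]

for thr in (0.2, 0.5, 0.8):
    print(f"threshold {thr}: profit = {profit_at(loans, thr)}")
```

Unlike the TPR and FPR ratios, this weighs each loan by its own amount and interest, so a single large default can outweigh several small settled loans when comparing thresholds.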
