Bank Case Study 5: Reject Inference

2020-07-22 09:02  Author: python风控模型

Python financial risk-control scorecard modeling and data analysis micro-course: http://dwz.date/b9vv


Reject inference is a topic that separates credit scoring from other classification problems, such as marketing propensity models for cross-selling and upselling. As you will discover later, reject inference is about patching the information gaps that exist during the development of application scorecards. Let us try to gain a more holistic perspective on patching information gaps by looking at how human beings have evolved.

Connecting the Dots

Recently I watched a Hindi movie called ‘Ankhon Dekhi’; the title translates to ‘seen with your eyes’. At the beginning of the movie, the central character, after a dramatic event in his life, decides to believe only what he sees with his own eyes. What follows are his adventures / misadventures while doing so. Although the theme of this movie has high potential, I think it became a bit pretentious in its presentation, especially towards the end. The idea of believing your eyes seems appropriate, but it has its own shortcomings. Evolution has trained our brain to override our vision in order to make split-second decisions. Numerous optical illusions are proof of this phenomenon. In this article, we will explore some optical illusions / illustrations that highlight how our brain and eyes work. But before that, let us consider an example of split-second decision making as a necessity for survival.

Imagine a human ancestor in the dark of night. Our ancestor is hungry; he hasn’t eaten in days. He sees the silhouette of a creature lurking in front of him. This creature could be his next meal. On the other hand, it could be a predator, and our ancestor could become a delicious meal for it. Humans are still around on this planet because our ancestors’ eyes and brains developed some simple rules to deal with such situations. One of the instruments evolution has equipped humans with is...


Power of Context

As promised earlier, let me present a couple of illustrations to emphasize the power of context. In the first of these illustrations (shown adjacent), compare the lengths of the two yellow lines and decide which one is longer. You will most probably identify the top yellow line as longer than the bottom one. In this illusion, your brain overrides the information received through your eyes based on the context, i.e. the patterns surrounding the yellow lines. As you might appreciate, our three-dimensional world will rarely, if ever, offer a pattern similar to the optical illusion in Illustration 1. Hence, for most practical purposes, our brain makes the right decision, though it may seem ridiculous in this case.

Illustration 2 – Source: ‘Thinking, Fast and Slow’ by Daniel Kahneman

Now, let us have a look at the second illustration, shown adjacent. Notice the B and the 13 in the middle of the top and bottom sequences: they are identical. Yet you read the top sequence as A B C and the bottom sequence as 12 13 14. This is phenomenal: what your brain has just done in a split second is something most text-mining and artificial intelligence algorithms try painstakingly to do. I must point out that CAPTCHAs are proof that most of these algorithms fail to capture what nature has equipped us with: the ability to join missing links.

Our brain tries to fill the gaps in our information using the information available. This is precisely what we try to do when using reject inference for credit scoring.


Reject Inference

Let us try to understand the dynamics of the loan application process before establishing the necessity for reject inference. Underwriters assess the ‘through-the-door’ loan applications to establish the creditworthiness of the applicants, and accept or reject each application based on the applicant’s credentials. Moreover, customers whose applications are accepted may or may not avail the loans. This is shown in the schematic below:

[Schematic: through-the-door applications → accepted or rejected by underwriters; accepted applications → loans availed (disbursed) or not availed]

As you can see in the schematic above, we have information on the disbursed loans, tagged as good or bad based on their performance. However, to create a holistic scorecard for the entire through-the-door population, we need to infer the behavior of the rejected loans. This process of supplementing the missing information is called reject inference, and it is essential for developing a holistic scorecard. The following sections present some commonly used methods for performing reject inference. I must also point out that, although widely used in the industry, these methods are not perfect.

Use Credit Bureaus

This method involves using information from credit bureaus to fill the gaps. If other lenders have disbursed loans to your rejected applicants, then it makes sense to tag the rejected customers as good or bad based on their performance with those lenders. Although this method is possibly the best way to infer rejects with concrete information, it has the following challenges:

  1. It is unlikely that all the rejected applicants obtained a loan from some other lender around the development period of the scorecard.

  2. Differences in collection processes and reporting among lenders could lead to dubious tagging of customers’ performance.

In most cases, credit bureau information alone won’t be sufficient to tag the entire through-the-door population. That is why we need analytical methods for reject inference, as discussed in the next segment.

Augmentation through Parceling

Augmentation in its different forms is the most commonly used methodology for reject inference. As shown in the schematic above, we have fairly concrete good / bad tagging for all the disbursed loans. We can easily run a classification algorithm such as logistic regression (see Part 3 of this series), neural nets or a decision tree to create a Known-Good-Bad (KGB) model. The same KGB model is then used to score the rejected loans. Once the scoring is completed, the analyst could create a table similar to the one shown below:

[Table: for each KGB score range, the bad rate of disbursed loans, the number of rejected applications, and those rejects split into inferred bad and good counts in the same proportion]


As you may notice in the above table, we have divided the rejected applications in each score range into the same good / bad proportion as the disbursed loans in that range. For instance, the score range 232-241 has 22% bad loans among the disbursed loans. We have divided the 2295 rejected applicants in this bucket into 505 bad loans (22% of 2295, rounded) and 1790 good loans. We will randomly choose 505 rejected applications in the score range 232-241 and tag them as bad loans; the remaining rejects in this bucket will be tagged as good. Finally, we will create a holistic scorecard by re-running the classification algorithm, e.g. logistic regression, on the entire through-the-door population.

I hope you have noticed that we have used the principles of power-of-context discussed above by using score ranges as the criteria for augmentation.
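The parceling step can be sketched in Python using only the standard library. This is a minimal sketch, not the author’s code: it assumes each disbursed loan and each rejected application already carries a KGB score, and the score bands, the helper names `band_of` and `parcel_rejects`, and the toy data are all hypothetical. For each band it borrows the bad rate of the disbursed loans and randomly tags that share of the band’s rejects as bad.

```python
import random
from collections import defaultdict


def band_of(score, bands):
    """Return the (low, high) score band containing `score`, or None."""
    for low, high in bands:
        if low <= score <= high:
            return (low, high)
    return None


def parcel_rejects(accepted, rejected, bands, seed=0):
    """Parceling reject inference (hypothetical sketch).

    accepted: list of (kgb_score, is_bad) for disbursed loans with known outcomes
    rejected: list of kgb_score for rejected applications
    Returns a list of (kgb_score, is_bad) with an inferred label for each reject.
    """
    rng = random.Random(seed)

    # Bad rate of disbursed loans in each score band
    counts = defaultdict(lambda: [0, 0])  # band -> [n_bad, n_total]
    for score, is_bad in accepted:
        band = band_of(score, bands)
        if band is not None:
            counts[band][0] += int(is_bad)
            counts[band][1] += 1

    # Group the rejects by score band (rejects outside every band are dropped)
    rejects_by_band = defaultdict(list)
    for score in rejected:
        band = band_of(score, bands)
        if band is not None:
            rejects_by_band[band].append(score)

    inferred = []
    for band, scores in rejects_by_band.items():
        n_bad, n_total = counts[band]
        if n_total == 0:
            continue  # no disbursed loans in this band to borrow a bad rate from
        k_bad = round(len(scores) * n_bad / n_total)  # e.g. 22% of 2295 -> 505
        bad_idx = set(rng.sample(range(len(scores)), k_bad))  # random rejects tagged bad
        for i, score in enumerate(scores):
            inferred.append((score, i in bad_idx))
    return inferred


# Toy data: 100 disbursed loans in the 232-241 band (22 bad) and 2295 rejects
accepted = [(235, i < 22) for i in range(100)]
rejected = [236] * 2295
inferred = parcel_rejects(accepted, rejected, [(232, 241)])
n_bad = sum(is_bad for _, is_bad in inferred)
print(n_bad, len(inferred) - n_bad)  # 505 1790
```

The inferred rejects would then be appended to the disbursed loans and the classifier re-fitted on the combined data. The fixed `seed` only makes the random draw reproducible: a different seed picks different rejects as bad but keeps the same 505 / 1790 split.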


Fuzzy Augmentation

Fuzzy augmentation is an extended form of parceling. Rather than randomly assigning rejected loans as good or bad, we create multiple copies of each rejected loan in the good / bad proportion of its score range. For instance, for a single rejected loan in the score range 232-241, 22 copies will be tagged as bad and 78 copies as good. The process is repeated for all the rejected loans. This is similar to the workings of fuzzy logic: each rejected loan is treated as partly good and partly bad. Fuzzy augmentation is believed to be a better method of reject inference for producing holistic scorecards.
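Fuzzy augmentation differs from parceling only in the labelling step: instead of a random draw, every reject is replicated. Below is a minimal, self-contained sketch under the same kind of assumptions: KGB scores are already available, and the bands, the helper names `band_of` and `fuzzy_augment`, and the toy data are hypothetical rather than taken from the article.

```python
from collections import defaultdict


def band_of(score, bands):
    """Return the (low, high) score band containing `score`, or None."""
    for low, high in bands:
        if low <= score <= high:
            return (low, high)
    return None


def fuzzy_augment(accepted, rejected, bands, copies=100):
    """Fuzzy augmentation (hypothetical sketch).

    accepted: list of (kgb_score, is_bad) for disbursed loans with known outcomes
    rejected: list of kgb_score for rejected applications
    Each reject is replicated `copies` times; the share of copies tagged bad
    equals the bad rate of disbursed loans in the reject's score band.
    Returns a list of (reject_id, kgb_score, is_bad).
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [n_bad, n_total]
    for score, is_bad in accepted:
        band = band_of(score, bands)
        if band is not None:
            counts[band][0] += int(is_bad)
            counts[band][1] += 1

    augmented = []
    for reject_id, score in enumerate(rejected):
        band = band_of(score, bands)
        if band is None or counts[band][1] == 0:
            continue  # no bad rate to borrow for this score; skipped in this sketch
        n_bad, n_total = counts[band]
        k_bad = round(copies * n_bad / n_total)  # e.g. 22 of 100 copies
        augmented += [(reject_id, score, True)] * k_bad
        augmented += [(reject_id, score, False)] * (copies - k_bad)
    return augmented


# Toy data: 100 disbursed loans in the 232-241 band, 22 of them bad
accepted = [(235, i < 22) for i in range(100)]
rows = fuzzy_augment(accepted, [236], [(232, 241)])
n_bad = sum(is_bad for _, _, is_bad in rows)
print(n_bad, len(rows) - n_bad)  # 22 78
```

Note that literal copies give each reject a total weight of `copies` relative to a single disbursed loan, so the choice of `copies` affects the re-fitted model. If the modeling tool supports case weights, two records per reject weighted by the bad and good shares (0.22 and 0.78 here) keep the same split with a total weight of 1; that weighting variant is an implementation choice, not something this article prescribes.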


Sign-off Note

I know all the above methods for reject inference have their shortcomings. I have seen several experts and academicians cringe at the mention of these methods. However, thus far, these are the best methods we have for reject inference with our current knowledge of mathematics and logic. I must say, nature is still hiding a few brilliant tricks up her sleeve, such as our own ability to decipher CAPTCHAs. Some day, when we learn more about the inner workings of our own brain, we might crack the bigger code for reject inference and millions of similar problems. Nature does reveal herself piecemeal, so there is still tremendous hope!



Author’s WeChat official account: pythonEducation

Author’s online course homepage: http://dwz.date/bwes

