Credit Scorecards (4-5)

2020-07-21 09:18 Author: python风控模型

Author's WeChat official account: pythonEducation

Credit Scorecards – Advanced Analytics (part 4 of 7)

http://ucanalytics.com/blogs/credit-scorecards-advanced-analytics-part-4/

Modeling in Advanced Analytics

Advanced Analytics: Model Development – by Roopam

The room, full of Analysts, erupts with a loud round of laughter when a young business analyst narrates to us an incident from his recent trip back home. A distant aunt inquired about his new profession. His response – I am into modeling. She got all excited and asked – is it just on the ramp or will I see you on the television? Jokes apart, this left me wondering about the roots of the word modeling or model. What is a model?

A model is defined as a simplified representation of reality. A representation of reality, hmmm. A photograph is a representation of reality – a moment of reality captured on the reel. Does that make it a model? I think yes. Similarly, a newspaper reporter's account of an incident that becomes breaking news is also a model – a descriptive model. Now, let us try to link models with Analytics.


Data warehouse, Business Intelligence and Advanced Analytics

Analytics has received a massive boost because of the emergence of information technology. We are living in the era of big data. The plethora of data collected at every stage of the business process has created a need to extract knowledge out of the information. This overall process has three aspects to it:

1. Data warehouse or data marts: transactional data is extracted-transformed and loaded (ETL) into a data model / schema for the purpose of analysis
2. Business Intelligence or dashboards: “as is” business reports
3. Predictive Analytics or Advanced Analytics: high-end statistical and data mining exercise

As the quantum of data is exponentially increasing, Hadoop and big data technologies are replacing the data warehouses. However, the thought process for business intelligence and predictive analytics – the focus of this article – will not change much. Let me try to distinguish between business intelligence and predictive Analytics using something I learned at a professional theater.


5Ws for business intelligence & predictive Analytics – Lessons from Theater

5 Ws for Data Warehouse, Business Intelligence, and Advanced Analytics – by Roopam

I joined a professional theater group a few years ago. To understand the nuances of acting we started with improv, or improvisation theater. This form of theater does not have a predefined script; the actors build the story while performing. Most people thought I was a good improv actor. However, the style of remembering dialogue while performing did not work very well for me, and hence it was the end of my theater gig. Still, I learned some good lessons from the whole experience. One of them was the five Ws of deciphering a character to build the drama.

1. What happened?
2. When did it happen?
3. Where did it happen?
4. Who was part of this?
5. Why did it happen?

Clearly, the first four questions are trying to report an as-is version of reality – a descriptive model. This is exactly what business intelligence professionals try to achieve through fancy reporting platforms and software. The fifth question is the trickiest of the lot: it is the question that keeps scientists and inquisitive minds awake late at night.


Newton’s Legacy

An apple falls from a tree. How difficult is it to answer the first four questions? Most of us can answer them with the help of a clock and a map. However, Isaac Newton answered the fifth question, and his answer was Gravity. If he had stopped there, nobody would have remembered him close to four hundred years after his birth. He gave a mathematical model to explain this phenomenon.
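
For reference, the model in question is Newton's law of universal gravitation. Written for the apple and the earth in its standard textbook form (the symbols below are the conventional ones and are an assumption about notation, not a copy of the original figure):

```latex
F = G \, \frac{m_{\text{apple}} \, m_{\text{earth}}}{r^{2}}
```

Here $F$ is the attractive force between the two bodies, $G$ is the gravitational constant, $m_{\text{apple}}$ and $m_{\text{earth}}$ are the two masses, and $r$ is the distance between their centers.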

Replace the apple and the earth with any other objects and you have the general equation for the model. Albert Einstein did shatter the Newtonian notion of Gravity. However, this model still holds good for all practical purposes and is used extensively in rocket science.

Advanced analytics tries to help answer the fifth question, why something happened, using predictive modeling. The combination of high-end statistical and data mining techniques with analysts' business acumen produces models that help organizations make informed decisions. Remember, this is just the beginning, and causality is still a fair distance away!


Credit Scoring Models

Credit scorecards are models that predict the probability of a borrower defaulting on his/her loan. The following is a simplified version of a credit score with three variables:

Credit Score = Age + Loan to Value Ratio (LTV) + Installment (EMI) to Income Ratio (IIR)


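The loan-to-value ratio (LTV) is the ratio of the loan amount to the value of the collateral. It is used mostly in secured lending such as mortgages.

For example, suppose customer A takes out a mortgage against a property appraised at RMB 1,000,000, and the bank's credit policy caps LTV at 70%. The bank can then lend customer A at most RMB 700,000.

LTV limits differ by type of collateral and from bank to bank, according to each bank's own policy. They reflect the bank's view of the risk attached to the collateral.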
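
The LTV arithmetic in the mortgage example can be sketched in a few lines of Python. The function names `max_loan` and `loan_to_value` are hypothetical helpers introduced here for illustration, and the 70% cap is the figure from the example, not a general rule.

```python
def loan_to_value(loan_amount, collateral_value):
    """LTV = loan amount divided by the appraised value of the collateral."""
    return loan_amount / collateral_value


def max_loan(collateral_value, ltv_cap):
    """Largest loan a policy allows when it caps LTV at ltv_cap (e.g. 0.70)."""
    return collateral_value * ltv_cap


# Customer A: property appraised at RMB 1,000,000, policy cap of 70%.
limit = max_loan(1_000_000, 0.70)
print(f"Maximum loan: RMB {limit:,.0f}")
print(f"LTV at that loan size: {loan_to_value(limit, 1_000_000):.0%}")
```

A bank with a stricter view of the collateral would simply pass a lower `ltv_cap`.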

A 28-year-old man with an LTV of 75 and an IIR of 60 will have a score of 10 + 50 + 5 = 65 and hence is a high credit risk.
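
To make the additive structure concrete, here is a minimal sketch of a bucket-based scorecard lookup. The bucket tables from the original article are not reproduced in this text, so every cut-off and point value below is hypothetical, chosen only so that the worked example (age 28, LTV 75, IIR 60) adds up to 10 + 50 + 5 = 65. The assignment of 10, 50 and 5 to Age, LTV and IIR in that order is also an assumption.

```python
# Hypothetical score tables: (upper bound of bucket, points).
# These values are invented for illustration; they are NOT the article's tables.
AGE_POINTS = [(30, 10), (45, 25), (float("inf"), 40)]
LTV_POINTS = [(60, 70), (80, 50), (float("inf"), 30)]
IIR_POINTS = [(30, 30), (50, 15), (float("inf"), 5)]


def points_for(value, table):
    """Return the points of the first bucket whose upper bound covers value."""
    for upper_bound, points in table:
        if value <= upper_bound:
            return points
    raise ValueError(f"value {value!r} is not covered by the table")


def credit_score(age, ltv, iir):
    """Credit Score = Age points + LTV points + IIR points."""
    return (
        points_for(age, AGE_POINTS)
        + points_for(ltv, LTV_POINTS)
        + points_for(iir, IIR_POINTS)
    )


score = credit_score(age=28, ltv=75, iir=60)
print(score)
```

Each variable contributes points independently, which is what makes a scorecard easy to read and audit: changing one bucket table never affects the points from the other variables.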

Classification of good & bad loans using two variables – LTV & IIR – by Roopam

Now the question is: how did we arrive at the bucket-wise score points and the associated risk tables? By now, having gone through the previous three articles of the series, you must have some idea of how we will go about it. We have a historical list of good / bad borrowers (article 2) that we want to distinguish using predictor variables (article 3). Several statistical and data mining techniques could help us achieve this objective, such as:

1. Decision tree
2. Neural Networks
3. Support Vector Machines
4. Probit Regression
5. Linear discriminant analysis
6. Logistic Regression

Logistic regression is the most commonly used technique for the purpose. We will explore more about logistic regression in the next article.

Sign-off Note

I must conclude this article by saying that good analysts find a good mathematical model as beautiful as the model walking on the catwalk ramp.


Credit Scorecards – Logistic Regression (part 5 of 7)

http://ucanalytics.com/blogs/credit-scorecards-logistic-regression-part-5/

A Primer on Logistic Regression – Are you Happy?

Logistic regression for happiness- by Roopam

A few years ago, my wife and I took a couple of weeks' vacation to England and Scotland. Just before we boarded the British Airways plane, an air hostess informed us that we had been upgraded to business class. Jolly good! What a wonderful start to the vacation. Once we got onto the plane, we received another tempting offer: a further upgrade to first class. This time, however, there was a catch – just one seat was available. Now that is a shame; of course, we could not take this offer. The business class seats had been fabulous until the first class offer came along – and, by the way, all the upgrades were free. This is the situation behavioral economists describe as relativity and anchoring – in plain English, comparison. Anchoring or comparison is at the root of pricing strategies in business and also of all human sorrow. Eventually, however, the vacation mood took over and we enjoyed business class thoroughly. Humans are phenomenally good at adjusting to a situation in the end, and at enjoying it as well. You will find some of the happiest faces among people in the most difficult situations. Here is a quote by Henry Miller: “I have no money, no resources, no hopes. I am the happiest man alive.” Human behavior is full of anomalies – full of puzzles. The following is an example to strengthen this thesis.


Source: flicker.com

Lennon, McCartney, Harrison, and Best are the members of the most famous band ever on the planet – the Beatles. OK, I know you have spotted the error. By now you must have uttered the right names: John Lennon, Paul McCartney, George Harrison and Ringo Starr, not Pete Best. Actually, Ringo Starr was the replacement for Pete Best, the original regular drummer for the Beatles. Pete must have been devastated seeing his partners rising to glory while he was left behind. Wrong: search for him on Google – he is the happiest Beatle of all. Now that is counterintuitive. I guess we do not have a clue what makes us happy.

As promised in the previous article, in this article I will attempt to explore happiness using logistic regression – the technique extensively used in scorecard development.


Logistic Regression – An Experiment

I am a thorough empiricist – a proponent of fact-based management. Hence, let me design a quick and dirty experiment* to generate data to evaluate happiness. The idea is to identify the factors / variables that influence our overall happiness. Let me present a representative list of factors for a working adult living in a city:

Now, throw in some other factors to the above list, such as a random act of kindness or an unplanned visit to a friend. As you can see, the above list can easily be expanded (recall the article on variable selection, article 3). This is a representative list, and you will have to create your own to figure out the factors that influence your level of happiness.

The second part of the experiment is to collect data. This is like maintaining a diary, only this one will be in Microsoft Excel. Every night before sleeping, you could assess your day and fill in numbers in the spreadsheet along with your overall level of happiness for the day (as shown in the figure below).

*I am calling this a quick and dirty experiment for the following reasons: (1) it is not a well thought out experiment but was created to illustrate how logistic regression works; (2) the observer and the observed are the same in this experiment, which might create a challenge for objective measurement.

After a couple of years of data collection, you will have enough observations to create a model – a logistic regression model in this case. We are trying to model the feeling of happiness (column B) with the other columns (C to I) in the above data set. If we plot B on the Y-axis and the additive combination of C to I (we'll call it Z) on the X-axis, it will look something like the plot shown below.

The idea behind logistic regression is to optimize Z in such a way that we get the best possible distinction between happy and sad faces, as achieved in the plot above. This is a curve-fitting problem with the sigmoid function (the curve in violet) as the choice of function.
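
The curve-fitting idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the procedure a statistics package would use: the diary data is made up, only two hypothetical factors are used (both pre-scaled to the range 0 to 1), and the weights are fitted with simple gradient ascent on the log-likelihood.

```python
import math


def sigmoid(z):
    """Map any real-valued Z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_z(weights, x):
    """Z = intercept + additive combination of the factors."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights by gradient ascent on the average log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = y - sigmoid(linear_z(weights, x))
            grad[0] += error
            for j, xi in enumerate(x):
                grad[j + 1] += error * xi
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    """Estimated probability of a happy day for factor values x."""
    return sigmoid(linear_z(weights, x))


# Made-up diary: (sleep quality, time with friends), 1 = happy day, 0 = sad day.
rows = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.6, 0.7),
        (0.2, 0.1), (0.3, 0.2), (0.1, 0.4), (0.4, 0.2)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(rows, labels)
print("P(happy | good day):", round(predict(weights, (0.9, 0.9)), 3))
print("P(happy | bad day): ", round(predict(weights, (0.1, 0.1)), 3))
```

The fitted weights define Z, and the sigmoid turns Z into a probability of a happy day; days with high values of both factors end up on the upper arm of the curve and days with low values on the lower arm.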

I would recommend using the dates of observations (column A) in our model; this might reveal an interesting influence of seasons on our mood.


Applications in Banking and Finance

This is exactly what we do in case of analytical scorecards such as credit scorecards, behavioral scorecards, fraud scorecards or buying propensity models. Just replace happy and sad faces with …

• Good and Bad borrowers
• Fraud and genuine cases
• Buyers and non-buyers

… for the respective cases and you have the model. If you remember, in the previous article (part 4) I showed a simple credit scorecard model: Credit Score = Age + Loan to Value Ratio (LTV) + Instalment (EMI) to Income Ratio (IIR)

A straightforward transformation of the sigmoid function will help us arrive at the above equation of the line. This is the final link to arrive at the desired scorecard.
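
The transformation referred to here is the standard log-odds (logit) transform. A short derivation follows, where $p$ is the modeled probability and the coefficients $\beta_0, \dots, \beta_3$ are assumed notation for the fitted weights, not symbols taken from the original article:

```latex
\begin{aligned}
p &= \frac{1}{1 + e^{-Z}} \\[4pt]
1 - p &= \frac{e^{-Z}}{1 + e^{-Z}} \\[4pt]
\frac{p}{1 - p} &= e^{Z} \\[4pt]
\ln\frac{p}{1 - p} &= Z = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{LTV} + \beta_3\,\text{IIR}
\end{aligned}
```

Taking the logarithm of the odds turns the S-shaped curve into a straight line in Z, which is why the scorecard can be written as a simple sum of contributions from each variable.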

Variable Transformation in Credit Scorecards

The Swordsmith – by Roopam

I loved the movie Kill Bill, both parts. In the first part, I enjoyed the scene where Uma Thurman's character goes to Japan to get a sword from Hattori Hanzō, the legendary swordsmith. After learning about her motive, he agrees to make his finest sword for her. Then Quentin Tarantino, the director of the movie, briefly shows the process of making the sword. Hattori Hanzō transforms a regular piece of iron into a fabulous sword – what a craftsman. This is fairly similar to how analysts transform the sigmoid function into the linear equation. The difference is that analysts use mathematical tools rather than hammers and are not as legendary as Hattori Hanzō.


Reject Inference

Reject inference is an aspect of credit or application scorecards that sets them apart from all other classification models. For application scorecards, the development sample is biased because there is no performance data for rejected loans. Reject inference is a way to rectify this shortcoming and remove the bias from the sample. We will discuss reject inference in detail in a later article on YOU CANalytics.


Sign-off Note

Now that we have our scorecard ready, the next task is to validate its predictive power. This is precisely what we will do in the next article. See you soon.

Author's online course homepage: http://dwz.date/bwes
