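Logistic Regression: Principles and Practice, Part 1 -- Uncovering the Truth Behind the U.S. Space Shuttle Challenger Disaster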

2020-07-23 14:55  Author: python风控模型

Micro-course on Python financial risk-control scorecard models and data analysis: http://dwz.date/b9vv

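The Space Shuttle Challenger lifted off from Florida at 11:38 a.m. Eastern Standard Time (16:38 GMT) on January 28, 1986. After liftoff, an O-ring seal in its right solid rocket booster (SRB) failed. Flame leaking from the booster burned into the adjacent external fuel tank until the tank failed structurally, and 73 seconds after launch the shuttle, flying at high speed, broke apart under aerodynamic forces. All seven astronauts on board were killed. The wreckage fell into the ocean and was later recovered by search and recovery teams.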

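The disaster grounded the U.S. Space Shuttle program for 32 months. During that time, President Ronald Reagan appointed the Rogers Commission to investigate the accident. The commission found that flaws and errors in NASA's organizational culture and decision-making were key factors behind the disaster. NASA managers had known beforehand that the solid rocket boosters designed by contractor Morton Thiokol had a potential flaw, yet they did not push for a fix. They also ignored engineers' warnings about the dangers of launching in cold weather and failed to report these technical concerns adequately to their superiors. The Rogers Commission made nine recommendations to NASA and required that they be implemented before shuttle flights resumed.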

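So what really caused the accident? Was it actually the cold-weather launch? In this post I use a logistic regression model to see what the historical data say.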

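First, we need to understand what a logistic regression model is. We have a data set that records, for each launch, the outside temperature and whether an O-ring damage incident occurred. We want to use these data to build a logistic regression classifier.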

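When building a binary classifier, many practitioners jump straight to logistic regression because it is simple. Many of them forget, however, that logistic regression is a linear model (linear in the log-odds): nonlinear interactions between predictors must be encoded by hand. In fraud detection, for example, good model performance requires high-order interaction features such as "billing address = shipping address and transaction amount < $50". When such interactions matter, a kernel SVM or a tree-based classifier, which can capture them without manual feature engineering, is often the better choice.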
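A minimal sketch of what "encoding the interaction by hand" means, using only the Python standard library. The transaction fields (billing_address, shipping_address, amount) and the function encode_features are hypothetical illustrations, not part of any real fraud data set. Because a linear model only adds up weighted inputs, the joint condition has to be supplied as its own 0/1 column.

```python
from typing import Dict, List


def encode_features(tx: Dict[str, object]) -> List[float]:
    """Turn one transaction into a feature vector for a linear model.

    Hypothetical field names: 'billing_address', 'shipping_address', 'amount'.
    """
    same_address = 1.0 if tx["billing_address"] == tx["shipping_address"] else 0.0
    small_amount = 1.0 if float(tx["amount"]) < 50 else 0.0
    # A linear model cannot form "A AND B" from A and B on its own,
    # so the product of the two indicators is added as an explicit column.
    interaction = same_address * small_amount
    return [same_address, small_amount, interaction]


tx = {"billing_address": "A", "shipping_address": "A", "amount": 20.0}
print(encode_features(tx))  # [1.0, 1.0, 1.0]
```

Without the third column, a logistic regression can only add the two separate effects; it cannot give the combination a weight of its own.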

Probability: the number of outcomes in which the event occurs divided by the total number of (equally likely) outcomes.

Odds: the probability that the event occurs divided by the probability that it does not, p / (1 − p).

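Odds ratio: the ratio of two odds, used to compare them (for example, the odds in one group divided by the odds in another).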

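Warning: odds and probability are different quantities. A probability of 0.75, for instance, corresponds to odds of 3 (0.75 / 0.25).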
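The definitions above translate directly into code. This short, dependency-free sketch (the function names odds and odds_ratio are our own) also shows why the warning matters: a probability of 0.75 becomes odds of 3.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_ratio(p1: float, p2: float) -> float:
    """Odds in group 1 divided by odds in group 2."""
    return odds(p1) / odds(p2)


print(odds(0.75))             # 3.0
print(odds(0.5))              # 1.0 (even odds)
print(odds_ratio(0.75, 0.5))  # 3.0
```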

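Logistic regression models a Bernoulli (0/1) outcome: it assumes that the log-odds of the outcome are a linear function of the predictors.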

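The formula is built on the logit (log-odds) function, logit(p) = ln(p / (1 − p)), which maps probabilities in (0, 1) onto the whole real line.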

Logistic regression computes the probability that a categorical (here binary) outcome falls into a given class.

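Binary (categorical) outcomes are not normally distributed, so ordinary linear regression fits them poorly: for extreme x values, a straight line can predict "probabilities" below 0 or above 1.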
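A small sketch of the problem, in plain Python: fit an ordinary least-squares line to a made-up 0/1 data set (hypothetical values, not the O-ring data) and evaluate it at an extreme x. The logistic coefficients used for comparison (3.5 and -1.0) are illustrative choices, not fitted values; the point is only that the logistic curve stays inside (0, 1) while the straight line does not.

```python
import math

# Hypothetical toy data: x and a 0/1 outcome (not the O-ring data)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

# Ordinary least-squares line y = a + b*x
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx


def linear_pred(x: float) -> float:
    """Prediction from the least-squares line (unbounded)."""
    return a + b * x


def logistic_pred(x: float, beta0: float, beta1: float) -> float:
    """Logistic curve: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


print(linear_pred(10.0))               # negative: not a valid probability
print(linear_pred(-5.0))               # above 1: not a valid probability either
print(logistic_pred(10.0, 3.5, -1.0))  # small, but still inside (0, 1)
```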

[Figure: plot of the logit (log-odds) function]

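In the plot of the logit function, the probability (range 0 to 1) lies on the x-axis, but we want the probability on the y-axis. So we take the inverse of the logit: the logistic (sigmoid) function p = 1 / (1 + e^(−z)).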
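A minimal sketch of the logit and its inverse in plain Python (the names logit and inv_logit are our own). Converting a probability to log-odds and back returns the original probability, and the inverse maps any real number into (0, 1).

```python
import math


def logit(p: float) -> float:
    """Log-odds: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Logistic (sigmoid) function, the inverse of logit: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Round trip: probability -> log-odds -> probability
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(p, round(z, 3), round(inv_logit(z), 3))
```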

[Figure: the logistic regression formula]
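Written out, a standard form of the single-predictor logistic regression model is the following (β0 is the intercept and β1 the coefficient of the predictor x; these symbol names are our choice, not taken from the original figure):

```latex
\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

In the script later in this post, glm is fitted with the no-damage count ok as the "success", so the fitted Intercept and temp coefficients play the roles of β0 and β1 for P(no damage). The damage probability is then 1 − p(x) = 1 / (1 + e^{β0 + β1 x}), which is what the script's logistic() function computes.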

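For example, in a credit-scoring model with a positive coefficient on the credit score, a higher score corresponds to higher odds: each one-point increase multiplies the odds by e raised to the score's coefficient, which is the odds ratio for that one-point change.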
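A quick numerical check of this statement under an assumed model: the coefficients beta0 = -5.0 and beta1 = 0.01 below are made up for illustration, not fitted to any credit data. Whatever the starting score, the odds at score + 1 divided by the odds at score equal exp(beta1).

```python
import math


def odds_at(x: float, beta0: float, beta1: float) -> float:
    """Odds p / (1 - p) implied by a logistic model at predictor value x."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)


# Hypothetical coefficients for a credit-score model (not fitted to real data)
beta0, beta1 = -5.0, 0.01

for score in (400.0, 600.0, 800.0):
    ratio = odds_at(score + 1, beta0, beta1) / odds_at(score, beta0, beta1)
    print(score, round(ratio, 6))  # the same ratio at every score

print(round(math.exp(beta1), 6))   # odds ratio per one-point increase
```

Because the odds equal exp(beta0 + beta1 * x), the ratio for a one-unit step does not depend on where you start; that is why a single odds ratio can summarize the effect.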

[Figure: how the odds increase with the predictor (odds ratio)]

Python implementation

Data download:

https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring
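The script below reads a file named challenger_data.csv and, judging by usecols=[1, 2], expects the launch temperature in the second column and a 0/1 damage flag in the third; the files on the UCI page may use a different layout, so check before running. As a dependency-free sketch of the same two steps as getData() and prepareForFit(), the hypothetical helpers read_pairs and count_by_temperature read such rows with the csv module, skip rows whose damage flag is missing or non-numeric, and count launches and failures per temperature.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_pairs(lines: Iterable[str]) -> List[Tuple[float, int]]:
    """Return (temperature, damaged) pairs from CSV lines with a header row.

    Assumed layout (as in the script): column 1 = temperature, column 2 = 0/1 flag.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    pairs = []
    for row in reader:
        try:
            pairs.append((float(row[1]), int(float(row[2]))))
        except (ValueError, IndexError):
            continue  # e.g. 'NA' or free text in the damage column
    return pairs


def count_by_temperature(pairs: Iterable[Tuple[float, int]]) -> Dict[float, Dict[str, int]]:
    """Group launches by temperature and count failures, successes and totals."""
    counts: Dict[float, Dict[str, int]] = defaultdict(
        lambda: {"failed": 0, "ok": 0, "total": 0})
    for temp, damaged in pairs:
        counts[temp]["total"] += 1
        counts[temp]["failed" if damaged == 1 else "ok"] += 1
    return dict(counts)
```

Usage, assuming the CSV file is in the working directory: with open('challenger_data.csv', newline='') as f: table = count_by_temperature(read_pairs(f))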

The logisticRegression script:

# -*- coding: utf-8 -*-
# 231469242@qq.com
# WeChat official account: pythonEducation
'''
Logistic Regression

A logistic regression is an example of a "Generalized Linear Model (GLM)".
The input values are the recorded O-ring data from the space shuttle launches before 1986,
and the fit indicates the likelihood of failure for an O-ring.

Taken from http://www.brightstat.com/index.php?option=com_content&task=view&id=41&Itemid=1&limit=1&limitstart=2
'''

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import glm
from statsmodels.genmod.families import Binomial

sns.set_context('poster')


def getData():
    '''Get the data: (temperature, damage incident) pairs from the CSV file.'''
    inFile = 'challenger_data.csv'
    # Column 1: outside temperature [F]; column 2: 1 = O-ring damage incident, 0 = none
    data = np.genfromtxt(inFile, skip_header=1, usecols=[1, 2],
                         missing_values='NA', delimiter=',')
    # Eliminate rows whose damage value is missing or non-numeric (read as NaN)
    data = data[~np.isnan(data[:, 1])]
    return data


def prepareForFit(inData):
    '''Make the temperature values unique, and count the number of failures and successes.

    Returns a DataFrame indexed by temperature.'''
    # Create a DataFrame with one row per distinct temperature and zeroed counters
    df = pd.DataFrame()
    df['temp'] = np.unique(inData[:, 0])
    df['failed'] = 0
    df['ok'] = 0
    df['total'] = 0
    df.index = df.temp.values

    # Count the number of launches and failures at each temperature
    for curTemp, curVal in inData:
        df.loc[curTemp, 'total'] += 1
        if curVal == 1:
            df.loc[curTemp, 'failed'] += 1
        else:
            df.loc[curTemp, 'ok'] += 1

    return df


def logistic(x, beta, alpha=0):
    '''Logistic function for the probability of failure.

    The model is fitted with "ok" as the success count, so alpha + beta*x is the
    log-odds of a launch WITHOUT damage; 1 / (1 + exp(alpha + beta*x)) is then
    the probability of an O-ring damage incident.'''
    return 1.0 / (1.0 + np.exp(beta * x + alpha))


def showResults(challenger_data, model):
    '''Show the original data, and the resulting logit fit.'''
    temperature = challenger_data[:, 0]
    failures = challenger_data[:, 1]

    # First plot the original data
    sns.set_style('darkgrid')
    plt.figure()
    plt.scatter(temperature, failures, s=200, color="k", alpha=0.5)
    plt.yticks([0, 1])
    plt.ylabel("Damage Incident?")
    plt.xlabel("Outside Temperature [F]")
    plt.title("Defects of the Space Shuttle O-Rings vs temperature")

    # Plot the fit (plt.hold was removed in Matplotlib 3.0; overlaying is the default)
    x = np.arange(50, 85)
    alpha = model.params['Intercept']
    beta = model.params['temp']
    y = logistic(x, beta, alpha)
    plt.plot(x, y, 'r')
    plt.xlim([50, 85])
    plt.tight_layout()

    # Save and display the figure (in place of the undefined setFonts()/showData() helpers)
    outFile = 'ChallengerPlain.png'
    plt.savefig(outFile)
    plt.show()


if __name__ == '__main__':
    np.set_printoptions(precision=3, suppress=True)
    inData = getData()
    dfFit = prepareForFit(inData)

    # Fit the model: "ok + failed" on the left-hand side passes the
    # (successes, failures) counts per temperature to the Binomial family
    # --- >>> START stats <<< ---
    model = glm('ok + failed ~ temp', data=dfFit, family=Binomial()).fit()
    # --- >>> STOP stats <<< ---

    print(model.summary())
    showResults(inData, model)
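For readers who want to see roughly what glm(..., family=Binomial()).fit() does, here is a minimal, dependency-free sketch of maximum-likelihood fitting by Newton-Raphson on grouped counts (for the logit link this coincides with IRLS, the default fitting method in statsmodels). Unlike the script, it models the probability of failure directly. The function fit_grouped_logit is hypothetical, and as a sketch it has no step-halving and no check for perfectly separable data, where the maximum-likelihood estimate does not exist.

```python
import math
from typing import Sequence, Tuple


def fit_grouped_logit(temps: Sequence[float], failed: Sequence[int], total: Sequence[int],
                      max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit(P(failure)) = b0 + b1 * temp by Newton-Raphson on grouped counts.

    temps[i] is a launch temperature; failed[i] and total[i] are the damaged and
    total launch counts at that temperature. Returns (b0, b1).
    """
    # Centre the temperatures so the 2x2 Newton system is well conditioned
    mean_t = sum(temps) / len(temps)
    xs = [t - mean_t for t in temps]
    c0, c1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(xs, failed, total):
            p = 1.0 / (1.0 + math.exp(-(c0 + c1 * x)))
            resid = y - n * p        # observed minus expected failures
            w = n * p * (1.0 - p)    # binomial variance, the IRLS weight
            g0 += resid              # score (gradient of the log-likelihood)
            g1 += resid * x
            h00 += w                 # Fisher information matrix entries
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve [[h00, h01], [h01, h11]] * step = [g0, g1]
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        c0 += step0
        c1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    # Undo the centring: c0 + c1*(t - mean_t) = (c0 - c1*mean_t) + c1*t
    return c0 - c1 * mean_t, c1
```

For example, b0, b1 = fit_grouped_logit(list(dfFit.temp), list(dfFit.failed), list(dfFit.total)) should reproduce model.params['Intercept'] and model.params['temp'] with both signs flipped, up to numerical tolerance, because the script models the probability of "ok" rather than of failure.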

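The fitted curve shows that O-ring damage incidents were more frequent at low launch temperatures and rarer at high ones. This supports the hypothesis that the cold-weather launch caused the Challenger disaster. Of course, this inference rests on a single small data set; other data might lead to different conclusions.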

Python machine learning and bioinformatics course series (recorded by the author): http://dwz.date/b9vw
