Randomized Logistic Regression for Feature Selection

Randomized logistic regression can be used for feature selection.
Randomized logistic regression works by subsampling the training data and fitting an L1-penalized LogisticRegression model in which the penalty on a random subset of coefficients is scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features that are selected often are considered good features.
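To make the "double randomization" concrete, here is a minimal hand-rolled sketch of stability selection; it is not scikit-learn's exact implementation, and the dataset, subsample fraction, column-scaling factor, and iteration count are illustrative assumptions.

```python
# Hand-rolled stability selection: subsample rows, randomly shrink a subset of
# feature columns (which effectively raises their L1 penalty), fit an
# L1-penalized LogisticRegression, and count how often each feature keeps a
# non-zero coefficient. All parameter values are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)
rng = np.random.RandomState(0)
n_iter, sample_fraction, scaling = 200, 0.75, 0.5

selected = np.zeros(X.shape[1])
for _ in range(n_iter):
    # Randomization 1: subsample the training rows.
    rows = rng.choice(X.shape[0], int(sample_fraction * X.shape[0]), replace=False)
    # Randomization 2: shrink a random subset of columns, which scales the
    # effective L1 penalty on those coefficients.
    col_scale = np.where(rng.rand(X.shape[1]) < 0.5, scaling, 1.0)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X[rows] * col_scale, y[rows])
    selected += (np.abs(clf.coef_[0]) > 1e-10)

scores = selected / n_iter  # fraction of runs in which each feature was kept
print(np.round(scores, 2))
```

Features that are truly informative tend to survive most of the randomized fits, so their scores approach 1, while noise features drop toward 0.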

The relevant code and API are listed below.
Scikit-Learn API:
sklearn.linear_model — generalized linear models
sklearn.linear_model.LogisticRegression — logistic regression classifier
Methods:
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels
Parameters:
X: array-like, test samples; y: array-like, true labels for X.
sample_weight: array-like, optional, sample weights.
Returns:
score: float, mean accuracy of self.predict(X) with respect to y. (The per-feature scores used for feature selection come from the scores_ attribute of RandomizedLogisticRegression, not from this method.)
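As a quick illustration of score(), here is a minimal sketch; the synthetic dataset and parameter values are illustrative assumptions rather than anything from the original post.

```python
# Fit LogisticRegression and evaluate mean accuracy with score().
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X_train, y_train)

# score(X, y) returns the mean accuracy of self.predict(X_test) w.r.t. y_test.
print("mean accuracy:", clf.score(X_test, y_test))
```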
sklearn.linear_model.RandomizedLogisticRegression — randomized logistic regression
The official documentation explains randomized logistic regression as follows:
Randomized Logistic Regression works by subsampling the training data and fitting a L1-penalized LogisticRegression model where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features selected more often are considered good features.
Interpretation: the training data is subsampled and regression models are fitted many times, i.e. the selection algorithm is run on different subsets of both the samples and the features, and after many repetitions the features with high scores are kept as the important ones. This is the stability-selection approach. A feature's score reflects how frequently it was judged important: the number of times it was selected as important divided by the number of times the subsets containing it were tested.
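Below is a usage sketch of the class itself. Note that RandomizedLogisticRegression was deprecated in scikit-learn 0.19 and removed in 0.21, so this only runs on older scikit-learn versions; the dataset and parameter values are illustrative assumptions.

```python
# Usage sketch for sklearn.linear_model.RandomizedLogisticRegression.
# NOTE: deprecated in scikit-learn 0.19 and removed in 0.21, so this snippet
# requires an older scikit-learn; the data and parameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import RandomizedLogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           random_state=0)

rlr = RandomizedLogisticRegression(C=1.0, scaling=0.5, sample_fraction=0.75,
                                   n_resampling=200, selection_threshold=0.25,
                                   random_state=0)
rlr.fit(X, y)

# scores_ holds the per-feature stability-selection scores in [0, 1];
# get_support() marks features whose score exceeds selection_threshold.
print(rlr.scores_)
print(rlr.get_support(indices=True))
```

On newer scikit-learn versions the same idea can be reproduced with the hand-rolled stability-selection sketch shown earlier in this post.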
