Neural Network Model: Predicting Drug Targets

2020-07-26 09:48 Author: python_biology

Python machine learning: breast cancer cell mining (video course recorded by the author) https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

Drug targets

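A drug target is the site where a drug binds to a biological macromolecule in the body. Drug targets include receptors, enzymes, ion channels, transporters, the immune system, and genes. In addition, some drugs act through their physicochemical properties or by supplying substances the body lacks. Among existing drugs, more than 50% act on receptors, which makes receptors the main and most important class of targets; more than 20% act on enzymes, with enzyme inhibitors holding a special place in clinical use; about 6% act on ion channels; 3% act on nucleic acids; and for the remaining 20% the target still awaits further study.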

Why predict drug targets with machine learning

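Drug target identification is central to modern drug discovery. It plays an important role in studying drug toxicity and side effects, in repurposing existing drugs, and in personalized therapy. However, traditional target identification based on biological experiments is limited by accuracy, throughput, and cost, and is often hard to carry out. Meanwhile, with the rapid development of information science, intelligent computing techniques such as machine learning, pattern recognition, and data mining have been widely applied in computational biology. Driven by these techniques, computer-aided prediction of drug-target interactions is attracting growing attention from researchers as a fast and accurate way to identify drug targets. It uses computer simulation, calculation, and prediction to study the relationship between drug compound molecules and target proteins, and it guides the synthesis of new drugs or the modification of known drug structures. This shortens development time, reduces trial and error, and lowers development cost. As an efficient, low-cost approach, drug-target interaction prediction based on intelligent computing is therefore of great value for target protein validation, targeted drug development, and the construction of drug-target interaction networks.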


Below I use a neural network to build a model for predicting drug targets.

The Python code for the model follows.



# -*- coding: utf-8 -*-
"""
Created on Wed Sep  5 11:23:58 2018

@author: 231469242@qq.com
WeChat official account: pythonEducation
Data source and documentation:
https://www.wildcardconsulting.dk/useful-information/a-deep-tox21-neural-network-with-rdkit-and-keras/
"""

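import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# RDKit for fingerprinting and cheminformatics
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdMolDescriptors

# MolVS for standardization and normalization of molecules
import molvs as mv

# Keras for the neural network (these imports were missing from the listing)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
from keras.callbacks import ReduceLROnPlateau

# scikit-learn metrics for the ROC analysis
from sklearn.metrics import roc_auc_score, roc_curve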

# Function to get the charge parent of a SMILES string
def parent(smiles):
    st = mv.Standardizer()  # MolVS standardizer
    try:
        mol = st.charge_parent(Chem.MolFromSmiles(smiles))
        return Chem.MolToSmiles(mol)
    except Exception:
        # Unparseable SMILES end up here; mark them so they can be filtered out
        print("%s failed conversion" % smiles)
        return "NaN"

  

# Clean and standardize the data
def clean_data(data):
    # Remove rows with missing SMILES (copy so later column assignments are safe)
    data = data[~data['smiles'].isnull()].copy()

    # Standardize and get the parent with MolVS
    data["smiles_parent"] = data["smiles"].apply(parent)
    data = data[data["smiles_parent"] != "NaN"].copy()

    # Filter small fragments away
    def num_atoms(smiles):
        return Chem.MolFromSmiles(smiles).GetNumAtoms()

    data["NumAtoms"] = data["smiles_parent"].apply(num_atoms)
    data = data[data["NumAtoms"] > 3].copy()
    return data

  

# Read the data
# (pd.DataFrame.from_csv was removed from pandas; read_csv with index_col=0 replaces it)
data = pd.read_csv('tox21_10k_data_all_pandas.csv', index_col=0)
valdata = pd.read_csv('tox21_10k_challenge_test_pandas.csv', index_col=0)
testdata = pd.read_csv('tox21_10k_challenge_score_pandas.csv', index_col=0)

data = clean_data(data)
valdata = clean_data(valdata)
testdata = clean_data(testdata)

 

# Calculate fingerprints
def morgan_fp(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Morgan fingerprint with radius 3, folded to 8192 bits
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=8192)
    # Convert the bit string to a 0/1 NumPy vector
    npfp = np.array(list(fp.ToBitString())).astype('int8')
    return npfp

fp = "morgan"
data[fp] = data["smiles_parent"].apply(morgan_fp)
valdata[fp] = valdata["smiles_parent"].apply(morgan_fp)
testdata[fp] = testdata["smiles_parent"].apply(morgan_fp)

# Choose property to model
prop = 'SR-MMP'

# Convert to NumPy arrays, keeping only rows where the property is not null
X_train = np.array(list(data[~(data[prop].isnull())][fp]))
X_val = np.array(list(valdata[~(valdata[prop].isnull())][fp]))
X_test = np.array(list(testdata[~(testdata[prop].isnull())][fp]))

# Select the property values where the property is not null and reshape to a column
y_train = data[~(data[prop].isnull())][prop].values.reshape(-1,1)
y_val = valdata[~(valdata[prop].isnull())][prop].values.reshape(-1,1)
y_test = testdata[~(testdata[prop].isnull())][prop].values.reshape(-1,1)

 

# Set network hyperparameters
l1 = 0.000
l2 = 0.016
dropout = 0.5
hidden_dim = 80

# Build neural network
# WeightRegularizer, output_dim and W_regularizer are Keras 1 names;
# later Keras versions use l1_l2, units and kernel_regularizer instead.
from keras.regularizers import l1_l2

model = Sequential()
model.add(Dropout(0.2, input_shape=(X_train.shape[1],)))
for i in range(3):
    wr = l1_l2(l1=l1, l2=l2)
    model.add(Dense(units=hidden_dim, activation="relu", kernel_regularizer=wr))
    model.add(Dropout(dropout))
wr = l1_l2(l1=l1, l2=l2)
model.add(Dense(y_train.shape[1], activation='sigmoid', kernel_regularizer=wr))

  

# Compile model and make it ready for optimization
# (recent Keras versions name the SGD step size learning_rate instead of lr)
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=0.005, momentum=0.9, nesterov=True), metrics=['binary_crossentropy'])

# Callback that halves the learning rate when the training loss plateaus
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=50, min_lr=0.00001, verbose=1)

# Training (nb_epoch was renamed to epochs in Keras 2)
history = model.fit(X_train, y_train, epochs=1000, batch_size=1000, validation_data=(X_val, y_val), callbacks=[reduce_lr])

# Plot training history
def plot_history(history):
    lw = 2
    # The learning rate is logged as 'lr' in Keras 2 and as 'learning_rate' in newer versions
    lr_key = 'lr' if 'lr' in history.history else 'learning_rate'
    fig, ax1 = plt.subplots()
    ax1.plot(history.epoch, history.history['binary_crossentropy'], c='b', label="Train", lw=lw)
    ax1.plot(history.epoch, history.history['val_loss'], c='g', label="Val", lw=lw)
    plt.ylim([0.0, max(history.history['binary_crossentropy'])])
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss')
    ax2 = ax1.twinx()
    ax2.plot(history.epoch, history.history[lr_key], c='r', label="Learning Rate", lw=lw)
    ax2.set_ylabel('Learning rate')
    plt.legend()
    plt.show()

plot_history(history)

 

 

# Report AUC and plot ROC curves for the train, validation and test sets
def show_auc(model):
    pred_train = model.predict(X_train)
    pred_val = model.predict(X_val)
    pred_test = model.predict(X_test)

    auc_train = roc_auc_score(y_train, pred_train)
    auc_val = roc_auc_score(y_val, pred_val)
    auc_test = roc_auc_score(y_test, pred_test)
    print("AUC, Train:%0.3F Test:%0.3F Val:%0.3F" % (auc_train, auc_test, auc_val))

    fpr_train, tpr_train, _ = roc_curve(y_train, pred_train)
    fpr_val, tpr_val, _ = roc_curve(y_val, pred_val)
    fpr_test, tpr_test, _ = roc_curve(y_test, pred_test)

    plt.figure()
    lw = 2
    plt.plot(fpr_train, tpr_train, color='b', lw=lw, label='Train ROC (area = %0.2f)' % auc_train)
    plt.plot(fpr_val, tpr_val, color='g', lw=lw, label='Val ROC (area = %0.2f)' % auc_val)
    plt.plot(fpr_test, tpr_test, color='r', lw=lw, label='Test ROC (area = %0.2f)' % auc_test)

    # Diagonal reference line for a random classifier
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic of %s' % prop)
    plt.legend(loc="lower right")
    plt.interactive(True)
    plt.show()

show_auc(model)

# Compare with a linear model
from sklearn import linear_model

# Prepare scoring lists
fitscores = []
predictscores = []

# Prepare a log-spaced list of alpha values to test
alphas = np.logspace(-2, 4, num=10)

# Iterate through alphas and fit logistic regression
# (C is the inverse of the regularization strength, hence C = 1/alpha)
for alpha in alphas:
    estimator = linear_model.LogisticRegression(C=1/alpha)
    estimator.fit(X_train, y_train.ravel())
    fitscores.append(estimator.score(X_train, y_train.ravel()))
    predictscores.append(estimator.score(X_val, y_val.ravel()))

# Show a plot of accuracy against alpha (score() returns mean accuracy for classifiers)
import matplotlib.pyplot as plt
ax = plt.gca()
ax.set_xscale('log')
ax.plot(alphas, fitscores, 'g')
ax.plot(alphas, predictscores, 'b')
plt.xlabel('alpha')
plt.ylabel('Accuracy')
plt.show()

  

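estimator = linear_model.LogisticRegression(C=1)
estimator.fit(X_train, y_train.ravel())

# Predict the test set; AUC is a ranking metric, so score it with the
# class-1 probabilities rather than with thresholded 0/1 labels
y_pred = estimator.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, y_pred))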
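
The AUC values in this script measure how well a model ranks active compounds above inactive ones. As a rough illustration that is not part of the original script, the sketch below computes a rank-based AUC in plain Python and compares continuous scores with labels thresholded at 0.5. The function name `roc_auc` and the toy values are made up for this example and are not Tox21 data.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in which
    the positive has the higher score; tied pairs count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data for illustration only
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
# Thresholding at 0.5 throws away the ordering inside each class
y_hard = [1 if p >= 0.5 else 0 for p in y_prob]

auc_from_scores = roc_auc(y_true, y_prob)
auc_from_labels = roc_auc(y_true, y_hard)
print("AUC from scores: %.3f" % auc_from_scores)
print("AUC from hard labels: %.3f" % auc_from_labels)
```

On this toy data the thresholded labels give a lower AUC than the scores, because every compound on the same side of the threshold becomes a tie. This is why the choice between predicted probabilities and predicted classes matters when comparing models by AUC.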

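Conclusion: the neural network performs reasonably well, reaching an AUC of 0.78 on the validation data, but the model overfits, so it needs hyperparameter tuning or a different algorithm. You are welcome to join my Python machine learning and bioinformatics course series at: http://dwz.date/b9vw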
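
One common way to limit overfitting of this kind is early stopping on the validation loss. The sketch below is a minimal, framework-free illustration of the patience rule, not the author's method: the helper name `best_epoch_with_patience` and the loss values are hypothetical. Keras offers the same idea through its `EarlyStopping` callback, which could be passed to `model.fit` alongside `ReduceLROnPlateau`.

```python
def best_epoch_with_patience(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a recorded validation-loss curve.

    Training is considered stopped once the loss has failed to improve by
    more than min_delta for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best value: remember it and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # The curve ended before patience ran out
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improving at first, then rising as the model overfits
val_curve = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52]
stop_epoch, best_epoch = best_epoch_with_patience(val_curve, patience=3)
print("stop at epoch %d, keep weights from epoch %d" % (stop_epoch, best_epoch))
```

With a patience of 3, the rule stops three epochs after the lowest validation loss and keeps the weights from the epoch where that minimum occurred.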
