
【ROSALIND】【Practice Python, Learn Bioinformatics】31 Transitions and Transversions

2020-01-21 16:47 Author: 未琢

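If this is your first time reading this series, please start with 【ROSALIND】【Practice Python, Learn Bioinformatics】00 Preface. Thanks for your cooperation~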

Problem:

Transitions and Transversions

Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).

In other words: two DNA sequences s1 and s2 of equal length, each no longer than 1 kb.

Return: The transition/transversion ratio R(s1,s2).

In other words: the ratio R(s1,s2) of the number of transitions to the number of transversions.

 

Sample dataset

>Rosalind_0209

GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA

AGTACGGGCATCAACCCAGTT

>Rosalind_2200

TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC

GGTACGAGTGTTCCTTTGGGT

Sample output

1.21428571429

 

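Biological background

        Point mutations come in two types: transitions and transversions. A transition replaces a purine with a purine or a pyrimidine with a pyrimidine, that is, A with G or T with C (in either direction). A transversion replaces a purine with a pyrimidine or vice versa. Put simply, a transition keeps the base in the same chemical class (purine or pyrimidine), while a transversion moves it to the other class.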
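The definition above can be sketched as a small classifier. This is only an illustration: the names `PURINES` and `classify` are made up for this example, and the code assumes both inputs are one of the four uppercase bases A, C, G, T.

```python
# Purines are A and G; the other two DNA bases (C and T) are pyrimidines.
PURINES = {"A", "G"}


def classify(a, b):
    """Classify the substitution a -> b between two DNA bases.

    Assumes a and b are each one of 'A', 'C', 'G', 'T'.
    Returns 'match', 'transition' or 'transversion'.
    """
    if a == b:
        return "match"
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


print(classify("A", "G"))  # transition
print(classify("C", "T"))  # transition
print(classify("A", "T"))  # transversion
```

Testing class membership rather than listing all twelve possible base pairs keeps the rule close to its biological wording.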

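        Because a transversion is the more drastic change, it occurs less often. Across the genome, the ratio of transitions to transversions is about 2. In protein-coding regions the ratio can exceed 3, because a transition is less likely than a transversion to change the amino acid a codon encodes. For the same reason, the transition/transversion ratio can help identify protein-coding regions.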

 

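Mathematical background

        Dividing the number of transitions between the two sequences by the number of transversions gives the required ratio.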
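Written out as a formula, with a small worked example. The two five-base strings are invented for illustration and are not part of the problem data.

```latex
R(s_1, s_2) = \frac{\text{number of transitions}}{\text{number of transversions}}

% Toy example: s_1 = AGCTA, s_2 = GACTT
% Position 1: A -> G  (transition)
% Position 2: G -> A  (transition)
% Position 3: C = C   (no change)
% Position 4: T = T   (no change)
% Position 5: A -> T  (transversion)
R(\mathtt{AGCTA}, \mathtt{GACTT}) = \frac{2}{1} = 2
```

The ratio is undefined when the two sequences show no transversions at all.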

 

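Approach

        The approach is simple: compare the two sequences position by position, count the transitions and the transversions, and divide one count by the other.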
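As a compact sketch of this approach (not the code used in this post), the comparison can be done with `zip` and a set of transition pairs. The names `TRANSITIONS` and `count_ti_tv` are made up for this example, and the code assumes both strings have equal length and contain only the uppercase bases A, C, G, T.

```python
# The four ordered base pairs that count as transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_ti_tv(s1, s2):
    """Return (transitions, transversions) between two equal-length DNA strings.

    Assumes both strings contain only 'A', 'C', 'G', 'T'.
    """
    ti = tv = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue  # identical bases are not mutations
        if (a, b) in TRANSITIONS:
            ti += 1
        else:
            tv += 1  # any other mismatch is a transversion
    return ti, tv


# The sample dataset from the problem statement, joined from its FASTA lines.
s1 = ("GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA"
      "AGTACGGGCATCAACCCAGTT")
s2 = ("TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC"
      "GGTACGAGTGTTCCTTTGGGT")

ti, tv = count_ti_tv(s1, s2)
print(ti, tv)              # 17 14
print(round(ti / tv, 11))  # 1.21428571429
```

Note that `zip` silently stops at the shorter string, so the equal-length assumption is worth checking first if the input is not trusted.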

 

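Code

def readfasta(lines):
    """Read FASTA-format lines; return the list of headers and the list of sequences."""
    seq = []
    index = []
    seqplast = ""
    numlines = 0
    for i in lines:
        if '>' in i:
            # Header line: store the name and flush the sequence collected so far
            index.append(i.replace("\n", "").replace(">", ""))
            seq.append(seqplast.replace("\n", ""))
            seqplast = ""
            numlines += 1
        else:
            # Sequence line: append it to the current record
            seqplast = seqplast + i.replace("\n", "")
            numlines += 1
        if numlines == len(lines):
            # Last line of the file: flush the final record
            seq.append(seqplast.replace("\n", ""))
    # The first flushed entry is the empty string stored before the first header
    seq = seq[1:]
    return index, seq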

 

 

with open('rosalind_tran.txt', 'r') as f:
    lines = f.readlines()

index, seq = readfasta(lines)
s1 = seq[0]
s2 = seq[1]

i = 0
ti = 0  # number of transitions
tv = 0  # number of transversions
while i < len(s1):
    if ((s1[i] == 'A' and s2[i] == 'G') or (s1[i] == 'G' and s2[i] == 'A') or
            (s1[i] == 'C' and s2[i] == 'T') or (s1[i] == 'T' and s2[i] == 'C')):
        ti = ti + 1
    elif ((s1[i] == 'A' and s2[i] == 'T') or (s1[i] == 'A' and s2[i] == 'C') or
            (s1[i] == 'G' and s2[i] == 'T') or (s1[i] == 'G' and s2[i] == 'C') or
            (s1[i] == 'C' and s2[i] == 'G') or (s1[i] == 'C' and s2[i] == 'A') or
            (s1[i] == 'T' and s2[i] == 'G') or (s1[i] == 'T' and s2[i] == 'A')):
        tv = tv + 1
    i += 1
per = ti / tv  # raises ZeroDivisionError if the sequences contain no transversions
print(round(per, 11))

 

