【ROSALIND】【Practice Python, Learn Bioinformatics】31 Transitions and Transversions

If this is your first time reading this series, please read 【ROSALIND】【Practice Python, Learn Bioinformatics】00 Preface first. Thanks!

Problem:
Transitions and Transversions
Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).
Return: The transition/transversion ratio R(s1,s2).
Sample Dataset
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
Sample Output
1.21428571429
Biological Background
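Single-base substitutions (point mutations) fall into two types: transitions and transversions. A transition replaces a purine with another purine or a pyrimidine with another pyrimidine, that is, an exchange between A and G or between C and T; a transversion replaces a purine with a pyrimidine or vice versa. In short, a transition keeps the base in the same class (purine or pyrimidine), whereas a transversion moves it to the other class, as shown in the figure below: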

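Because a transversion changes the base more drastically, transversions occur less often than transitions. Across a genome as a whole, the transition/transversion ratio is typically about 2. In protein-coding regions it can exceed 3, because a transition is less likely than a transversion to change the amino acid encoded by a codon, especially at the third codon position. For the same reason, the transition/transversion ratio can help identify protein-coding regions.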
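To see why the ratio rises in coding regions, the small sketch below enumerates every single-base change at the third position of every codon in the standard genetic code (NCBI translation table 1) and checks how often the encoded amino acid stays the same. It weights every codon and every substitution equally, which real genomes do not do, so it only illustrates the tendency and does not predict an actual ratio. The names CODON_TABLE and third_position_synonymous_fraction are hypothetical and not part of the original post.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are listed in
# T, C, A, G order with the third base varying fastest, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = set("AG")


def is_transition(a: str, b: str) -> bool:
    """True if a -> b swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    return a != b and (a in PURINES) == (b in PURINES)


def third_position_synonymous_fraction() -> dict:
    """Fraction of third-position substitutions that leave the encoded residue unchanged.

    Every codon and every alternative third base is weighted equally.
    """
    counts = {"transition": [0, 0], "transversion": [0, 0]}  # [unchanged, total]
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base == codon[2]:
                continue  # not a substitution
            mutant = codon[:2] + base
            kind = "transition" if is_transition(codon[2], base) else "transversion"
            counts[kind][1] += 1
            if CODON_TABLE[mutant] == aa:  # stop -> stop also counts as unchanged
                counts[kind][0] += 1
    return {kind: same / total for kind, (same, total) in counts.items()}


print(third_position_synonymous_fraction())
# {'transition': 0.9375, 'transversion': 0.53125}
```

In this toy count, 60 of the 64 third-position transitions are silent, but only 68 of the 128 third-position transversions are, which is consistent with the higher transition/transversion ratio observed in coding regions.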
Mathematical Background
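The requested ratio is the number of transitions between the two sequences divided by the number of transversions between them.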
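Written as a formula, with N_ti and N_tv standing for the numbers of transitions and transversions (notation introduced here for convenience), and checked against the sample dataset, which contains 17 transitions and 14 transversions when compared position by position:

```latex
R(s_1, s_2) = \frac{N_{\mathrm{ti}}}{N_{\mathrm{tv}}},
\qquad
R_{\text{sample}} = \frac{17}{14} = 1.2142857142\ldots \approx 1.21428571429
```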
Approach
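The approach is simple: compare the two sequences position by position, count the transitions and transversions, and divide the first count by the second.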
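The counting step can also be written without listing every base pair explicitly. The sketch below is an alternative to the original comparison chains, not the author's code: it classifies each mismatch by checking whether both bases fall in the same class (purines A, G or pyrimidines C, T). The helper names substitution_type, count_ti_tv and ti_tv_ratio are hypothetical, and characters other than A, C, G, T are skipped, as in the original loop.

```python
from typing import Optional, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def substitution_type(a: str, b: str) -> Optional[str]:
    """Classify the change a -> b; None for identical or non-ACGT bases."""
    if a == b or a not in BASES or b not in BASES:
        return None
    # Same class (purine/purine or pyrimidine/pyrimidine) means transition
    if (a in PURINES) == (b in PURINES):
        return "transition"
    return "transversion"


def count_ti_tv(s1: str, s2: str) -> Tuple[int, int]:
    """Count transitions and transversions between two equal-length DNA strings."""
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same length")
    ti = tv = 0
    for a, b in zip(s1, s2):
        kind = substitution_type(a, b)
        if kind == "transition":
            ti += 1
        elif kind == "transversion":
            tv += 1
    return ti, tv


def ti_tv_ratio(s1: str, s2: str) -> float:
    """Return R(s1, s2) = transitions / transversions."""
    ti, tv = count_ti_tv(s1, s2)
    if tv == 0:
        raise ValueError("no transversions: the ratio is undefined")
    return ti / tv
```

For the two sample sequences, count_ti_tv returns (17, 14), and round(ti_tv_ratio(s1, s2), 11) gives 1.21428571429, matching the sample output. Raising a ValueError when there are no transversions is a design choice of this sketch; the original code would stop with a ZeroDivisionError in that case.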
Code
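def readfasta(lines):
    """Read a FASTA file (given as a list of lines); return the headers and the sequences."""
    seq = []
    index = []
    seqplast = ""
    numlines = 0
    for i in lines:
        if '>' in i:
            # Header line: store its name and flush the sequence collected so far
            index.append(i.replace("\n", "").replace(">", ""))
            seq.append(seqplast.replace("\n", ""))
            seqplast = ""
            numlines += 1
        else:
            # Sequence line: append it to the current record
            seqplast = seqplast + i.replace("\n", "")
            numlines += 1
        if numlines == len(lines):
            # Last line reached: flush the final record
            seq.append(seqplast.replace("\n", ""))
    # Drop the empty string stored when the first header was read
    seq = seq[1:]
    return index, seq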
with open('rosalind_tran.txt', 'r') as f:
    lines = f.readlines()
index, seq = readfasta(lines)
s1 = seq[0]
s2 = seq[1]
i = 0
ti = 0  # number of transitions
tv = 0  # number of transversions
while i < len(s1):
    # Transition: A<->G (purines) or C<->T (pyrimidines)
    if (s1[i] == 'A' and s2[i] == 'G') or (s1[i] == 'G' and s2[i] == 'A') or (s1[i] == 'C' and s2[i] == 'T') or (s1[i] == 'T' and s2[i] == 'C'):
        ti = ti + 1
    # Transversion: any purine <-> pyrimidine substitution
    elif ((s1[i] == 'A' and s2[i] == 'T') or (s1[i] == 'A' and s2[i] == 'C') or (s1[i] == 'G' and s2[i] == 'T') or (s1[i] == 'G' and s2[i] == 'C') or
          (s1[i] == 'C' and s2[i] == 'G') or (s1[i] == 'C' and s2[i] == 'A') or (s1[i] == 'T' and s2[i] == 'G') or (s1[i] == 'T' and s2[i] == 'A')):
        tv = tv + 1
    i += 1
per = ti / tv
print(round(per, 11))
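The readfasta function above relies on a line counter to decide when to store the final record. A more compact alternative, sketched here under the assumption that every header line starts with '>', stores the pending record whenever a new header appears and once more after the loop, and strips whitespace so that blank lines or Windows line endings cannot end up inside a sequence. parse_fasta is a hypothetical name and not part of the original post.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()  # drops '\n', '\r' and surrounding spaces
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # flush the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # flush the last record
        records.append((header, "".join(chunks)))
    return records


sample = """>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
"""
for name, seq in parse_fasta(sample.splitlines()):
    print(name, len(seq))
# Rosalind_0209 81
# Rosalind_2200 81
```

To read the problem file instead, open rosalind_tran.txt as in the original code and pass the file object directly, for example (_, s1), (_, s2) = parse_fasta(f); this unpacking assumes the file holds exactly two records, as the problem guarantees.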