I have two text files like the examples below. I named one of them first (comma separated) and the other second (tab separated).
first:
chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr1,1580000,1180000,4,1,5,3
chr19,107180000,107680000,1,1,5,4
chr1,7680000,8300000,3,1,1,2
chr1,109220000,110070000,4,2,3,3
chr1,11060000,12070000,6,2,7,4
second:
AKAP8L chr19 107180100 107650000 transcript
AKAP8L chr19 15514130 15529799 transcript
AKIRIN2 chr6 88384790 88411927 transcript
AKIRIN2 chr6 88410228 88411243 transcript
AKT3 chr1 105002000 105010000 transcript
AKT3 chr1 243663021 244006886 transcript
AKT3 chr1 243665065 244013430 transcript
In the first file, columns 2 and 3 are start and end. In the second file, columns 3 and 4 are start and end, respectively. I want to create a new text file from the first and second files.
In the new file, I want to count the number of rows in file second that match each row in file first, based on the following 3 criteria:
1- the 1st column in file first is equal to the 2nd column in file second.
2- the 3rd column in file second is greater than the 2nd column in file first and also smaller than the 3rd column in file first.
3- the 4th column in file second is also greater than the 2nd column in file first and also smaller than the 3rd column in file first.
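In fact, the output should look like the expected output below. The first 7 columns come directly from file first, and the 9th column is the number of rows in file second that match that row of file first (based on the 3 criteria above). The 8th column is the first column of the first row in file second that matches that particular row of file first.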
expected output:
chr19,107180000,107680000,1,1,5,4,AKAP8L, 1
chr1,105000000,105310000,2,1,3,2, AKT3, 1
I am trying to do this in Python and wrote this code, but it does not return what I want.
first = open('first.csv', 'rb')
second = open('second.txt', 'rb')
first_file = []
for line in first:
    first_file.append(line.split(','))
second_file = []
for line2 in second:
    second_file.append(line.split())
count=0
final = []
for i in range(len(first_file)):
    for j in range(len(second_file)):
        first_row = first_file[i]
        second_row = second_file[j]
        first_col = first_row.split()
        second_col = second_row.split()
        if first_col[0] == second_col[1] and first_col[1] < second_col[2] < first_col[2] and first_col[1] < second_col[3] < first_col[2]
            count+=1
            final.append(first_col[i]+second_col[0]+count)

Given that you have no column names, this looks rather clumsy, but it works and uses pandas:
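A minimal sketch of a pandas approach along these lines (not necessarily the code originally posted). The column names such as chrom, t_start and row_id are made up for illustration because the files have no header, and the sample data from the question is inlined so the snippet runs on its own; with real files you would pass 'first.csv' and 'second.txt' to read_csv instead.

```python
import io

import pandas as pd

# Sample data copied from the question (second is tab separated).
first_txt = """\
chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr1,1580000,1180000,4,1,5,3
chr19,107180000,107680000,1,1,5,4
chr1,7680000,8300000,3,1,1,2
chr1,109220000,110070000,4,2,3,3
chr1,11060000,12070000,6,2,7,4
"""
second_txt = "\n".join([
    "AKAP8L\tchr19\t107180100\t107650000\ttranscript",
    "AKAP8L\tchr19\t15514130\t15529799\ttranscript",
    "AKIRIN2\tchr6\t88384790\t88411927\ttranscript",
    "AKIRIN2\tchr6\t88410228\t88411243\ttranscript",
    "AKT3\tchr1\t105002000\t105010000\ttranscript",
    "AKT3\tchr1\t243663021\t244006886\ttranscript",
    "AKT3\tchr1\t243665065\t244013430\ttranscript",
]) + "\n"

# Hypothetical column names; the files themselves have no header row.
first = pd.read_csv(io.StringIO(first_txt), header=None,
                    names=["chrom", "start", "end", "c4", "c5", "c6", "c7"])
second = pd.read_csv(io.StringIO(second_txt), sep="\t", header=None,
                     names=["gene", "chrom", "t_start", "t_end", "kind"])

# Remember each row's position in first so matches can be grouped per row.
first = first.reset_index().rename(columns={"index": "row_id"})

# Criterion 1: pair rows whose chromosome names are equal.
merged = first.merge(second, on="chrom")

# Criteria 2 and 3: both transcript coordinates lie strictly inside the interval.
inside = merged[
    (merged["t_start"] > merged["start"]) & (merged["t_start"] < merged["end"])
    & (merged["t_end"] > merged["start"]) & (merged["t_end"] < merged["end"])
]

# Per row of first: name of the first matching transcript and number of matches.
summary = (inside.groupby("row_id")
           .agg(gene=("gene", "first"), n=("gene", "count"))
           .reset_index())

# Keep only rows of first that had at least one match, in their original order.
result = first.merge(summary, on="row_id").drop(columns="row_id")

# Passing a path such as "result.csv" instead would write the file to disk.
csv_text = result.to_csv(header=False, index=False)
print(csv_text)
```

The merge on chrom builds every same-chromosome pair before filtering, which is fine for small inputs but can use a lot of memory on large files.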
This generates result.csv with the following content:

With your same setup, it will work if you do the following:
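A minimal sketch in plain Python that stays close to the setup in the question (not necessarily the code originally posted). The helper names parse and annotate are made up for illustration, and the sample data is inlined so the snippet runs on its own; with real files you would read the text from first.csv and second.txt instead.

```python
# Sample data copied from the question (second is tab separated).
FIRST_TEXT = """\
chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr1,1580000,1180000,4,1,5,3
chr19,107180000,107680000,1,1,5,4
chr1,7680000,8300000,3,1,1,2
chr1,109220000,110070000,4,2,3,3
chr1,11060000,12070000,6,2,7,4
"""
SECOND_TEXT = "\n".join([
    "AKAP8L\tchr19\t107180100\t107650000\ttranscript",
    "AKAP8L\tchr19\t15514130\t15529799\ttranscript",
    "AKIRIN2\tchr6\t88384790\t88411927\ttranscript",
    "AKIRIN2\tchr6\t88410228\t88411243\ttranscript",
    "AKT3\tchr1\t105002000\t105010000\ttranscript",
    "AKT3\tchr1\t243663021\t244006886\ttranscript",
    "AKT3\tchr1\t243665065\t244013430\ttranscript",
])


def parse(text, sep=None):
    """Split each non-empty line into fields (sep=None splits on any whitespace)."""
    return [line.split(sep) for line in text.splitlines() if line.strip()]


def annotate(first_rows, second_rows):
    """Append the first matching name and the match count to each matched row."""
    results = []
    for row in first_rows:
        chrom, start, end = row[0], int(row[1]), int(row[2])
        hits = [
            s for s in second_rows
            if s[1] == chrom                  # criterion 1: same chromosome
            and start < int(s[2]) < end       # criterion 2: start strictly inside
            and start < int(s[3]) < end       # criterion 3: end strictly inside
        ]
        if hits:  # rows without any match are left out, as in the expected output
            results.append(row + [hits[0][0], str(len(hits))])
    return results


first_rows = parse(FIRST_TEXT, ",")
second_rows = parse(SECOND_TEXT)
results = annotate(first_rows, second_rows)
for r in results:
    print(",".join(r))
```

The coordinates are converted with int() before comparing, because comparing the raw strings would order them character by character rather than numerically. The count is reset for every row of first, and the rows are printed in the order they appear in first.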
This will produce the result you want.