Merge two text files and summarize them in a new text file

first.csv:

chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr1,1580000,1180000,4,1,5,3
chr19,107180000,107680000,1,1,5,4
chr1,7680000,8300000,3,1,1,2
chr1,109220000,110070000,4,2,3,3
chr1,11060000,12070000,6,2,7,4

second.txt:

AKAP8L chr19 107180100 107650000 transcript
AKAP8L chr19 15514130 15529799 transcript
AKIRIN2 chr6 88384790 88411927 transcript
AKIRIN2 chr6 88410228 88411243 transcript
AKT3 chr1 105002000 105010000 transcript
AKT3 chr1 243663021 244006886 transcript
AKT3 chr1 243665065 244013430 transcript

A row of the second file should be merged with a row of the first file when all of the following hold:

1- the 1st column in the first file is equal to the 2nd column in the second file.
2- the 3rd column in the second file is greater than the 2nd column in the first file and smaller than the 3rd column in the first file.
3- the 4th column in the second file is also greater than the 2nd column in the first file and smaller than the 3rd column in the first file.

first = open('first.csv', 'rb')
second = open('second.txt', 'rb')
first_file = []
for line in first:
    first_file.append(line.split(','))
second_file = []
for line2 in second:
    second_file.append(line.split())
count=0
final = []
for i in range(len(first_file)):
    for j in range(len(second_file)):
        first_row = first_file[i]
        second_row = second_file[j]
        first_col = first_row.split()
        second_col = second_row.split()
        if first_col[0] == second_col[1] and first_col[1] < second_col[2] < first_col[2] and first_col[1] < second_col[3] < first_col[2]
            count+=1
            final.append(first_col[i]+second_col[0]+count)

2 Answers

User

#1 · edited 2024-04-23 18:58:50

Since your files have no column names, this does not look very robust, but it works and uses pandas:

import pandas as pd

first = 'first.csv'
second = 'second.txt'

# Neither file has a header row, so columns get integer labels 0, 1, 2, ...
df1 = pd.read_csv(first, header=None)
df2 = pd.read_csv(second, sep=r'\s+', header=None)

# Join on chromosome; labels present in both frames get the suffixes appended.
merged = df1.merge(df2, left_on=[0], right_on=[1], suffixes=('first', 'second'))
a, b, c, d = merged['2second'], merged['1first'], merged['2first'], merged['3second']

# Keep rows whose transcript start (a) and end (d) lie strictly inside (b, c).
cleaned = merged[(c>a)&(a>b)&(c>d)&(d>b)]

counted = cleaned.groupby(['0first', '1first', '2first', '3first', '4first', 5, 6, '0second'])['4second'].count().reset_index()

counted.to_csv('result.csv', index=False, header=False)

This produces result.csv with the following contents:

chr1,105000000,105310000,2,1,3,2,AKT3,1
chr19,107180000,107680000,1,1,5,4,AKAP8L,1
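
The suffix-based labels such as '2second' are easy to mix up. As a rough sketch of one alternative, the columns could be named when the files are read. The names below (chrom, start, end, v1..v4, gene, tx_start, tx_end, kind) and the helper count_contained are assumptions made for readability and do not come from the files; the sample rows are a subset of the data in the question, loaded from strings so the snippet runs on its own.

```python
import io
import pandas as pd

# A subset of the question's rows; in practice pass file paths instead.
FIRST = """chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr19,107180000,107680000,1,1,5,4
"""
SECOND = """AKAP8L chr19 107180100 107650000 transcript
AKAP8L chr19 15514130 15529799 transcript
AKT3 chr1 105002000 105010000 transcript
AKT3 chr1 243663021 244006886 transcript
"""

# Hypothetical column names; the files themselves have no header row.
first_cols = ["chrom", "start", "end", "v1", "v2", "v3", "v4"]
second_cols = ["gene", "chrom", "tx_start", "tx_end", "kind"]


def count_contained(first_src, second_src):
    """Count, per region and gene, the transcripts lying strictly inside the region."""
    df1 = pd.read_csv(first_src, header=None, names=first_cols)
    df2 = pd.read_csv(second_src, sep=r"\s+", header=None, names=second_cols)
    # Pair every region with every transcript on the same chromosome.
    merged = df1.merge(df2, on="chrom")
    inside = (
        (merged["tx_start"] > merged["start"]) & (merged["tx_start"] < merged["end"])
        & (merged["tx_end"] > merged["start"]) & (merged["tx_end"] < merged["end"])
    )
    return (
        merged[inside]
        .groupby(first_cols + ["gene"], sort=False)
        .size()
        .reset_index(name="count")
    )


result = count_contained(io.StringIO(FIRST), io.StringIO(SECOND))
print(result.to_csv(index=False, header=False), end="")
```

With named columns the filter reads much like the three conditions in the question, and no suffixes are needed because only chrom appears in both frames.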

User

#2 · edited 2024-04-23 18:58:50

Keeping the same setup you have, it works if you do the following.

# Read both files up front; 'with' closes them automatically.
with open('first.csv', 'r') as first:
    first_file = [line.strip() for line in first if line.strip()]
with open('second.txt', 'r') as second:
    second_file = [line.split() for line in second if line.strip()]

count = 0
final = []
for first_row in first_file:
    first_col = first_row.split(',')
    # Compare coordinates as integers; as strings they would compare lexicographically.
    start, end = int(first_col[1]), int(first_col[2])
    for second_col in second_file:
        if (first_col[0] == second_col[1]
                and start < int(second_col[2]) < end
                and start < int(second_col[3]) < end):
            count += 1  # running total across all matches so far
            final.append(first_row + ',' + second_col[0] + ',' + str(count))
print(final)

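This produces the result you want. The last field is a running counter over all matches, which is why the second entry ends in 2.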

['chr1,105000000,105310000,2,1,3,2,AKT3,1', 'chr19,107180000,107680000,1,1,5,4,AKAP8L,2']
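
One detail worth checking in any pure-Python version: fields read from a text file are strings, so `<` compares them character by character unless they are converted to numbers first. The sketch below wraps the three conditions from the question in a single predicate that converts before comparing. The helper name is_contained is made up for illustration.

```python
def is_contained(region_row, transcript_row):
    """Return True when the transcript lies strictly inside the region.

    region_row:     comma-separated line in the first.csv layout
    transcript_row: whitespace-separated line in the second.txt layout
    """
    chrom, start, end = region_row.split(",")[:3]
    _gene, tx_chrom, tx_start, tx_end = transcript_row.split()[:4]
    start, end = int(start), int(end)
    return (
        chrom == tx_chrom
        and start < int(tx_start) < end
        and start < int(tx_end) < end
    )


# String comparison is lexicographic, so differing digit counts mislead it.
print("5310000" < "243663021")   # False
print(5310000 < 243663021)       # True
print(is_contained("chr1,105000000,105310000,2,1,3,2",
                   "AKT3 chr1 105002000 105010000 transcript"))  # True
```

The sample data happens to give the same matches either way, but files with coordinates of different lengths on the same chromosome could match incorrectly under string comparison.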
