Remove lines whose strings are not common to two files

Posted 2024-05-15 23:00:59


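I have two files: file 1 has 2 columns and file 2 has 5 columns. I want to remove from file 2 every line that does not contain the same strings as some line in file 1: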

- File 1: treating each line as a list, it has elements [0] and [1]

gene-3  +
gene-2  -
gene-1  -

- File 2: compare [0] and [1] from file 1 with [0] and [4] of this file. If no line in file 1 matches a given line of file 2, that line must be removed.

gene-1  mga CDF 1   +  # this line has + instead of -, although gene-1 is the same: remove.
gene-2  mga CDS 1   -  # [0][1] from file 1 = [0][4] from file 2: (gene-2, - ) keep it!
gene-3  mga CDH 1   +  # ""                 ""              ""
gene-4  mga CDS 1   +  # no gene-4 in file 1, remove.

- Expected output:

gene-3  mga CDH 1   +
gene-2  mga CDS 1   -

Any ideas?


2 answers
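# Collect the (gene, strand) pairs from file1 in a set for fast lookup.
with open("file1.txt") as f:
    items = {tuple(line.split()) for line in f}
# Keep the file2 lines whose fields [0] and [4] form a pair present in file1.
with open("file2.txt") as f1:
    filtered = [line for line in f1 if tuple(line.split()[:5:4]) in items]
# Overwrite file2.txt only after it has been fully read and closed.
with open("file2.txt", "w") as f3:
    f3.writelines(filtered)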
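
# Build the set of (gene, strand) pairs to keep from file1.
with open('file1', 'r') as f:
    keepers = set(tuple(line.split()) for line in f)
# Copy to file3 only the file2 lines whose fields [0] and [4] are in that set.
with open('file2', 'r') as f_in, open('file3', 'w') as f_out:
    for line in f_in:
        parts = line.split()
        # Skip blank or short lines instead of raising IndexError.
        if len(parts) >= 5 and (parts[0], parts[4]) in keepers:
            f_out.write(line)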
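
To check the matching rule against the sample data in the question without touching any files, here is a minimal in-memory sketch. The function name `filter_lines` is hypothetical, and stripping a trailing `# ...` annotation is an assumption made in case the comments shown in the question are actually present in file 2.

```python
def filter_lines(file1_lines, file2_lines):
    """Keep file2 lines whose fields [0] and [4] match a file1 line."""
    # Each file1 line becomes a (gene, strand) tuple.
    keepers = {tuple(line.split()) for line in file1_lines}
    kept = []
    for line in file2_lines:
        # Assumption: drop any trailing "# ..." annotation before splitting.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 5 and (parts[0], parts[4]) in keepers:
            kept.append(line)
    return kept


# Sample data copied from the question (annotations omitted).
file1 = ["gene-3  +", "gene-2  -", "gene-1  -"]
file2 = [
    "gene-1  mga CDF 1   +",
    "gene-2  mga CDS 1   -",
    "gene-3  mga CDH 1   +",
    "gene-4  mga CDS 1   +",
]

result = filter_lines(file1, file2)
for line in result:
    print(line)
# gene-2  mga CDS 1   -
# gene-3  mga CDH 1   +
```

The kept lines come out in the order of file 2, whereas the expected output in the question lists gene-3 first. If the order of file 1 matters, the result would need to be reordered afterwards.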
