Removing redundant rows from a file in Python

Posted 2024-04-19 12:27:22


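I have a text file with 8 columns. The first column is the ID and the eighth is the type. Each ID appears on many rows in the first column, and each ID has several types in the eighth column. One of those types is H, and each ID has exactly one H row.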

ID    type
E0    B
E0    H
E0    S
B4    B
B4    H

I want to create another file with only one row per ID (the row whose eighth column is H). For the example above, the result would be:

ID    type
E0    H
B4    H

2 Answers

I just updated inspectorG4dget's solution for Python 2.7.3.
It only considers two columns in the input CSV file, ID and type, separated by \t.

Code:

import csv

with open('/home/vivek/Desktop/input.csv', 'rb') as infile, open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    reader_row = next(reader)
    writer.writerow([reader_row[0], reader_row[1]])
    for row in reader:
        if row[1]=="H":
            writer.writerow(row)

Output:

ID      type
E0      H
B4      H
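
For readers on Python 3, here is a minimal sketch of the same csv-based filter. It is not part of the original answer: the function name `keep_type_rows` and the parameters `type_col` and `wanted` are made up for illustration. It assumes a tab-separated file with a header row, and it opens files in text mode with `newline=''`, which is what the Python 3 csv module expects in place of 'rb'/'wb'.

```python
import csv


def keep_type_rows(in_path, out_path, type_col=1, wanted="H"):
    """Copy the header and every row whose type column equals `wanted`.

    `type_col` is a zero-based column index (hypothetical parameter);
    pass 7 for the eighth column of the original eight-column file.
    """
    # In Python 3 the csv module works on text files opened with newline=''.
    with open(in_path, newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile, delimiter="\t")
        writer = csv.writer(outfile, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            # Skip blank or short rows instead of raising IndexError.
            if len(row) > type_col and row[type_col] == wanted:
                writer.writerow(row)
```

For the two-column example this could be called as `keep_type_rows('input.csv', 'output.csv')`; for the full eight-column file, `type_col=7` would select the eighth column.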

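For Python 2.6.6, use the code below. It nests the two with statements because Python 2.6 does not accept several context managers in a single with statement. I have not tested this on Python 2.6.6, because my machine has Python 2.7.3.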

import csv

with open('/home/vivek/Desktop/input.csv', 'rb') as infile:
    with open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        writer = csv.writer(outfile, delimiter='\t')
        reader_row = next(reader)
        writer.writerow([reader_row[0], reader_row[1]])
        for row in reader:
            if row[1]=="H":
                writer.writerow(row)

Assuming your file is a plain text file with columns separated by spaces/tabs, and the "type" column sits at the very end of each line:

with open('input.txt', 'r') as input_file:
    input_lines = input_file.readlines()

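# Take the header line, and all the subsequent lines whose last character
# (before the trailing newline) is 'H'. rstrip('\n') also copes with blank
# lines and with a final line that has no newline.
output_lines = input_lines[:1] + [line for line in input_lines[1:]
                                  if line.rstrip('\n').endswith('H')]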

output_string = ''.join(output_lines)
with open('output.txt', 'w') as output_file:
    output_file.write(output_string)

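The code above assumes that the "type" column ends immediately after a single-character type code. If the data may have trailing whitespace, or multi-character type codes such as "AH", replace the line below the comment with the following line: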

output_lines = input_lines[:1] + [line for line in input_lines if line.split()[-1:] == ['H']]
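
To make the difference between the two tests concrete, here is a small sketch that runs both on a few sample lines. The helper names `ends_with_h_char` and `last_field_is_h` and the sample lines are invented for illustration. The character test is written with `rstrip('\n')`, which is one way of expressing "last character before the newline".

```python
def ends_with_h_char(line):
    # Character test: the last character before the newline is 'H'.
    return line.rstrip("\n").endswith("H")


def last_field_is_h(line):
    # Field test: the last whitespace-separated field is exactly 'H'.
    # The slice [-1:] yields [] for a blank line, so nothing is indexed.
    return line.split()[-1:] == ["H"]


# Hypothetical sample lines: plain, trailing spaces, multi-character code, blank.
samples = ["E0\tH\n", "E0\tH  \n", "E0\tAH\n", "\n"]
for line in samples:
    print(repr(line), ends_with_h_char(line), last_field_is_h(line))
```

The character test misses the line with trailing spaces and wrongly accepts "AH", while the field test handles both; neither accepts the blank line.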

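Edit: if your file is very large and you don't want to load all of it into memory to work on it, you can use a generator expression, which is evaluated lazily: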

with open('input.txt', 'r') as input_file:
    # Iterate over the file object itself, so lines are read one at a time
    output_lines = (line for i, line in enumerate(input_file)
                    if i == 0 or line.rstrip('\n').endswith('H'))
    with open('output.txt', 'w') as output_file:
        for line in output_lines:
            output_file.write(line)
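
The lazy approach can also be wrapped in a generator function, which makes it easy to try out without touching the disk. This is only a sketch: the name `iter_h_lines` is made up, it uses the whitespace-split test on the last field, and `io.StringIO` stands in for an open file.

```python
import io


def iter_h_lines(lines):
    """Yield the header line, then each line whose last field is 'H'."""
    for i, line in enumerate(lines):
        # i == 0 keeps the header; [-1:] avoids IndexError on blank lines.
        if i == 0 or line.split()[-1:] == ["H"]:
            yield line


# io.StringIO stands in for an open file; a real file object is consumed
# line by line in the same way.
source = io.StringIO("ID\ttype\nE0\tB\nE0\tH\nE0\tS\nB4\tB\nB4\tH\n")
result = list(iter_h_lines(source))
print("".join(result), end="")
```

With a real file, the generator can be passed straight to `output_file.writelines(iter_h_lines(input_file))`, so only one line is held in memory at a time.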
