Merge files but output the header only once

Posted 2024-06-10 11:33:32


I have seen earlier posts with solutions that worked for other people, but for some reason they do not work for me.

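I am trying to write a Python script that will 1) merge three files that share the same format, 2) remove only the duplicate headers, 3) sort the rows by Specimen_ID, and 4) add two blank lines between each unique Specimen_ID (that is, after every three rows, except that the first block is the first four lines because of the header).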

I have a partial script that handles the first two steps and the last one:

import glob

read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # The first line of every file is the header; keep only the first one.
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt += 1
                # Add blank lines after every third data row.
                if linecnt % 3 == 0:
                    # bytes literal, because the file is opened in binary mode
                    outfile.write(b"\n\n")

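Any suggestions for sorting the rows? Also, when the data is exported from Excel as a tab-delimited txt file, I find that this script produces output containing only the contents of the first infile and nothing from the others. If I just copy and paste the data into new txt files and use those as infiles, I have no problem. Does anyone know why I am running into this?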

Example input file text (infile 1):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
b   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   beak bent
c   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
d   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
e   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   pronotum deformed
f   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   

Example input file text (infile 2):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
b   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
c   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
d   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
e   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
f   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 

1 answer
User
#1 · posted 2024-06-10 11:33:32

Unless the files contain some unexpected data, your solution should work fine. I just added the code for your third item.

import glob

read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # Skip the header of every file except the first.
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt += 1
                # Insert blank lines after every third data row.
                if linecnt % 3 == 0:
                    # bytes literal, because the file is opened in binary mode
                    outfile.write(b"\n\n")
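One kind of "unexpected data" that might explain the Excel symptom in the question is the line ending. This is only a guess, since the exported files are not shown: if the export terminates lines with a bare carriage return (`\r`) instead of `\n`, then iterating a file opened in `"rb"` mode sees the whole file as a single line, so `next(infile)` consumes everything as the "header" and the later files contribute nothing. The sketch below shows the difference between binary iteration and text mode with universal newlines. The helper names, the demo file, and the `utf-8` encoding are assumptions for illustration only.

```python
import os
import tempfile


def count_lines_binary(path):
    """Count lines the way the original script sees them (split on b'\\n' only)."""
    with open(path, "rb") as infile:
        return sum(1 for _ in infile)


def read_lines_universal(path, encoding="utf-8"):
    """Read lines in text mode; newline=None translates '\\r' and '\\r\\n' to '\\n'."""
    with open(path, "r", encoding=encoding, newline=None) as infile:
        return [line.rstrip("\n") for line in infile]


# Hypothetical demo file that uses bare carriage returns as line endings.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "excel_export.txt")
with open(demo_path, "wb") as f:
    f.write(b"Specimen_ID\tSex\ra\tM\rb\tF\r")

binary_count = count_lines_binary(demo_path)
text_lines = read_lines_universal(demo_path)
print(binary_count)  # the whole file is one "line" in binary mode
print(text_lines)    # header and two data rows in text mode
```

If the files do turn out to use `\r` line endings, opening them in text mode as above (and the output file in `"w"` mode) should avoid the problem. The actual encoding of an Excel export may differ from `utf-8`.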

Input file 1.txt

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town
Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

Input file 2.txt

Employee,Account,Currency,Amount,Location
Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town
Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town

Output

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town


Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town


Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town
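The output above is not sorted, and the blank lines are placed by counting rows rather than by looking at the ID. A minimal sketch of the sorting and grouping steps from the question could look like the following. It assumes tab-separated files whose first column is Specimen_ID and which are small enough to hold in memory. The function name `merge_sorted` and its parameters are made up for this example.

```python
from itertools import groupby


def merge_sorted(paths, out_path, key_col=0, sep="\t", encoding="utf-8"):
    """Merge files, keep one header, sort rows by one column, separate ID groups."""
    header = None
    rows = []
    for path in paths:
        # Text mode with newline=None accepts "\n", "\r\n" and "\r" line endings.
        with open(path, "r", encoding=encoding, newline=None) as infile:
            lines = [ln.rstrip("\n") for ln in infile if ln.strip()]
        if not lines:
            continue  # skip empty files instead of failing on a missing header
        if header is None:
            header = lines[0]  # keep the header of the first non-empty file only
        rows.extend(lines[1:])

    def row_key(line):
        return line.split(sep)[key_col]

    # list.sort is stable, so rows with the same ID keep their file order.
    rows.sort(key=row_key)

    with open(out_path, "w", encoding=encoding, newline="\n") as outfile:
        if header is not None:
            outfile.write(header + "\n")
        for i, (_, group) in enumerate(groupby(rows, key=row_key)):
            if i:
                outfile.write("\n\n")  # two blank lines between different IDs
            for line in group:
                outfile.write(line + "\n")
```

It could be called as `merge_sorted(sorted(glob.glob("*.txt")), "merged_data.out")` after `import glob`. Writing to a name that does not match `*.txt` keeps the merged file from being picked up as an input on the next run. The sort is a plain string comparison, so an ID such as `Test 10` would sort before `Test 2`; numeric IDs would need a different key.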
