Merge lines from two files in Python if their values match

filenames = ['file1.txt', 'file2.txt']
with open('file3.txt', 'w') as collated:
    with open('1.txt', 'r') as genes:
        with open('2.txt', 'r') as counts:
            if '#query_name' in genes == 'Geneid' in counts:
                for line1, line2 in zip(genes, counts):
                    print(line1.strip(), line2.strip(), file=collated)
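The loop above pairs lines purely by position with zip(), so it never compares the ID columns. Its if condition is also a chained comparison and does not test whether two rows share an ID. One way to match rows by key using only the standard library is sketched below. It assumes both files are tab-delimited with a header row, that the first file has a #query_name column, that the second has a Geneid column, and that each Geneid appears only once. merge_on_id is a hypothetical helper name, not something from the question.

```python
import csv


def merge_on_id(genes_path, counts_path, out_path):
    """Hypothetical helper: write one merged row per gene whose '#query_name' matches a 'Geneid'.

    Assumes tab-delimited files with header rows and unique Geneid values.
    """
    # Index the counts file by Geneid so rows are matched by key, not by line position.
    with open(counts_path, newline='') as f:
        reader = csv.DictReader(f, delimiter='\t')
        count_fields = reader.fieldnames
        counts = {row['Geneid']: row for row in reader}

    with open(genes_path, newline='') as f_in, open(out_path, 'w', newline='') as f_out:
        reader = csv.DictReader(f_in, delimiter='\t')
        # Keep every genes column except the key, then append all counts columns.
        gene_fields = [c for c in reader.fieldnames if c != '#query_name']
        writer = csv.DictWriter(f_out, fieldnames=gene_fields + count_fields, delimiter='\t')
        writer.writeheader()
        for row in reader:
            match = counts.get(row['#query_name'])
            if match is None:
                continue  # no matching Geneid: skip the row, like an inner join
            merged = {c: row[c] for c in gene_fields}
            merged.update(match)
            writer.writerow(merged)
```

Usage: merge_on_id('file1.txt', 'file2.txt', 'file3.txt'). Because the counts rows are held in a dict, each lookup is constant time, and the two files do not need to be sorted or line-aligned.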

1 answer

Anonymous user

Answer #1 · Posted 2024-05-19 00:04:09

Here is a solution using pandas:

Input:

import pandas as pd
df1 = pd.read_csv('file1.txt', sep='\t')
df2 = pd.read_csv('file2.txt', sep='\t')
merged_df = df1.merge(df2, left_on='#query_name', right_on='Geneid', how='inner').drop(columns=['#query_name'])
merged_df.to_csv('output.csv', index=False)

Output of the merged DataFrame:

  KEGG_KOs        Geneid           Chr  Count
0   K00240  PROKKA_00019  k141_1000050    102
1   K00246  PROKKA_00020  k141_1000050    132

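The two read_csv lines read the .txt files (I'm assuming they are tab-delimited) into the DataFrames df1 and df2. The merge line joins df1 and df2 wherever the #query_name column of df1 equals the Geneid column of df2 (how='inner' keeps only the rows that match), then drops #query_name because it duplicates Geneid. Finally, to_csv(..., index=False) writes the result as CSV without the index column (0, 1). If you want to save the merged DataFrame as a tab-delimited file instead, just change the last line to: merged_df.to_csv('output.txt', sep='\t', index=False)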
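To see the inner join in isolation, here is a self-contained sketch that uses small hypothetical inline data in place of file1.txt and file2.txt (the real files are not shown). It includes one ID with no matching Geneid, which the inner join drops. It also shows how a left join with indicator=True lists the IDs that found no match.

```python
import io

import pandas as pd

# Hypothetical sample data shaped like the output above; not the asker's real files.
file1 = io.StringIO(
    "#query_name\tKEGG_KOs\n"
    "PROKKA_00019\tK00240\n"
    "PROKKA_00020\tK00246\n"
    "PROKKA_00099\tK09999\n"  # this ID has no matching Geneid
)
file2 = io.StringIO(
    "Geneid\tChr\tCount\n"
    "PROKKA_00019\tk141_1000050\t102\n"
    "PROKKA_00020\tk141_1000050\t132\n"
)

df1 = pd.read_csv(file1, sep='\t')
df2 = pd.read_csv(file2, sep='\t')

# Inner join: only rows whose '#query_name' equals some 'Geneid' survive.
merged_df = df1.merge(df2, left_on='#query_name', right_on='Geneid', how='inner')
# merge keeps both key columns, so drop the redundant one.
merged_df = merged_df.drop(columns=['#query_name'])
print(merged_df)

# Left join with indicator=True adds a '_merge' column; 'left_only' marks unmatched IDs.
check = df1.merge(df2, left_on='#query_name', right_on='Geneid', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', '#query_name'].tolist()
print(unmatched)
```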

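If you get a KeyError, your files are probably formatted a little inconsistently (a mix of spaces and tabs between columns). In that case the header does not split into the expected column names, so '#query_name' or 'Geneid' is not found. This code should work for that case: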

Input:

import pandas as pd

def to_df(file):
    # Split each non-blank line on any run of whitespace (spaces and/or tabs)
    with open(file) as f:
        rows = [line.split() for line in f if line.strip()]
    # The first row is the header; all values are kept as strings
    return pd.DataFrame(rows[1:], columns=rows[0])

df1 = to_df('file1.txt')
df2 = to_df('file2.txt')
merged_df = df1.merge(df2, left_on='#query_name', right_on='Geneid', how='inner').drop(columns=['#query_name'])
merged_df.to_csv('output.csv', index=False)

Output:

  KEGG_KOs        Geneid           Chr Count
0   K00240  PROKKA_00019  k141_1000050   102
1   K00246  PROKKA_00020  k141_1000050   132
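As an alternative to the hand-written to_df, read_csv can do the same whitespace splitting itself with sep=r'\s+', which treats any run of spaces and/or tabs as a single delimiter. The minimal sketch below uses hypothetical inline text that mixes tabs and spaces. For the real files you would pass 'file1.txt' and 'file2.txt' instead of the io.StringIO objects.

```python
import io

import pandas as pd

# Hypothetical file contents that mix tabs and runs of spaces between columns.
text1 = "#query_name  KEGG_KOs\nPROKKA_00019\tK00240\nPROKKA_00020   K00246\n"
text2 = "Geneid\tChr  Count\nPROKKA_00019 k141_1000050\t102\nPROKKA_00020\tk141_1000050 132\n"

# sep=r'\s+' splits on any whitespace run, the same rule str.split() uses in to_df().
df1 = pd.read_csv(io.StringIO(text1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(text2), sep=r'\s+')

merged_df = df1.merge(df2, left_on='#query_name', right_on='Geneid', how='inner').drop(columns=['#query_name'])
print(merged_df)
```

One difference from to_df is that read_csv infers dtypes, so Count comes back as integers rather than strings. Both approaches break if a value itself contains spaces.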
