Searching for values in a column 2 that is split by ";"

Posted 2024-05-16 08:31:04


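I want to search for the values from column 2 of file 1 in column 2 of file 2 (PDBid, which holds many values separated by ";"). If a value is found, all columns of file 1 and file 2 should be merged.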

File 1:

peptide,PDB_ID
pep1,4BAK
pep1,4BAH
pep1,7R1R
pep1,6R1R
pep1,5R1R
pep1,4R1R
pep1,3R1R
pep1,4CH8
pep1,4CH2
pep1,1DN2
pep1,2NNU
pep1,3DIW
pep1,2G56
pep1,2G54
pep1,1TVB
pep1,2C9F
pep1,1JK8
pep1,2P1L
pep1,4IPZ
pep1,4HPY
pep1,4HPO
pep1,4JJM

File 2:

Uniprotid,PDBid,Genesymbol,entryname
P00452,1QFN;1R1R;1RLR;2R1R;2X0X;2XAK;2XAP;2XAV;2XAW;2XAX;2XAY;2XAZ;2XO4;2XO5;3R1R;3UUS;4ERM;4ERP;4R1R;5R1R;6R1R;7R1R,nrdA dnaF b2234 JW2228,RIR1_ECOLI
P69924,6R1R,nrdB ftsB b2235 JW2229,RIR2_ECOLI
P03120,1BY9;1DTO;1R8P;1ZZF;2NNU;2Q79;3MI7,E2,VE2_HPV16
Q96HN0,2NNU,Homo sapiens (Human),Q96HN0_HUMAN
Q9YIV0,2NNU,E2,Q9YIV0_HPV16
Q9DBG9,3DIW;3DJ1;3DJ3,Tax1bp3,TX1B3_MOUSE
Q6N089,3CFJ;3CFK;4HPY,DKFZp686P15220,Q6N089_HUMAN
Q8N5F4,4HPY,IGL@,Q8N5F4_HUMAN
G9HS63,4HPO;4HPY,env,G9HS63_9HIV1
P00734,3E6P;3EE0;3EGK;3EQ0;3F68;3GIC;3GIS;3HAT;3HKJ;3HTC;3JZ1;3JZ2;3K65;3LDX;3LU9;3NXP;3P17;3P6Z;3P70;3PMH;3PO1;3QDZ;3QGN;3QLP;3QTO;3QTV;3QWC;3QX5;3R3G;3RLW;3RLY;3RM0;3RM2;3RML;3RMM;3RMN;3RMO;3S7H;3S7K;3SHA;3SHC;3SI3;3SI4;3SQE;3SQH;3SV2;3T5F;3TU7;3U69;3U8O;3U8R;3U8T;3U98;3U9A;3UTU;3UWJ;3VXE;3VXF;4AX9;4AYV;4AYY;4AZ2;4BAH;4BAK;4BAM;4BAN;4BAO;4BAQ;4BOH;4CH2;4CH8;4DIH,F2,THRB_HUMAN

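A single row of file 2 may contain several of the values listed in file 1. For example, the last row contains both 4BAK and 4BAH, so both cases need to be handled and written to the output file.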

Example output file:

peptide,PDB_ID,Uniprotid,Genesymbol,entryname
pep1,4BAK,P00734,F2,THRB_HUMAN
pep1,4BAH,P00734,F2,THRB_HUMAN
pep1,7R1R,P00452,nrdA dnaF b2234 JW2228,RIR1_ECOLI
pep1,6R1R,P00452,nrdA dnaF b2234 JW2228,RIR1_ECOLI
pep1,6R1R,P69924,nrdB ftsB b2235 JW2229,RIR2_ECOLI
pep1,5R1R,P00452,nrdA dnaF b2234 JW2228,RIR1_ECOLI

2 answers

Try this:

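awk -F"," 'NR==FNR{n=split($2,b,";"); for(i=1;i<=n;i++) a[b[i]]=a[b[i]] $1","$3","$4 "\n"; next} ($2 in a){m=split(a[$2],r,"\n"); for(j=1;j<m;j++) print $0","r[j]}' file2 file1

awk -F"," '
    NR==FNR {                        # first pass: file2
        n = split($2, b, ";")        # split the PDB ids in column 2
        for (i = 1; i <= n; i++)     # make a map: PDB id -> every row that lists it
            a[b[i]] = a[b[i]] $1 "," $3 "," $4 "\n"
        next
    }
    ($2 in a) {                      # second pass: file1, matched rows only
        m = split(a[$2], r, "\n")    # last element is empty (trailing newline)
        for (j = 1; j < m; j++)      # print one merged line per match
            print $0 "," r[j]
    }' file2 file1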
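For readers less familiar with awk, the lookup table that the script builds can be sketched in Python. This is only an illustration under assumptions: the function name `build_index` and the two shortened sample rows are made up for the example. Each PDB id maps to a list, so an id that appears in several file 2 rows (such as 6R1R) keeps every match.

```python
from collections import defaultdict

def build_index(file2_rows):
    """Map each PDB id to the (Uniprotid, Genesymbol, entryname) tuples listing it."""
    index = defaultdict(list)
    for uniprot_id, pdb_ids, gene, entry in file2_rows:
        # Column 2 is a single string, so split it on ";" before indexing.
        for pdb_id in pdb_ids.split(";"):
            index[pdb_id].append((uniprot_id, gene, entry))
    return index

# Hypothetical sample: two shortened rows taken from file 2.
sample = [
    ("P00452", "5R1R;6R1R;7R1R", "nrdA dnaF b2234 JW2228", "RIR1_ECOLI"),
    ("P69924", "6R1R", "nrdB ftsB b2235 JW2229", "RIR2_ECOLI"),
]
index = build_index(sample)
print(index["6R1R"])  # both rows that mention 6R1R
```

Looking up a PDB id from file 1 in `index` then plays the role of the `a[$2]` lookup in the awk script.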

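Using the csv reader makes parsing the CSV files very simple. I think you may be tripping over the second field holding more than one value; don't try to do everything in one step. For example, in Python 3: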

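import csv
from collections import defaultdict

peptides = defaultdict(list)  # PDB id -> peptides listed for it in file1
matches = defaultdict(list)   # PDB id -> matching rows from file2

with open('file1', newline='') as fone:
    for row in csv.reader(fone, delimiter=','):
        peptides[row[1]].append(row[0])

with open('file2', newline='') as ftwo:
    for row in csv.reader(ftwo, delimiter=','):
        # the second field is one string, so split it on ";"
        for pdb_id in row[1].split(';'):
            if pdb_id in peptides:
                # keep the first field and everything after the PDB ids
                matches[pdb_id].append([row[0]] + row[2:])

# one output line per (peptide, matching file2 row) pair
for pdb_id, peps in peptides.items():
    for peptide in peps:
        for fields in matches.get(pdb_id, []):
            print(peptide, pdb_id, *fields, sep=',')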

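If you first read file1 into a dictionary, storing the first field as the value under the key taken from field 2, merging in the second file becomes easy. Note that the second field of file2 is read as a single string, so you have to split it and iterate over it yourself. Then, whenever an id matches, the first field and all remaining fields of that file2 row are appended to the matching id from file1.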
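To write the result as a CSV file with the header from the question rather than printing it, the steps described above can be wrapped in a function. This is a minimal sketch, not the answerer's exact code: `merge_files` is a hypothetical name, it takes open file objects rather than paths, and it assumes that both files start with the header rows shown in the question and that file 2 has exactly four columns.

```python
import csv
import io
from collections import defaultdict

def merge_files(file1, file2, out):
    """Write one merged CSV row per (file1 row, matching file2 row) pair."""
    rows1 = list(csv.DictReader(file1))          # keeps file1 order
    wanted = {row["PDB_ID"] for row in rows1}    # ids we need to look up

    matches = defaultdict(list)                  # PDB id -> matching file2 rows
    for row in csv.DictReader(file2):
        for pdb_id in row["PDBid"].split(";"):
            if pdb_id in wanted:
                matches[pdb_id].append(row)

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["peptide", "PDB_ID", "Uniprotid", "Genesymbol", "entryname"])
    for row in rows1:
        # ids without any match in file2 are skipped
        for hit in matches.get(row["PDB_ID"], []):
            writer.writerow([row["peptide"], row["PDB_ID"],
                             hit["Uniprotid"], hit["Genesymbol"], hit["entryname"]])

# Usage with small in-memory stand-ins for the two files (shortened sample data).
file1 = io.StringIO("peptide,PDB_ID\npep1,4BAK\npep1,6R1R\npep1,4JJM\n")
file2 = io.StringIO(
    "Uniprotid,PDBid,Genesymbol,entryname\n"
    "P00452,5R1R;6R1R;7R1R,nrdA dnaF b2234 JW2228,RIR1_ECOLI\n"
    "P69924,6R1R,nrdB ftsB b2235 JW2229,RIR2_ECOLI\n"
    "P00734,4BAH;4BAK,F2,THRB_HUMAN\n"
)
out = io.StringIO()
merge_files(file1, file2, out)
print(out.getvalue())
```

With real files, the same function could be called with handles opened via `open(path, newline='')`, which is what the csv module documentation recommends for CSV input and output.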
