Rows in columns?

Posted 2024-05-28 19:54:11


I have the following DataFrame:

Index Uniprot    ID1   ID2      P1       P2      
1     O00141     2r5tA 3hdmA    2r5tA_1  3hdmA_9
2     O00141     2r5tA 3hdmA    2r5tA_2  3hdmA_1
3     O00141     2r5tA 3hdmA    2r5tA_7  3hdmA_7
4     O15021     2w7rB 2w7rA    2w7rB_2  2w7rA_2

The desired output looks like this:

O00141 2r5tA 2r5tA_1 2r5tA_2 2r5tA_7
O00141 3hdmA 3hdmA_9 3hdmA_1 3hdmA_7              
O15021 2w7rB 2w7rB_2
O15021 2w7rA 2w7rA_2

I tried transposing with DataFrame.T in pandas, which gave something similar, but it only turns each row into a column as-is:

Uniprot  O00141   O00141   O00141   O15021
ID1      2r5tA    2r5tA    2r5tA    2w7rB
ID2      3hdmA    3hdmA    3hdmA    2w7rA
P1       2r5tA_1  2r5tA_2  2r5tA_7  2w7rB_2
P2       3hdmA_9  3hdmA_1  3hdmA_7  2w7rA_2

1 answer
User
#1 · Posted 2024-05-28 19:54:11

You need to iterate over each row, but it is not complicated: the idea is to build a dict holding the required data and then use DataFrame.from_dict.

import io

import pandas as pd

data = """
Index Uniprot   P1       P2       ID1     ID2
1     O00141    2r5tA_1  3hdmA_9  2r5tA 3hdmA
2     O00141    2r5tA_2  3hdmA_1  2r5tA 3hdmA
3     O00141    2r5tA_7  3hdmA_7  2r5tA 3hdmA
4     O15021    2w7rB_2  2w7rA_2  2w7rB 2w7rA
"""
# Create the sample DataFrame (pd.compat.StringIO no longer exists in recent pandas; io.StringIO works everywhere)
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Sort by Uniprot; mergesort is stable, so rows keep their original order within each Uniprot
df = df.sort_values(by='Uniprot', kind='mergesort')

# Build two lists per Uniprot: one for the ID1/P1 values, one for the ID2/P2 values
dico = {}
for i, row in df.iterrows():
    key1 = row.Uniprot + 'C1'
    key2 = row.Uniprot + 'C2'
    if key1 not in dico:
        dico[key1] = [row.Uniprot, row.ID1, row.P1]
        dico[key2] = [row.Uniprot, row.ID2, row.P2]
    else:
        dico[key1].append(row.P1)
        dico[key2].append(row.P2)

# Pad the shorter lists with '' so that every list has the same length
maxlen = max(len(l) for l in dico.values())
for k in dico:
    dico[k] = dico[k] + [''] * (maxlen - len(dico[k]))

# Each dict key becomes a column, so transpose to get one row per key
df_result = pd.DataFrame.from_dict(dico).T.reset_index(drop=True)
print(df_result)

Output:

        0      1        2        3        4
0  O00141  2r5tA  2r5tA_1  2r5tA_2  2r5tA_7
1  O00141  3hdmA  3hdmA_9  3hdmA_1  3hdmA_7
2  O15021  2w7rB  2w7rB_2                  
3  O15021  2w7rA  2w7rA_2 
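
For comparison, the same reshaping can probably be done without an explicit row loop. The following is only a sketch under a few assumptions: the names long_df, grouped and wide are illustrative, and the rows are grouped by the (Uniprot, ID) pair rather than by Uniprot alone, which gives the same result on the sample data but would differ if one Uniprot had several distinct ID1 values.

```python
import io

import pandas as pd

data = """
Index Uniprot ID1   ID2   P1      P2
1     O00141  2r5tA 3hdmA 2r5tA_1 3hdmA_9
2     O00141  2r5tA 3hdmA 2r5tA_2 3hdmA_1
3     O00141  2r5tA 3hdmA 2r5tA_7 3hdmA_7
4     O15021  2w7rB 2w7rA 2w7rB_2 2w7rA_2
"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Stack the (ID1, P1) and (ID2, P2) pairs into one long table with shared column names.
first = df[["Uniprot", "ID1", "P1"]].rename(columns={"ID1": "ID", "P1": "P"})
second = df[["Uniprot", "ID2", "P2"]].rename(columns={"ID2": "ID", "P2": "P"})
long_df = pd.concat([first, second], ignore_index=True)

# Collect the P values of each (Uniprot, ID) pair into a list, in row order.
grouped = long_df.groupby(["Uniprot", "ID"], sort=False)["P"].agg(list)

# Expand the lists into columns; short rows are padded, then filled with ''.
wide = pd.DataFrame(grouped.tolist(), index=grouped.index).fillna("").reset_index()

# Stable sort so both rows of a Uniprot sit next to each other (ID1 row first).
wide = wide.sort_values("Uniprot", kind="mergesort").reset_index(drop=True)
print(wide)
```

Unlike the dict-based version, this one keeps Uniprot and ID as named columns, which may be handier if the result is processed further.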
