Merging tab-separated printed output into a dataframe, keyed by the ID that generated each print

Posted 2024-05-15 09:55:22


I am using a Python function that returns tab-separated output for each i in a dataframe. Here is an example:

Here is the code I use to generate the tab-separated output for each print:

for i in df1['col1']:
    # one search per ID; the result is printed as tab-separated text
    print(u.search(i, frmt="tab", columns="lineage-id,id,go, go(biological process), go(molecular function),go(cellular component), go-id,reviewed"))

The result is:

Taxonomic lineage IDs   Entry   Gene ontology (GO)  Gene ontology (biological process)  Gene ontology (molecular function)  Gene ontology (cellular component)  Gene ontology IDs   Status
619591  Q8V552  extracellular space [GO:0005615]            extracellular space [GO:0005615]    GO:0005615  unreviewed

Taxonomic lineage IDs   Entry   Gene ontology (GO)  Gene ontology (biological process)  Gene ontology (molecular function)  Gene ontology (cellular component)  Gene ontology IDs   Status
878992  Q8G553  extracellular space [GO:0005616]        golgi   extracellular space [GO:0005615]    GO:0005616  reviewed

Taxonomic lineage IDs   Entry   Gene ontology (GO)  Gene ontology (biological process)  Gene ontology (molecular function)  Gene ontology (cellular component)  Gene ontology IDs   Status
5672    Q89554  extracellular space [GO:0005617]        golgi   extracellular space [GO:0005615]    GO:0005617  reviewed

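(As you can see, there are 8 column names, some of which contain spaces, and some columns have no information. Note also that Num_009418726.1 produced no print, because there was no result for that ID.)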

The new column names are:

Taxonomic lineage IDs
Entry
Gene ontology (GO)
Gene ontology (biological process)
Gene ontology (molecular function)
Gene ontology (cellular component)
Gene ontology IDs
Status

df1['col1'] consists of the following IDs:

Num_009468701.1
Num_009418725.1
Num_009418726.1
Num_009429300.1

The idea is to merge these 3 tab-separated outputs into df1, each next to the corresponding ID in df1['col1'],

and to end up with:

col1    Taxonomic lineage IDs   Entry   Gene ontology (GO)  Gene ontology (biological process)  Gene ontology (molecular function)  Gene ontology (cellular component)  Gene ontology IDs   Status
Num_009468701.1 619591  Q8V552  extracellular space [GO:0005615]    NA  NA  extracellular space [GO:0005615]    GO:0005615  unreviewed
Num_009418725.1 878992  Q8G553  extracellular space [GO:0005616]    NA  golgi   extracellular space [GO:0005615]    GO:0005616  reviewed
Num_009418726.1 NA  NA  NA  NA  NA  NA  NA  NA
Num_009429300.1 5672    Q89554  extracellular space [GO:0005617]    NA  golgi   extracellular space [GO:0005615]    GO:0005617  reviewed

Thank you for your time.


1 answer

#1 · Posted 2024-05-15 09:55:22

You can use the function's output to build a list of lists:

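base_list = []
# "..." stands for "etc."; it is not part of the syntax
for i in df1['col1']:
    result = u.search(...)  # call once and reuse the result
    if result:
        # the response is a header line followed by data lines; skip the header
        for line in result.strip("\n").split("\n")[1:]:
            base_list.append([i, *line.split("\t")])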
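Each response printed in the question contains a header line followed by data lines, so the header has to be separated from the data before the fields can be used as a row. Here is a minimal sketch of that step in plain Python. The name `parse_response` and the `sample` string are hypothetical; the sample is rebuilt from the first output shown in the question, assuming the real fields are separated by single tabs.

```python
def parse_response(text):
    """Split one tab-separated response into (header, rows).

    Empty fields are turned into None so they can later become missing values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [], []
    header = lines[0].split("\t")
    rows = [[field or None for field in line.split("\t")] for line in lines[1:]]
    return header, rows


# Hypothetical sample, reconstructed from the first output in the question
sample = (
    "Taxonomic lineage IDs\tEntry\tGene ontology (GO)\t"
    "Gene ontology (biological process)\tGene ontology (molecular function)\t"
    "Gene ontology (cellular component)\tGene ontology IDs\tStatus\n"
    "619591\tQ8V552\textracellular space [GO:0005615]\t\t\t"
    "extracellular space [GO:0005615]\tGO:0005615\tunreviewed\n"
)

header, rows = parse_response(sample)
```

With this, `[i, *rows[0]]` gives one list per hit whose length matches `['col1', *header]`, so the column names do not need to be typed by hand.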

Then create a dataframe from it:

import pandas as pd
df = pd.DataFrame(base_list, columns=['col1', ...])
df.set_index('col1', inplace=True)  # set col1 as the index
df = df.dropna(how='all')  # dropna returns a new frame; use how='any' instead if that fits your need
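The question also asks for IDs without a result (such as Num_009418726.1) to stay in the table with missing values. One way to get there, sketched under assumptions, is to parse each response with `pandas.read_csv` and left-merge the hits back onto `df1`. Here `fake_search`, `make_response` and `CANNED` are hypothetical stand-ins for `u.search(...)`, filled with the rows shown in the question; this sketch also assumes that a search without hits returns an empty string.

```python
import io

import pandas as pd

HEADER = "\t".join([
    "Taxonomic lineage IDs", "Entry", "Gene ontology (GO)",
    "Gene ontology (biological process)", "Gene ontology (molecular function)",
    "Gene ontology (cellular component)", "Gene ontology IDs", "Status",
])


def make_response(*fields):
    """Build a header line plus one data line, mimicking the printed output."""
    return HEADER + "\n" + "\t".join(fields) + "\n"


# Hypothetical canned responses, copied from the outputs in the question
CANNED = {
    "Num_009468701.1": make_response(
        "619591", "Q8V552", "extracellular space [GO:0005615]", "", "",
        "extracellular space [GO:0005615]", "GO:0005615", "unreviewed"),
    "Num_009418725.1": make_response(
        "878992", "Q8G553", "extracellular space [GO:0005616]", "", "golgi",
        "extracellular space [GO:0005615]", "GO:0005616", "reviewed"),
    "Num_009429300.1": make_response(
        "5672", "Q89554", "extracellular space [GO:0005617]", "", "golgi",
        "extracellular space [GO:0005615]", "GO:0005617", "reviewed"),
}


def fake_search(query):
    """Hypothetical stand-in for u.search(...); returns "" when nothing is found."""
    return CANNED.get(query, "")


df1 = pd.DataFrame({"col1": [
    "Num_009468701.1", "Num_009418725.1", "Num_009418726.1", "Num_009429300.1",
]})

frames = []
for query in df1["col1"]:
    text = fake_search(query)
    if not text.strip():
        continue  # no hit for this ID; the left merge below keeps its row
    hit = pd.read_csv(io.StringIO(text), sep="\t")  # empty fields become NaN
    hit.insert(0, "col1", query)  # remember which ID produced this output
    frames.append(hit)

# pd.concat raises on an empty list, so this assumes at least one ID had a hit
hits = pd.concat(frames, ignore_index=True)

# how="left" keeps every ID from df1, with NaN where no result was returned
merged = df1.merge(hits, on="col1", how="left")
```

If a search returns more than one data line for an ID, the merge produces one row per line for that ID, which may or may not be what you want.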
