How do I create a new DataFrame using keys from two columns of an existing DataFrame?

Posted 2024-04-23 15:12:51


Specifically, I am working with the Quora CSV file, which I have loaded into a pandas DataFrame with the structure shown below.

--------------------------------------------------------------------------------
id | qid1  | qid2  | question1text    | question2text             | is_duplicate
--------------------------------------------------------------------------------
01 | 00001 | 00002 | Why do we exist? | Is there life on Mars?    | 0
02 | 00001 | 00003 | Why do we exist? | What happens after death? | 0

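I want to collect every distinct question, with its question id and text, into a new DataFrame that has only two columns, the question id and the corresponding question text, like this: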

---------------------------------
qid   | questiontext
---------------------------------
00001 | Why do we exist?
00002 | Is there life on Mars?
00003 | What happens after death?

1 answer

Answer #1 · posted 2024-04-23 15:12:51

Adjust the column names first, then use wide_to_long:

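import pandas as pd
# Strip "text" so the columns become question1/question2, matching qid1/qid2
df.columns = df.columns.str.replace('text', '', regex=False)
newdf = pd.wide_to_long(df, ['qid', 'question'], i=['id'], j='drop')
newdf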
         is_duplicate  qid                    question
id drop                                               
1  1                0    1            Why do we exist?
2  1                0    1            Why do we exist?
1  2                0    2   Is there life on Mars?   
2  2                0    3   What happens after death?
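
The reshape step can be reproduced on its own with the two rows from the question. This is a minimal sketch, not the answerer's exact session: the ids are stored as strings here (an assumption, so that leading zeros such as "00001" survive), and the index level name `slot` is a hypothetical replacement for `drop`.

```python
import pandas as pd

# Sample rows copied from the question. Storing the ids as strings is an
# assumption made here so that leading zeros are preserved.
df = pd.DataFrame({
    "id": ["01", "02"],
    "qid1": ["00001", "00001"],
    "qid2": ["00002", "00003"],
    "question1text": ["Why do we exist?", "Why do we exist?"],
    "question2text": ["Is there life on Mars?", "What happens after death?"],
    "is_duplicate": [0, 0],
})

# Rename question1text/question2text to question1/question2 so that both
# column families follow the "<stub><number>" pattern wide_to_long expects.
df.columns = df.columns.str.replace("text", "", regex=False)

# Stack (qid1, question1) and (qid2, question2) into one (qid, question)
# pair per row; the numeric suffix becomes the index level "slot".
long_df = pd.wide_to_long(df, stubnames=["qid", "question"], i="id", j="slot")
print(long_df)
```

Each input row contributes two rows to `long_df`, one per question slot, which is why "Why do we exist?" still appears twice at this stage.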

Then we need drop_duplicates:

# Drop repeated (qid, question) pairs and keep only those two columns
newdf = newdf.drop_duplicates(['qid', 'question'])[['qid', 'question']]
newdf
         qid                    question
id drop                                 
1  1       1            Why do we exist?
   2       2   Is there life on Mars?   
2  2       3   What happens after death?
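
The result above still carries the (id, drop) MultiIndex and a column named `question`, whereas the question asks for a flat two-column frame with `qid` and `questiontext`. One possible way to finish is sketched below. The function name `distinct_questions`, the variable `sample`, and the index level name `slot` are hypothetical, and the ids are assumed to be strings.

```python
import pandas as pd

def distinct_questions(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per distinct question as (qid, questiontext)."""
    # Work on a renamed copy so the caller's frame is left untouched.
    renamed = df.rename(columns=lambda c: c.replace("text", ""))
    long_df = pd.wide_to_long(
        renamed, stubnames=["qid", "question"], i="id", j="slot"
    )
    return (
        long_df[["qid", "question"]]
        .drop_duplicates()
        .sort_values("qid")
        .reset_index(drop=True)  # discard the (id, slot) MultiIndex
        .rename(columns={"question": "questiontext"})
    )

# Sample rows copied from the question (ids as strings is an assumption).
sample = pd.DataFrame({
    "id": ["01", "02"],
    "qid1": ["00001", "00001"],
    "qid2": ["00002", "00003"],
    "question1text": ["Why do we exist?", "Why do we exist?"],
    "question2text": ["Is there life on Mars?", "What happens after death?"],
    "is_duplicate": [0, 0],
})

result = distinct_questions(sample)
print(result)
```

Sorting by `qid` is optional; it is included only so the rows come out in the same order as the desired table in the question.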
