In pandas and Python, converting data in a DataFrame to lists under a special condition

Posted 2024-05-19 00:06:31


Below is a subset of my DataFrame. I want to create 4 lists:

list 1: list of all WD1 as follows:
[flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness,  Dizziness, headaches, neck pain, headache, nausea] 
list 2: comment_id: [1, 1, 1, 1, 1, 14, 14, 14, 17, 17]
list 3: drug_id: [lex.1, lex.1, lex.1, lex.1, lex.1, lex14, lex14, lex14, lex18, lex18]

As you can see, whenever the value in a WD column is not NaN, I capture that value together with its comment id and drug id.


I know I can iterate over the rows with the following code to capture each WD:

for index, row in df.iterrows():
    ...  # inspect each row here

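But I don't know how to express "if it is not NaN". Also, when I append the captured values to an already-defined list, I don't get a list back; the data in it is in string format.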


1 answer
User
#1 · Posted 2024-05-19 00:06:31

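You can use cumcount() to create a rowid that corresponds to the column index within each combination of comment_id and drug_id, then unstack it with the two id columns as the index: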

df1 = (df.assign(rowid = df.groupby(["comment_id", "drug_id"]).cumcount() + 1)
       .set_index(["comment_id", "drug_id", "rowid"])
       .rename_axis(("comment_id", "drug_id","")).unstack(level=2))

# rename columns from multi-index to single index
df1.columns = [''.join(map(str, col)) for col in df1.columns]
# assign the result so comment_id and drug_id become regular columns again
df1 = df1.reset_index()
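
To see what cumcount() contributes in the snippet above, here is a minimal sketch on a small made-up frame. The name `demo` and its values are illustrative assumptions, not the asker's data; the point is only that the counter restarts for every (comment_id, drug_id) group, which is what lets unstack spread the values into numbered columns.

```python
import pandas as pd

# Hypothetical miniature frame: two groups of sizes 2 and 3
demo = pd.DataFrame({
    "comment_id": [1, 1, 14, 14, 14],
    "drug_id": ["lex.1", "lex.1", "lex14", "lex14", "lex14"],
})

# cumcount() numbers the rows inside each group starting at 0;
# adding 1 makes the numbering start at 1, matching WD1, WD2, ...
rowid = demo.groupby(["comment_id", "drug_id"]).cumcount() + 1
print(rowid.tolist())  # [1, 2, 1, 2, 3]
```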



Data setup:

import pandas as pd

WDs = ["flu-like symptoms", "dizziness", "major mood swings", "lots of anxiety", "tiredness", "Dizziness", "headaches", "neck pain", "headache", "nausea"]
comment_id = [1, 1, 1, 1, 1, 14, 14, 14, 17, 17]
drug_id = ["lex.1", "lex.1",  "lex.1", "lex.1", "lex.1",  "lex14", "lex14", "lex14", "lex18", "lex18"]

df = pd.DataFrame({"WD": WDs, "comment_id": comment_id, "drug_id": drug_id})

Update:

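It looks like you want the opposite result. Given the DataFrame df1, you can first convert it to long format; each column is then exactly what you need, and you can use tolist() to convert them: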

df2 = df1.set_index(["comment_id", "drug_id"]).stack().rename("WD").reset_index()   
comment_id, drug_id, WD = df2.comment_id.tolist(), df2.drug_id.tolist(), df2.WD.tolist()
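
For the situation described in the question (a wide frame where some WD cells are NaN), the same idea can be sketched end to end as follows. The frame `wide` and its contents are assumptions made up for illustration, since the original table is only available as an image. The explicit `.dropna()` is added here so that missing cells are discarded regardless of how the installed pandas version's stack() treats them, and the loop at the end shows `pd.notna` as one way to write the "is not NaN" check when iterating rows.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame shaped like the one described in the question
wide = pd.DataFrame({
    "comment_id": [1, 14, 17],
    "drug_id": ["lex.1", "lex14", "lex18"],
    "WD1": ["flu-like symptoms", "Dizziness", "headache"],
    "WD2": ["dizziness", "headaches", "nausea"],
    "WD3": ["major mood swings", "neck pain", np.nan],
})

# Wide -> long: one row per non-missing WD value, ids repeated alongside
long_df = (
    wide.set_index(["comment_id", "drug_id"])
        .stack()
        .dropna()          # keep only cells that are not NaN
        .rename("WD")
        .reset_index()
)

wd_list = long_df["WD"].tolist()
comment_ids = long_df["comment_id"].tolist()
drug_ids = long_df["drug_id"].tolist()

# Explicit loop version of the same filter, using pd.notna
wd_cols = ["WD1", "WD2", "WD3"]
wd_loop = [row[c] for _, row in wide.iterrows() for c in wd_cols if pd.notna(row[c])]
```

Both `wd_list` and `wd_loop` are ordinary Python lists of strings; the vectorized version is usually preferable to iterrows() on larger frames.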
