Transferring columns to a new df with exploded cells and repeated title names

Posted 2024-05-15 13:06:37

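I have a dataframe of journal titles and ERA subjects (the journal's disciplines). It is not tidy: the era_subjects column holds multiple values in a single cell.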

import pandas as pd

df = pd.DataFrame({
'title':['Veterinary pathology', 'Clothing and textiles research journal'],
'era_subjects':["[['07', 'Agricultural and Veterinary Sciences'], ['04', 'Fisheries Sciences'], ['0707', 'Veterinary Sciences']]","[['1203', 'Design Practice and Management'], ['12', 'Built Environment and Design']]"],
'cpu_rank': ['1', '2'],
'subscribed': ['True', 'False'],
'downloads': ['800', '550']})

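I wrote a function that extracts and returns only the broadest-level, two-digit subject strings from era_subjects (there can be more than one). For example, the result of my function on row 0 is a cell containing: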

['Agricultural and Veterinary Sciences', 'Fisheries Sciences']
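
The function itself is not shown in the question. A minimal sketch of what it might look like, assuming each cell is a string that parses as a Python list of [code, name] pairs and that "broadest level" means a two-character code (the name broad_subjects is hypothetical):

```python
import ast


def broad_subjects(cell):
    """Return the names of the two-digit (broadest-level) subjects in one cell."""
    # Assumption: the cell is a string holding a list of [code, name] pairs.
    pairs = ast.literal_eval(cell)
    # Keep only the names whose code is exactly two characters long.
    return [name for code, name in pairs if len(code) == 2]


cell = (
    "[['07', 'Agricultural and Veterinary Sciences'], "
    "['04', 'Fisheries Sciences'], ['0707', 'Veterinary Sciences']]"
)
print(broad_subjects(cell))
# ['Agricultural and Veterinary Sciences', 'Fisheries Sciences']
```

ast.literal_eval only parses Python literals, so it is a safer choice than eval for cells like these.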

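Then I used the technique outlined in the Medium article here to explode the resulting cells into a new_df, repeating the journal title across multiple rows where necessary: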

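Now I want to supplement this new_df with information from the original df, for example subscribed, matched on journal title. I can't do a lookup using the title column of new_df as an index, because it is repeated (e.g. rows 0 and 1).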

After a lot of trial and error, and dead ends with the join and merge methods that I couldn't get my head around, I have got as far as this:

for i in df.set_index('title').index:
    temp_sub = df.set_index('title').loc[i, 'subscribed']
    
    temp_filt = (new_df['title'] == i)
    new_df.loc[temp_filt, 'subscribed'] = temp_sub

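This uses the title from the original df (unique per row) to save the subscribed status for that title, then filters new_df on that title and sets the subscribed status.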

Questions:

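I'm sure there is a better way. How else could I have accomplished this?
subscribed is one of seven or so columns that I want to bring across from the original df, based on journal title. Can I do this efficiently, without seven separate temp variables and assignments?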

Edit: adding the desired final new_df

new_df = pd.DataFrame({
'title':['Veterinary pathology', 'Veterinary pathology', 'Clothing and textiles research journal'],
'era_subjects':["Agricultural and Veterinary Sciences", 'Fisheries Sciences', 'Built Environment and Design'],
'cpu_rank': ['1', '1', '2'],
'subscribed': ['True', 'True', 'False'],
'downloads': ['800', '800', '550']})

1 Answer

Answer 1 · posted 2024-05-15 13:06:37

I was able to achieve this by using .explode

After my function runs, df['era_split'] looks like "Agricultural and Veterinary Sciences', 'Fisheries Sciences"

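# Split each string on the "', '" separator to get a list of subjects per row
df['era_split_by_quotecommaquote'] = df['era_split'].str.split("', '")
# New df with more rows than the original; the other columns are repeated for each subject
df2 = df.explode('era_split_by_quotecommaquote')
df2.reset_index(drop=True, inplace=True)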
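
For reference, here is a self-contained sketch of two ways to get the desired new_df from the question's sample data. It is an illustration under assumptions, not the asker's actual code: broad_subjects is a hypothetical stand-in for the unshown extraction function, and extra_cols is just an illustrative name. Option 1 explodes the full frame so the other columns come along automatically. Option 2 covers the case where new_df already exists with only title and era_subjects, and uses a left merge on title in place of the per-column loop.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    'title': ['Veterinary pathology', 'Clothing and textiles research journal'],
    'era_subjects': [
        "[['07', 'Agricultural and Veterinary Sciences'], ['04', 'Fisheries Sciences'], ['0707', 'Veterinary Sciences']]",
        "[['1203', 'Design Practice and Management'], ['12', 'Built Environment and Design']]",
    ],
    'cpu_rank': ['1', '2'],
    'subscribed': ['True', 'False'],
    'downloads': ['800', '550'],
})


# Hypothetical stand-in for the asker's function: keep two-digit subject names.
def broad_subjects(cell):
    return [name for code, name in ast.literal_eval(cell) if len(code) == 2]


# Option 1: explode the full frame; every other column is repeated per subject.
exploded = (
    df.assign(era_subjects=df['era_subjects'].map(broad_subjects))
      .explode('era_subjects')
      .reset_index(drop=True)
)

# Option 2: new_df already exists with only title and era_subjects.
new_df = exploded[['title', 'era_subjects']]
extra_cols = ['cpu_rank', 'subscribed', 'downloads']  # columns to bring across
# Titles are unique in df, so each row of new_df matches exactly one row of df.
merged = new_df.merge(df[['title'] + extra_cols], on='title', how='left')

print(merged)
```

Because the merge is keyed on the title column rather than on the index, repeated titles in new_df are not a problem, and adding more columns only means extending extra_cols.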