Iterating over a nested list of strings to get the first item

import pandas as pd

df = pd.DataFrame({
    'id': [620, 843, 986],
    'tit': ['AAA', 'BBB', 'CCC'],
    'gen': [['Romance', 'Satire', 'Fiction'], ['Science Fiction', 'Novel'], ['Mystery', 'Novel']],
})
genre_code = ['Science Fiction', 'Mystery', 'Non-fiction']

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in sanitize_index(data, index)
    746     if len(data) != len(index):
    747         raise ValueError(
--> 748             "Length of values "
    749             f"({len(data)}) "
    750             "does not match length of index "

ValueError: Length of values (30004) does not match length of index (12841)

2 answers

User

#1 · Edited 2024-05-15 22:01:00

I would convert the lists to strings, then use Series.str.findall to return the matching genre codes:

df['new_gen'] = df['gen'].astype(str).str.findall('|'.join(genre_code))

print(df)

    id  tit                         gen            new_gen
0  620  AAA  [Romance, Satire, Fiction]                 []
1  843  BBB    [Science Fiction, Novel]  [Science Fiction]
2  986  CCC            [Mystery, Novel]          [Mystery]
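
One caveat with this approach: '|'.join(genre_code) is interpreted as a regular expression, so a genre name containing characters such as + or ( would not be matched literally. A minimal sketch, assuming the same df and genre_code as in the question, that escapes each name with re.escape first (the variable names pattern and new_gen are illustrative):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'id': [620, 843, 986],
    'tit': ['AAA', 'BBB', 'CCC'],
    'gen': [['Romance', 'Satire', 'Fiction'], ['Science Fiction', 'Novel'], ['Mystery', 'Novel']],
})
genre_code = ['Science Fiction', 'Mystery', 'Non-fiction']

# Escape each genre so regex metacharacters are matched literally.
pattern = '|'.join(re.escape(g) for g in genre_code)

# Match against the string form of each list, as in the answer above.
df['new_gen'] = df['gen'].astype(str).str.findall(pattern)
new_gen = df['new_gen'].tolist()
```

Note that matching on the string form is substring-based: a genre code such as 'Fiction' would also match inside 'Science Fiction'. If exact matches on whole list items are needed, comparing the list elements directly is safer.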

User

#2 · Edited 2024-05-15 22:01:00

If you want to filter the gen column against the list, you can do the following:

df["gen"] = df["gen"].apply(lambda x: [g for g in x if g in genre_code])
print(df)

Prints:

    id  tit                gen
0  620  AAA                 []
1  843  BBB  [Science Fiction]
2  986  CCC          [Mystery]

P.S. To speed this up, you can convert genre_code to a set() beforehand, since membership tests on a set are faster than on a list:

genre_code = set(["Science Fiction", "Mystery", "Non-fiction"])
df["gen"] = df["gen"].apply(lambda x: [g for g in x if g in genre_code])
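
The title asks for the first item. If only the first matching genre per row is wanted, rather than a list of every match, one option is next() with a default value. This is a sketch under the assumption that "first" means the first element of each row's list that appears in genre_code; the column name first_gen is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [620, 843, 986],
    "tit": ["AAA", "BBB", "CCC"],
    "gen": [["Romance", "Satire", "Fiction"], ["Science Fiction", "Novel"], ["Mystery", "Novel"]],
})
genre_code = {"Science Fiction", "Mystery", "Non-fiction"}

# next() stops at the first genre found in genre_code;
# the default None is used when no genre in the row matches.
df["first_gen"] = df["gen"].apply(
    lambda x: next((g for g in x if g in genre_code), None)
)
first_gen = df["first_gen"].tolist()
```

Because the generator is consumed lazily, each row stops scanning as soon as a match is found.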
