Filtering DataFrame text based on a list of words

words = (cell, CDKs, lung, mutations monomeric, Casitas, Background, acquired, evidence, kinases, small, evidence, Oncogenic)

data

ID  Text
0   Cyclin-dependent kinases CDKs regulate a
1   Abstract Background Non-small cell lung
2   Abstract Background Non-small cell lung
3   Recent evidence has demonstrated that acquired
4   Oncogenic mutations in the monomeric Casitas

2 answers

User

#1 · edited 2024-06-07 14:01:32

I'm not sure this is the most elegant solution, but you can do this:

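import pandas as pd

to_remove = ['foo', 'bar']
df = pd.DataFrame({'Text': [
    'spam foo& eggs',
    'foo bar eggs bacon and lettuce',
    'spam and foo eggs'
]})

# Join the words into one regex alternation; regex=True is required in pandas >= 2.0
df['Text'].str.replace('|'.join(to_remove), '', regex=True)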
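Note that the pattern above matches substrings, so `foo` would also be stripped out of `food`. A minimal sketch of a whole-word variant, assuming that is the behaviour wanted; the extra row `'food at the bar'` and the names `pattern` and `cleaned` are introduced here for illustration only:

```python
import re

import pandas as pd

to_remove = ['foo', 'bar']
df = pd.DataFrame({'Text': [
    'spam foo& eggs',
    'foo bar eggs bacon and lettuce',
    'spam and foo eggs',
    'food at the bar',  # hypothetical row: 'food' should survive, 'bar' should not
]})

# \b anchors every alternative at word boundaries;
# re.escape protects any regex metacharacters inside the words.
pattern = r'\b(?:' + '|'.join(map(re.escape, to_remove)) + r')\b'

cleaned = (
    df['Text']
    .str.replace(pattern, '', regex=True)  # drop the listed words
    .str.replace(r'\s+', ' ', regex=True)  # collapse the gaps left behind
    .str.strip()                           # trim leading/trailing spaces
)
print(cleaned.tolist())
```

The two follow-up calls only tidy whitespace; they can be dropped if the spacing of the result does not matter.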

User

#2 · edited 2024-06-07 14:01:32

You can simply use apply with a simple list comprehension:

>>> df['Text'].apply(lambda x: ' '.join([i for i in x.split() if i in words]))
0                             kinases CDKs
1                     Background cell lung
2                     Background cell lung
3                        evidence acquired
4    Oncogenic mutations monomeric Casitas
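For reference, a self-contained sketch that should reproduce the output above. It assumes `mutations` and `monomeric` are separate entries in `words` (the output suggests so, although the question lists them without a comma); `keep_listed_words` and `filtered` are names introduced here:

```python
import pandas as pd

# Word list from the question; 'mutations' and 'monomeric' split by assumption.
words = ['cell', 'CDKs', 'lung', 'mutations', 'monomeric', 'Casitas',
         'Background', 'acquired', 'evidence', 'kinases', 'small',
         'evidence', 'Oncogenic']

df = pd.DataFrame({'Text': [
    'Cyclin-dependent kinases CDKs regulate a',
    'Abstract Background Non-small cell lung',
    'Abstract Background Non-small cell lung',
    'Recent evidence has demonstrated that acquired',
    'Oncogenic mutations in the monomeric Casitas',
]})


def keep_listed_words(text, allowed):
    """Keep only the whitespace-separated tokens that appear in `allowed`."""
    return ' '.join(token for token in text.split() if token in allowed)


filtered = df['Text'].apply(lambda text: keep_listed_words(text, words))
print(filtered)
```

Matching is exact and case-sensitive on whole tokens, which is why `Non-small` is dropped even though `small` is in the list.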

Also, for better performance (O(1) average lookup time), I suggest you also do this:
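O(1) average lookup time is what Python's hash-based `set` provides, so one plausible reading of this advice is to convert `words` to a set before filtering. A minimal sketch under that assumption; `word_set` and `filtered` are names introduced here, and `mutations` / `monomeric` are again treated as separate entries:

```python
import pandas as pd

words = ('cell', 'CDKs', 'lung', 'mutations', 'monomeric', 'Casitas',
         'Background', 'acquired', 'evidence', 'kinases', 'small',
         'evidence', 'Oncogenic')

# Assumption: a set is the intended structure. Membership tests on a set
# are O(1) on average, versus O(n) for a tuple or list.
word_set = set(words)

df = pd.DataFrame({'Text': [
    'Cyclin-dependent kinases CDKs regulate a',
    'Recent evidence has demonstrated that acquired',
]})

filtered = df['Text'].apply(
    lambda x: ' '.join(i for i in x.split() if i in word_set)
)
print(filtered.tolist())
```

Converting to a set also removes the duplicated `evidence` entry, which has no effect on the result.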
