Using a DataFrame column's str.contains to update a pandas DataFrame column

User

Answer 1 · edited 2024-04-19 18:41:19

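If the sentence always starts with "this is" followed by the fruit name, that is, if the third word is always the fruit name, you can also use apply() with split(): split each row's string and take the third token as the new value of column D: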

df['D'] = df['C'].apply(lambda val: val.split()[2])

Or, as another answer suggests, use the vectorized str.split:

df['D'] = df['C'].str.split().str[2]

Output:

                C       D
0  this is orange  orange
1   this is apple   apple
2    this is pear    pear
3    this is plum    plum
4  this is orange  orange
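A self-contained sketch comparing the two one-liners above on the sample data. The DataFrame is rebuilt from the output shown; the short-string check at the end is an added illustration (not from the answer) of how the two differ when a sentence has fewer than three words:

```python
import pandas as pd

# Sample data rebuilt from the output above
df = pd.DataFrame({
    "C": ["this is orange", "this is apple", "this is pear",
          "this is plum", "this is orange"],
    "D": [0, 0, 0, 0, 0],
})

# Row-wise: split each string and take the third token (index 2)
via_apply = df["C"].apply(lambda val: val.split()[2])

# Vectorized: .str.split() yields lists, .str[2] indexes into each list
via_str = df["C"].str.split().str[2]

df["D"] = via_str
print(df)

# apply + split()[2] would raise IndexError on a two-word string,
# whereas .str[2] returns NaN for an out-of-range index
short = pd.Series(["this is"])
print(short.str.split().str[2].isna().all())  # True
```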

User

Answer 2 · edited 2024-04-19 18:41:19

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({"C": ['this is orange', 'this is apple which is red', 'this is pear', 'this is plum', 'this is orange'], "D": [0, 0, 0, 0, 0]})

    C                           D
0   this is orange              0
1   this is apple which is red  0
2   this is pear                0
3   this is plum                0
4   this is orange              0

Assuming the fruit name always follows "this is", you can extract the word that comes right after it.
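One way to do this, sketched here as an assumption rather than the answerer's exact code, is a regex capture with Series.str.extract. The pattern r"this is\s+(\w+)" is illustrative; it captures the first word after "this is" and ignores the rest of the sentence:

```python
import pandas as pd

df = pd.DataFrame({
    "C": ["this is orange", "this is apple which is red", "this is pear",
          "this is plum", "this is orange"],
    "D": [0, 0, 0, 0, 0],
})

# Capture the first word after "this is"; expand=False returns a Series
# instead of a one-column DataFrame. Rows with no match become NaN.
df["D"] = df["C"].str.extract(r"this is\s+(\w+)", expand=False)
print(df)
```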

You get:

    C                           D
0   this is orange              orange
1   this is apple which is red  apple
2   this is pear                pear
3   this is plum                plum
4   this is orange              orange

For the sample dataset you posted, simply splitting on spaces and taking the last element is enough:

df['D'] = df.C.str.split(' ').str[-1]
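The last-element approach relies on the fruit being the final word, which holds for the posted dataset but not for the longer sentence in the DataFrame above. A small sketch contrasting it with taking the token right after "this is":

```python
import pandas as pd

c = pd.Series(["this is orange", "this is apple which is red"])

last_word = c.str.split(" ").str[-1]   # last token of each sentence
third_word = c.str.split(" ").str[2]   # token right after "this is"

print(last_word.tolist())   # ['orange', 'red']
print(third_word.tolist())  # ['orange', 'apple']
```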

User

Answer 3 · edited 2024-04-19 18:41:19

Since you didn't say how the fruit name should be extracted, I'm assuming it is always preceded by "this is"; the following should help:

import pandas as pd

d = {'C': ['this is orange',
  'this is apple',
  'this is pear',
  'this is plum',
  'this is orange'],
 'D': [0, 0, 0, 0, 0]}

dff = pd.DataFrame(d)

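# regex=True is required since pandas 2.0, where the default became regex=False
dff['D'] = dff['C'].str.replace(r'(this is) ([A-Za-z]+)', r'\2', regex=True)
# or simply drop the literal prefix
dff['D'] = dff['C'].str.replace('this is ', '', regex=False)
print(dff)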


#                 C       D
# 0  this is orange  orange
# 1   this is apple   apple
# 2    this is pear    pear
# 3    this is plum    plum
# 4  this is orange  orange

It uses .str.replace to replace "this is " with an empty string.
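A short sketch of both replace variants on a longer sentence. It shows that replacing only strips the prefix and keeps everything after the fruit name. Series.str.removeprefix is used here as an alternative literal form and assumes pandas 1.4 or later:

```python
import pandas as pd

c = pd.Series(["this is orange", "this is apple which is red"])

# Regex form: pass regex=True explicitly; since pandas 2.0 the default is
# regex=False, so the pattern would otherwise be matched literally.
regex_form = c.str.replace(r"(this is) ([A-Za-z]+)", r"\2", regex=True)

# Literal form: remove the fixed prefix (requires pandas >= 1.4)
prefix_form = c.str.removeprefix("this is ")

print(regex_form.tolist())   # ['orange', 'apple which is red']
print(prefix_form.tolist())  # ['orange', 'apple which is red']
```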

I hope this helps.
