How to cut off the left and right parts of a sentence with Pandas

Posted 2024-04-25 02:01:11

Converting the sentence into a list of words and then finding the index of the root string does the job:

sentence = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
root = "mtnr1a"

try:
    words = sentence.split()
    n = words.index(root)
    cutoff = ' '.join(words[n-4:n+5])
except ValueError:
    cutoff = None

print(cutoff)

Result:

promoter polymorphism of the mtnr1a gene and adolescent idiopathic

How can I implement this on a pandas DataFrame?

I tried:

sentence = data['sentence']
root = data['rootword']
def cutOff(sentence,root):
    try:
        words = sentence.str.split()
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff
data.apply(cutOff(sentence,root),axis=1)

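But it doesn't work...

Edit:

How do I cut the sentence when the root is the first word of the sentence (keeping the 4 words after the root), or when the root is the last word of the sentence? For example: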

sentence = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
Output if the root is in the first position:
"mtnr1a lack of association between"
Output if the root is in the last position:
"lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
"adolescent idiopathic scoliosis mtnr1a"

Tags: of, the, root, between, sentence, cutoff, words, gene
1 Answer
User
#1 · Posted 2024-04-25 02:01:11

Two small tweaks to your code should solve the problem:

First, calling apply() on a DataFrame with axis=1 applies the function to the values in each row of that DataFrame.

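You don't need to pass the columns to the function as inputs, and calling sentence.str.split() makes no sense there. Inside cutOff(), sentence is just a regular string (not a column).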
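To make that concrete, here is a minimal sketch, assuming a hypothetical one-row DataFrame with the column names from the question. It only checks what kind of object the function sees when apply() is called with axis=1:

```python
import pandas as pd

# Hypothetical one-row frame; the column names follow the question.
df = pd.DataFrame({
    "sentence": ["lack of association between the promoter polymorphism "
                 "of the mtnr1a gene and adolescent idiopathic scoliosis"],
    "rootword": ["mtnr1a"],
})

# With axis=1 the function receives one row at a time (a Series);
# a cell looked up in that row is an ordinary Python object.
cell_types = df.apply(lambda row: type(row["sentence"]).__name__, axis=1)
print(cell_types.iloc[0])  # str

# A plain str has split() but no .str accessor, which is why
# sentence.str.split() fails inside the function.
print(hasattr("some text", "str"))  # False
```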

Change the function to:

def cutOff(sentence,root): 
    try: 
        words = sentence.split()  # this is the line that was changed
        n = words.index(root) 
        cutoff = ' '.join(words[n-4:n+5]) 
    except ValueError: 
        cutoff = None 
    return cutoff

Next, you only need to specify which columns are used as the function's inputs, which you can do with a lambda:

df.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)
#0    promoter polymorphism of the mtnr1a gene and a...
#dtype: object
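
The edit in the question (root in the first or last position) is not handled by the function above: when the root sits within the first four words, n-4 is negative and the slice counts from the end of the list, which usually yields an empty string. A possible sketch, assuming the desired behavior is to keep however many words exist on each side, is to clamp the start index at 0. The name cut_off_clamped and the window parameter are hypothetical, not part of the original code:

```python
def cut_off_clamped(sentence, root, window=4):
    """Return up to `window` words on each side of `root`, or None if absent."""
    words = sentence.split()
    try:
        n = words.index(root)
    except ValueError:
        return None
    # Clamp the start so a root near the beginning does not produce
    # a negative slice index; slicing past the end is already safe.
    start = max(n - window, 0)
    return " ".join(words[start:n + window + 1])


first = ("mtnr1a lack of association between the promoter polymorphism "
         "of the gene and adolescent idiopathic scoliosis")
print(cut_off_clamped(first, "mtnr1a"))
# mtnr1a lack of association between
```

It takes the same two arguments as cutOff(), so it could be passed to the same df.apply(lambda x: ..., axis=1) call.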
