Using POS tagging on a column (Pandas)

Posted 2024-06-11 04:24:47


I want to extract only nouns from this dataset:

Text1                                     Text2

see if your area is affected afte...      public health england have confir...
'i had my throat scraped'.                i have been producing some of our...
drive-thru testing introduced at w...     “a painless throat swab will be t...
live updates as first case confirm...     the first case  in ...
hampton hill medical centre               love is actually just ...
berkshire: public health england a...     an official public health england...

I need to apply POS tagging to Text2 so that only the ADV (adverb) tokens are extracted:

ans=[]
for x in 
    tagger = treetaggerwrapper.TreeTagger(TAGLANG="en", TAGDIR='path')
    tags = tagger.tag_text(x)
    ans.append(tags)
    pprint(treetaggerwrapper.make_tags(tags))

However, I have not included the column in the loop, because I do not know what I should put there (e.g. df['Text 2'].tolist()).

What I need is to extract the adverbs from the text and add them to a new array or empty list. I hope you can help me.


1 answer
User
#1 · Posted 2024-06-11 04:24:47

For this kind of task, I generally prefer spaCy, run from Google Colab.

If you want to try it yourself before reading my answer, look here: https://spacy.io/usage/linguistic-features

If you are able to, you can install it with pip:

# Please open this notebook in playground mode (File -> Open in playground mode) and then run this block first to download the spaCy model you will be using
!pip install spacy
!python -m spacy download en_core_web_sm

We only use pandas and spaCy here; no other packages are needed.

import pandas as pd
import spacy

Recreate the DataFrame:

# Each column is recreated as one multi-line string, truncated ("...") as in the question
list1 = '''see if your area is affected afte...
'i had my throat scraped'.
drive-thru testing introduced at w...
live updates as first case confirm...
hampton hill medical centre
berkshire: public health england a...'''

list2 = '''public health england have confir...
i have been producing some of our...
a painless throat swab will be t...
the first case  in ...
love is actually just ...
an official public health england...'''

# One row, with each cell holding the whole text of that column
df = pd.DataFrame([[list1, list2]], columns=['Text1', 'Text2'])

Grab the string and initialize spaCy:

string = df.iloc[0,1]
nlp = spacy.load("en_core_web_sm")

Next, I put everything into a function:

def list_adv(string):
    '''
    input: the text that list_adv will run part-of-speech tagging on
    return: adv will be a list of all adverbs from the input
    '''
    # Run the spaCy pipeline (tokenizer, tagger, ...) on the input text
    doc = nlp(string)

    # Blank list to append adverbs to as we search
    adv = []

    # For every token in the document
    for token in doc:
        # If the token's coarse-grained POS tag is ADV (adverb), keep its text;
        # all other tokens are skipped
        if token.pos_ == 'ADV':
            adv.append(token.text)

    # Return the final product
    return adv

adv_list = list_adv(string)

The final product is a list of adverbs, as your question asked for.
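The function above handles a single string. If the DataFrame instead holds one text per row, as in the question's table, one list of adverbs per row may be more useful. A minimal sketch of that, assuming pandas is installed: `adverbs_per_row`, `toy_extract` and the `Text2_adv` column name are hypothetical names introduced here, and `toy_extract` is only a word-list stand-in (not real POS tagging) so the sketch runs without downloading a model. In practice you would pass `list_adv` in its place.

```python
import pandas as pd


def adverbs_per_row(frame, column, extract):
    """Return a copy of `frame` with a new '<column>_adv' column.

    `extract` is any function that maps one text to a list of adverbs,
    for example the list_adv function from this answer.
    """
    out = frame.copy()
    # Series.apply calls `extract` once per cell and collects the results
    out[column + "_adv"] = out[column].apply(extract)
    return out


def toy_extract(text):
    # Stand-in for illustration only: looks words up in a tiny fixed set
    # instead of running a real tagger.
    known_adverbs = {"actually", "just"}
    return [word for word in text.split() if word in known_adverbs]


# Two rows taken from the Text2 column in the question
rows = pd.DataFrame({"Text2": ["love is actually just ...",
                               "the first case  in ..."]})
result = adverbs_per_row(rows, "Text2", toy_extract)
```

With spaCy loaded, the call would look like `adverbs_per_row(df, "Text2", list_adv)`. Keeping the extractor as a parameter means the same helper works for any other tag filter you write.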
