I limited the words in tweets to content words; now I want to lowercase the words and append the POS tag with an underscore

Posted 2024-04-26 04:50:01


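I wrote the code below, which limits the words in each tweet to content words, i.e. nouns, verbs, and adjectives. Now I want to convert the words to lowercase and append each word's part-of-speech tag with an underscore. For example:

love_VERB old-fashioned_NOUN

but I don't know how to do it. Can anyone help me?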


! pip install wget
import wget
url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/reviews.full.tsv.zip'
wget.download(url, 'reviews.full.tsv.zip')


from zipfile import ZipFile
with ZipFile('reviews.full.tsv.zip', 'r') as zf:
    zf.extractall()


import pandas as pd
df = pd.read_csv('reviews.full.tsv', sep='\t', nrows=100000)  # nrows: maximum number of rows to read
documents = df.text.values.tolist()
print(documents[:4])


import spacy

nlp = spacy.load('en_core_web_sm')  # other spaCy models can be used instead
# coarse POS tags to keep (content words)
included_tags = {"NOUN", "VERB", "ADJ"}
#document = [line.strip() for line in open('moby_dick.txt', encoding='utf8').readlines()]

sentences = documents[:103]  # first 103 documents
new_sentences = []
for sentence in sentences:
    new_sentence = []
    for token in nlp(sentence):
        if token.pos_  in included_tags:
            new_sentence.append(token.text)
    new_sentences.append(" ".join(new_sentence))

# Create a list of token lists from the filtered sentences
tokens = [[token.text for token in nlp(new_sentence)] for new_sentence in new_sentences]
tokens

# import itertools
# tok = itertools.chain.from_iterable(
#    [[token.text for token in nlp(new_sentence)] for new_sentence in documents[:200]])

# tok


1 answer

Answer #1 · Posted 2024-04-26 04:50:01

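I believe that if you change

        new_sentence.append(token.text)

to

        new_sentence.append(token.text.lower() + '_' + token.pos_)

you will get what you want. Note that the attribute is token.pos_ (lowercase, with a trailing underscore), which holds the coarse part-of-speech tag as a string; it is the same attribute the filter in the question already uses.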
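As a rough, self-contained sketch of the same idea, the filtering and formatting can be pulled into one small function. Everything named here is hypothetical: `FakeToken` is a stand-in so the snippet runs without downloading a spaCy model, `tag_content_words` is an illustrative helper, and the POS tags in the sample are assigned by hand rather than predicted by a model. Real spaCy tokens expose the same `.text` and `.pos_` attributes, so `nlp(sentence)` could be passed in place of the sample list.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; real tokens also have
# .text and .pos_ attributes.
FakeToken = namedtuple("FakeToken", ["text", "pos_"])

# Coarse POS tags treated as content words (same set as in the question).
INCLUDED_TAGS = {"NOUN", "VERB", "ADJ"}


def tag_content_words(tokens, included_tags=INCLUDED_TAGS):
    """Return lowercase 'word_POS' strings for tokens with an included POS."""
    return [
        f"{token.text.lower()}_{token.pos_}"
        for token in tokens
        if token.pos_ in included_tags
    ]


# Hand-tagged sample; a real pipeline would supply nlp("...") instead.
sample = [
    FakeToken("I", "PRON"),
    FakeToken("Love", "VERB"),
    FakeToken("this", "DET"),
    FakeToken("Old-fashioned", "ADJ"),
    FakeToken("Shop", "NOUN"),
]

result = " ".join(tag_content_words(sample))
print(result)  # love_VERB old-fashioned_ADJ shop_NOUN
```

In the question's loop, this would amount to something like `new_sentences.append(" ".join(tag_content_words(nlp(sentence))))`, assuming the model is loaded as `nlp`.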
