Lemmatizing entire sentences in Python is not working

Posted 2024-04-27 04:21:05

User

I am using WordNetLemmatizer() from the NLTK package in Python to lemmatize entire sentences in a movie review dataset.

Here is my code:

import re
from nltk.stem import LancasterStemmer, WordNetLemmatizer
lemmer = WordNetLemmatizer()

def preprocess(x):

    #Lemmatization
    x = ' '.join([lemmer.lemmatize(w) for w in x.rstrip().split()])

    # Lower case
    x = x.lower()

    # Remove punctuation
    x = re.sub(r'[^\w\s]', '', x)

    # Remove stop words
    x = ' '.join([w for w in x.split() if w not in stop_words])    
    ## EDIT CODE HERE ## 

    return x

df['review_clean'] = df['review'].apply(preprocess)

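review is the column in df that holds the text reviews I want to process.

After applying the preprocess function to df, the new column review_clean contains the cleaned text, but the text is still not lemmatized. I can still see many words ending in -ed and -ing.

Thanks in advance.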

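1 answer

User

#1 · Posted 2024-04-27 04:21:05

By default, lemmatize() treats every word as a noun, so verb forms such as "answered" come back unchanged. You have to pass 'v' (verb) as the part of speech: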

x = ' '.join([lemmer.lemmatize(w, 'v') for w in x.rstrip().split()])

Example:

In [11]: words = ["answered", "answering"]

In [12]: [lemmer.lemmatize(w) for w in words]
Out[12]: ['answered', 'answering']

In [13]: [lemmer.lemmatize(w, 'v') for w in words]
Out[13]: ['answer', 'answer']
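The question's preprocess also lemmatizes before lowercasing and stripping punctuation, so a token such as "Answered," may not match any dictionary entry even with 'v'. Below is a minimal sketch of one possible reordering, assuming the lemmatizer is passed in as a callable. The names toy_verb_lemmatize and the lemmatize parameter are hypothetical and only there to keep the example runnable without NLTK data; they are not part of NLTK.

```python
import re
from typing import Callable, Iterable


def preprocess(text: str,
               lemmatize: Callable[[str], str],
               stop_words: Iterable[str] = ()) -> str:
    """Lowercase, strip punctuation, drop stop words, then lemmatize each token."""
    stop = set(stop_words)
    # Normalize first so the lemmatizer only sees clean, lowercase tokens
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    tokens = [w for w in text.split() if w not in stop]
    # Lemmatize last, one token at a time
    return ' '.join(lemmatize(w) for w in tokens)


def toy_verb_lemmatize(word: str) -> str:
    """Hypothetical stand-in lemmatizer: crude suffix stripping, for illustration only."""
    for suffix in ('ing', 'ed'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


cleaned = preprocess("He answered, answering the question!",
                     toy_verb_lemmatize,
                     stop_words={"he", "the"})
print(cleaned)
```

With NLTK, the callable could be something like lambda w: lemmer.lemmatize(w, 'v'), applied through df['review'].apply(...) as in the question. Note that 'v' only handles verbs; other parts of speech would need their own tag.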
