Stop word removal code not working, returns the same strings

Posted 2024-05-13 10:39:36


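I want to remove stop words from sentence strings, but my print call returns the exact same strings with all the stop words still in them. Here is the code I am using, where chat_map['Phillip Allen'] holds the sentence strings I parsed from a group chat.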

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words("english"))

filtered_sentences_phillip = []
for w in chat_map['Phillip Allen']:
  if w not in stop_words:
    filtered_sentences_phillip.append(w)
print(filtered_sentences_phillip)

This code returns:

['Hello?', 'Yeah, how are you?', "Oh, sorry about that. I didn't know.", '(laughter) Oh because of the, the BBA thing?', "you're not going to get paid any bro for this and we'll send you lunch around for the whole desk.", '100 yards...', "hi guys i hope everybody's enjoying there trade this week", 'things seems to be going on well', 'later guys', "don't touch it yet john it's still riding", 'but get ready any moment from now will be closing time', "right now i'm having 64 pips", 'hopping to close higher', 'see you later', 'hi john i hope you closed your gbp/usd long with good pips all green like i did', 'i closed with 76pips', 'whats your position now', "i've taking it short 1.6853", 'just follow and see how it will work out', "so far i'm painting green", 'hi showtime 183', 'join me on skype', 'gbp/usd so far so good john', 'green 45 pips', "i'm still holding john", "but as you said may be it's getting near closing time", 'lets keep an eye out together', 'hi john', "and how's your trading going on", "hi mike and how's trading going on", 'hi steve', 'hi john', "yes i'm trading today john", 'and very busy keeping an eye on it too', "steve how's trading going on", "hard work that's all it takes", "i'd love it if you guys will be my friends on skype", 'i like having fellow traders as friends on skype', 'usd/cad positioned at 1.0939 short', 'eur/chf positioned at 1.2202 long', 'IMO', 'later guys', 'pip watching time', 'hi john', 'been a long time', 'eur/gbp was long', 'but now about to go long any time from now', 'long can still hold on for a while and lets see what the next candle will say at 4h time frame', 'sorry was short and about to go long', 'short can still hold on till the next candle at 4h time frame', 'long position expected', 'eur/gbp going long already', 'how do you see john', 'learn to control your emotions steve this is very important', 'control of emotions is part of success and failure', 'because trading to emotions can lead to and often does lead to 
wrong decisions', 'making entry and taking exit at the wrong time', 'Hi, John', 'Yes, I remember.', "I'd prefer to keep the actual data", 'Hello mate? You all set?', "Right listen we've had a couple of words with them, you want them lower right?", "Alright okay, alright listen, we've had a couple words with them. You want them lower, right?", 'Glad to hear that you liked it', 'Did you hear last news?', 'Agree. Very promising', 'Happy birthday!']

Any idea what is going on?


1 answer
User
#1 · Posted 2024-05-13 10:39:36

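The root of the problem is that you do not split the sentences into words before comparing against the stop word list. Each item you loop over is a whole sentence such as 'Yeah, how are you?', which never equals a single stop word, so nothing is filtered out. Split each sentence with str.split() first.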

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words("english"))

filtered_sentences_phillip = []
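# chat_map['Phillip Allen'] is a list of sentences, so split each one into words
for sentence in chat_map['Phillip Allen']:
  for w in sentence.split():
    if w not in stop_words:
      filtered_sentences_phillip.append(w)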
print(filtered_sentences_phillip)
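
To see the difference without downloading anything, here is a minimal sketch. The `STOP_WORDS` set and the `chat_map` contents are small hypothetical stand-ins, not the real NLTK list or the asker's data. It compares the original sentence-level check with a word-level check, and joins the surviving words back so the result keeps one string per sentence.

```python
# Hypothetical stand-ins: a tiny stop word set and a chat_map shaped like the one in the question.
STOP_WORDS = {"how", "are", "you", "to", "be", "on"}
chat_map = {"Phillip Allen": ["Hello?", "things seems to be going on well"]}

# Original approach: each item is a whole sentence, which never equals a single stop word,
# so every sentence passes the test and the list comes back unchanged.
unchanged = [s for s in chat_map["Phillip Allen"] if s not in STOP_WORDS]

# Word-level approach: split each sentence, drop the stop words, then join the rest back.
filtered = [
    " ".join(w for w in s.split() if w not in STOP_WORDS)
    for s in chat_map["Phillip Allen"]
]

print(unchanged)
print(filtered)
```

Whether to keep one string per sentence or flatten everything into a single word list depends on what the later processing expects.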

Minimal reproducible example

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = stopwords.words("english")

sentence = 'Yeah, how are you?'
filtered_sentences = []
for word in sentence.split():
  if word not in stop_words:
    filtered_sentences.append(word)
print(filtered_sentences)

A cleaner reproducible example using a Python list comprehension

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = stopwords.words("english")

sentence = 'Yeah, how are you?'
filtered_sentences = [word for word in sentence.split() if word not in stop_words]
print(filtered_sentences)

Output

['Yeah,', 'you?']
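
Note that 'you?' survives in this output because the attached question mark keeps the token from matching the stop word 'you'; a capitalized word such as 'How' would slip through for the same kind of reason. One possible way to handle this, sketched here with a hypothetical `STOP_WORDS` stand-in and a hypothetical helper named `remove_stop_words`, is to lowercase each token and strip surrounding punctuation before the comparison.

```python
import string

# Hypothetical stand-in for stopwords.words("english"); the real NLTK list is much longer.
STOP_WORDS = {"how", "are", "you"}

def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Return the tokens of `sentence` whose normalized form is not a stop word."""
    kept = []
    for token in sentence.split():
        # Normalize only for the comparison: strip leading/trailing punctuation and lowercase.
        normalized = token.strip(string.punctuation).lower()
        # Tokens that are pure punctuation normalize to "" and are dropped as well.
        if normalized and normalized not in stop_words:
            kept.append(token)
    return kept

print(remove_stop_words("Yeah, how are you?"))  # ['Yeah,']
```

The original token is kept in the result so the remaining text still reads naturally; appending `normalized` instead would give fully cleaned words.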

Note that the output is a list of the words from the processed sentence. If you want the sentence back as a single string, use:

" ".join(filtered_sentences)

PS: It is a good idea to do some lemmatization or stemming before removing stop words.
