Keep formatting (newlines) when removing stop words (NLTK)

Posted 2024-04-25 12:49:40


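I am using NLTK to remove stopwords from a file. The file is a series of tweets separated by newlines. I have stopword removal working, but it also strips the newline characters, so the output is no longer one tweet per line. Here is my code: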

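import codecs

# Python 2 code. Read the tweets, one per line.
stuff = codecs.open("/Users/user/Desktop/ngrms/Nonsrcstic.txt", "r", encoding="utf-8")
word_list = stuff.readlines()
# The result of this comprehension is discarded, so the line has no effect.
[x.encode('utf-8') for x in word_list]

# 'english' is presumably the NLTK English stopword file; stops is one long string.
f = open('english')
stops = f.read()

for line in word_list:
    # split('\n') drops the newline, and it is never written back below.
    for w in line.split('\n'):
        if w.lower() not in stops:
            with open("nostops_Nonsrcstic.txt", "a") as tweetsNoStops:
                tweetsNoStops.write(w.encode('utf-8') + " ")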

The input file looks like this:

 Baby boomers are now at the age where "work or retire" is frequently considered choice. 
 There's a few people I miss but the truth of the matter is, my name probably hasn't crossed their minds or they don't give a shit about me 
 What you must remember is, I do yarn shows with the help of a Fiat Panda and Tatiana, the trailer, which is small #itfitsbehindaPanda  
 @BetBright The AP boost won't work lads says try again later is there a problem with the site?

The output looks like this:

Baby boomers age "work retire" frequently considered choice.  There's people miss truth matter is, name probably hasn't crossed minds don't give shit must remember is, yarn shows help Fiat Panda Tatiana, trailer, small #itfitsbehindaPanda @BetBright AP boost won't work lads says try later problem site?
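
One possible way to keep one tweet per line is to filter each line on its own and write a newline after it. The following is a minimal Python 3 sketch, not a definitive fix. The names `STOPS`, `remove_stopwords`, and `filter_file` are hypothetical, and `STOPS` is a small hand-written stand-in for the real stopword list. It also assumes that a set is used for exact word matching, since `in` on a single string read from a file is a substring test.

```python
# Hypothetical stand-in for the NLTK list. With NLTK and its stopwords corpus
# installed, set(nltk.corpus.stopwords.words("english")) could be used instead.
STOPS = {
    "a", "about", "again", "and", "are", "at", "but", "do", "few", "i",
    "is", "me", "my", "now", "of", "or", "the", "their", "there", "they",
    "what", "where", "which", "with", "you",
}


def remove_stopwords(line, stops=STOPS):
    """Return one tweet with stopwords removed, as a single line of text."""
    # split() with no argument splits on any whitespace, including the
    # trailing newline, so each item is one word.
    kept = [w for w in line.split() if w.lower() not in stops]
    return " ".join(kept)


def filter_file(src_path, dst_path, stops=STOPS):
    """Filter src_path into dst_path, keeping one tweet per line."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Write the newline back explicitly after every filtered tweet.
            dst.write(remove_stopwords(line, stops) + "\n")
```

Under these assumptions, calling `filter_file("/Users/user/Desktop/ngrms/Nonsrcstic.txt", "nostops_Nonsrcstic.txt")` would open the output file once and write each filtered tweet on its own line. Punctuation handling is unchanged from the original approach, so a token such as `is,` is still kept.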
