Removing correct English words from tweets

Posted 2024-05-16 12:32:58

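I am working with Twitter data in R and trying to remove all correct English words from the tweets. The idea is to look at the colloquial abbreviations, typos, and slang used by a particular population whose tweets I have collected.

Example: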

    tweet <- c("Trying to find the solution frustrated af")

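After the operation described above, I want to be left with only "af".

I was thinking of cleaning the tweets with a dictionary (which I would download), but there must be a simpler option. Any solution in Python would also help.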


1 answer

Answer 1 · posted 2024-05-16 12:32:58

Another spelling-based solution uses a fairly new and interesting package:

    # install.packages("hunspell")  # uncomment and run if needed
    library(hunspell)
    tweet <- c("Trying to find the solution frustrated af")
    # Split the tweet into tokens on single spaces
    ( tokens <- strsplit(tweet, " ")[[1]] )
    # [1] "Trying"     "to"         "find"       "the"        "solution"   "frustrated" "af"
    # Keep only the tokens that fail the en_US spell check
    tokens[!hunspell_check(tokens, dict = "en_US")]
    # [1] "af"
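
Since the question also asks about Python, here is a minimal sketch of the dictionary approach mentioned there. It assumes an English word list is already available: `ENGLISH_WORDS` below is a tiny hypothetical stand-in for a downloaded dictionary, and `non_dictionary_tokens` is an illustrative helper name, not part of any library.

```python
# Hypothetical stand-in for a real English word list.
# In practice this set would be loaded from a downloaded dictionary file.
ENGLISH_WORDS = {"trying", "to", "find", "the", "solution", "frustrated"}


def non_dictionary_tokens(tweet, known_words):
    """Return the tokens of `tweet` that do not appear in `known_words`."""
    # Split on whitespace, then strip common punctuation from both ends.
    tokens = [t.strip(".,!?;:\"'()") for t in tweet.split()]
    # Compare case-insensitively; skip tokens left empty after stripping.
    return [t for t in tokens if t and t.lower() not in known_words]


tweet = "Trying to find the solution frustrated af"
print(non_dictionary_tokens(tweet, ENGLISH_WORDS))  # ['af']
```

With real tweets, hashtags, mentions, and URLs would probably need to be removed before this step, otherwise they would all be reported as non-dictionary tokens.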
