Splitting the words in a string on '\n' - Q&A - Python中文网


Posted 2024-04-20 09:41:19


Hi all, I have a string that I'm trying to build n-grams from, but I've hit a problem. When I run ngram = ngrams(raw_text.split(" "), n=1) the output is

[('come',), ('here,',), ('girl\noh,',), ('you',)....]

The problem is that the words in my string are laid out like this:

come here, girl\noh, you want...

This means my n-grams are much larger than they need to be. How can I get a string like this instead?

come here , girl \n oh , you ...

That way my n-grams would be smaller. Thanks, and I hope you all have a nice day.

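Edit: I now realize I was passing a delimiter to split, and I've changed that, so the \n problem has gone away. But can I split the words within a string that have punctuation in them?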


1 answer

Answer 1 · Posted 2024-04-20 09:41:19

Can I split the words within a string that have punctuation in them?

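The end result you want is still unclear: do you want to keep the punctuation or drop it entirely? Assuming you don't need the punctuation, it's simple with re.split():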

>>> import re
>>> re.split(r'\W+', "Hello, this'll split by\n \nwhitespace and also punctuation!")
['Hello', 'this', 'll', 'split', 'by', 'whitespace', 'and', 'also', 'punctuation', '']
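
If the punctuation should be kept as separate tokens instead, a regex can still do it. The following is only a minimal sketch, assuming that "word" means a run of \w characters and that every other non-whitespace character counts as one punctuation token. The sample text is the fragment from the question with the trailing "..." left off. Note that the newline is discarded here rather than kept as its own token.

```python
import re

text = "come here, girl\noh, you want"

# \w+ matches a run of word characters; [^\w\s] matches a single
# character that is neither a word character nor whitespace.
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)
# ['come', 'here', ',', 'girl', 'oh', ',', 'you', 'want']

# Using findall to drop punctuation also avoids the trailing ''
# that re.split leaves when the string ends with a separator.
words = re.findall(r"\w+", text)
print(words)
# ['come', 'here', 'girl', 'oh', 'you', 'want']
```

Contractions such as this'll are split into three tokens by this pattern, which is one of the cases where a real tokenizer does better.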

If you want to split in a smarter way, this gets complicated quickly. I'd suggest using the nltk toolkit, which offers nltk.word_tokenize among other options:

>>> import nltk
>>> nltk.word_tokenize("Hello, this'll split by\n \nwhitespace and also punctuation!")
['Hello', ',', 'this', "'ll", 'split', 'by', 'whitespace', 'and', 'also', 'punctuation', '!']
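
Whichever tokenizer is used, the resulting token list can then go into the n-gram step. Here is a rough sketch under two assumptions: a plain regex tokenizer stands in for nltk.word_tokenize so the snippet runs without nltk installed, and make_ngrams is a hypothetical helper standing in for the ngrams function from the question, not that function itself.

```python
import re


def make_ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples (hypothetical helper)."""
    # zip over n shifted views of the list; zip stops at the shortest view,
    # so only complete n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


raw_text = "come here, girl\noh, you want"

# Assumed tokenizer: words and single punctuation marks as separate tokens.
tokens = re.findall(r"\w+|[^\w\s]", raw_text)

unigrams = make_ngrams(tokens, 1)
bigrams = make_ngrams(tokens, 2)
print(unigrams[:3])  # [('come',), ('here',), (',',)]
print(bigrams[:3])   # [('come', 'here'), ('here', ','), (',', 'girl')]
```

With the punctuation split off, 'girl' and 'oh' become separate unigrams instead of the single 'girl\noh,' token shown in the question.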
