擅长:python、mysql、java
<p>似乎你想做标记化。试试<code>nltk</code></p>
<p><a href="http://text-processing.com/demo/tokenize/" rel="nofollow noreferrer">http://text-processing.com/demo/tokenize/</a></p>
<pre><code>from nltk.tokenize import TreebankWordTokenizer
splits = TreebankWordTokenizer().tokenize(text)
</code></pre>