I took a text corpus from nltk and now want to process it so that every line in the file ends with a punctuation mark.
Her mother
had died too long ago for her to
remember her caresses; and her place had been supplied
by an excellent woman as governess, who had fallen little short
of a mother in affection.
should become:
Her mother had died too long ago for her to remember her caresses;
and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.
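I tried using sed to match lines that have no punctuation at the end, but I don't know how to pull the next line up onto the current one. Any help would be appreciated!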
What about using `paste` and `sed`? `paste` joins all the text onto a single line, and `sed` then inserts a newline after each `.` and `;`.

In Python:
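A minimal sketch of the same two steps in Python, assuming the whole file fits in memory and that only `.` and `;` count as line-ending punctuation. The names `reflow` and `reflow_file` are hypothetical, chosen for illustration.

```python
import re


def reflow(lines):
    """Join lines into one, then break after each '.' or ';'."""
    # Step 1: merge all non-empty lines into a single space-separated line.
    joined = " ".join(line.strip() for line in lines if line.strip())
    # Step 2: replace the whitespace after '.' or ';' with a newline.
    return re.sub(r"([.;])\s+", r"\1\n", joined)


def reflow_file(path):
    """Read the file at `path` and return its reflowed text."""
    with open(path, encoding="utf-8") as f:
        output = reflow(f)
    return output
```

Note that this simple rule also breaks after abbreviations such as "Mr.", so it is best suited to text where that is acceptable.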
After the `with` block finishes, `output` will hold the expected file contents. If you need the result to persist, you can overwrite the file with `output`.

Using NLTK's `sent_tokenize()`:
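A minimal sketch of this approach might look like the following. It assumes NLTK is installed and that its Punkt tokenizer data has already been fetched with `nltk.download` (the resource is named `punkt` or `punkt_tab` depending on the NLTK version). The helper names `join_lines` and `one_sentence_per_line` are hypothetical.

```python
def join_lines(text):
    """Collapse every run of whitespace, including newlines, into one space."""
    return " ".join(text.split())


def one_sentence_per_line(text):
    """Return `text` with one sentence per line, as split by NLTK."""
    # Imported here so join_lines still works where NLTK is not installed.
    from nltk.tokenize import sent_tokenize

    return "\n".join(sent_tokenize(join_lines(text)))
```

Keep in mind that `sent_tokenize()` splits at sentence boundaries only, so it will not start a new line after `;` the way the expected output in the question does. In exchange, it is designed to cope with abbreviations better than a plain split on `.`.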