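How do I count the number of sentences in a paragraph in Python?
This is what I have so far. My paragraph has only 5 full stops, so it has only 5 sentences, but the code keeps returning 14. Can anyone help?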
file = open ('words.txt', 'r')
lines= list (file)
file_contents = file.read()
print(lines)
file.close()
words_all = 0
for line in lines:
    words_all = words_all + len(line.split())
print ('Total words: ', words_all)
full_stops = 0
for stop in lines:
    full_stops = full_stops + len(stop.split('.'))
print ('total stops: ', full_stops)
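Here is the text file:
A Turing machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation.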
4 Answers
6
Use a regular expression.
In [13]: import re
In [14]: par = "This is a paragraph? So it is! Ok, there are 3 sentences."
In [15]: re.split(r'[.!?]+', par)
Out[15]: ['This is a paragraph', ' So it is', ' Ok, there are 3 sentences', '']
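Note that the split leaves a trailing empty string, so taking `len()` of that list directly would give 4, not 3. A minimal sketch of turning the split into a count, assuming every sentence ends in `.`, `!` or `?` (the helper name `count_sentences` is hypothetical, and abbreviations such as "e.g." would be overcounted):

import re

def count_sentences(text):
    """Count the non-empty chunks between runs of '.', '!' or '?'."""
    # Split on one or more terminators, as in the session above
    parts = re.split(r'[.!?]+', text)
    # Drop empty or whitespace-only pieces, e.g. the one after the final '.'
    return len([p for p in parts if p.strip()])

par = "This is a paragraph? So it is! Ok, there are 3 sentences."
print(count_sentences(par))  # prints 3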
9
The simplest way is this:
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize
sentences = 'A Turing machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation.'
number_of_sentences = sent_tokenize(sentences)
print(len(number_of_sentences))
Output:
5
5
If a line contains no period, split returns a single element, which is the whole line:
>>> "asdasd".split('.')
['asdasd']
So you are counting the number of lines plus the number of periods. Why are you splitting the file into lines in the first place?
with open('words.txt', 'r') as file:
    file_contents = file.read()
print('Total words: ', len(file_contents.split()))
print('total stops: ', file_contents.count('.'))
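To make the "lines plus periods" arithmetic concrete, here is a small sketch using a made-up three-line sample (not the asker's file). Each line contributes its period count plus one to the split-based total, so the result is too high by exactly the number of lines. A result of 14 with 5 periods would be consistent with a file of 9 lines, blank ones included, though that line count is an assumption.

lines = ["First sentence. Second one.\n", "\n", "Third sentence.\n"]

# Per-line split, as in the question: each line yields (periods + 1) pieces
split_total = sum(len(line.split('.')) for line in lines)

# Direct count of the '.' character across all lines
period_total = sum(line.count('.') for line in lines)

print(split_total)                                # prints 6
print(period_total)                               # prints 3
print(split_total == period_total + len(lines))   # prints True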