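How do I parse a file line by line with a for loop in Python?
Write a program that prompts the user for a file name, then reads the file and looks for lines of the form: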
X-DSPAM-Confidence: 0.8475
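When you encounter a line that starts with "X-DSPAM-Confidence:", pull the line apart to extract the floating-point number on it. Count these lines and compute the total of their spam confidence values. When you reach the end of the file, print the average spam confidence.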
Enter the file name: mbox.txt
Average spam confidence: 0.894128046745
Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519
You can test your program with the files mbox.txt and mbox-short.txt.
The code I have written so far is:
fname = raw_input("Enter file name: ")
fh = open(fname)
for line in fh:
    pos = fh.find(':0.750718518519')
    x = float(fh[pos:])
    print x
What is wrong with this code?
2 Answers
-1
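Use line.find, not fh.find
# ... find() searches within a line of text; fh is the file object, not a line
print pos
# this prints the position, which helps you debug your program ;)
float(line[pos+1:])
# the index you get back is that of the colon, so you need to move one position further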
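As a minimal sketch of the point above, written in Python 3 syntax (the thread itself uses Python 2): find() is called on the line string and returns the index where the match starts, so searching for the colon gives the colon's position and the number begins one character later. The sample line is the one from the question; the variable names are only illustrative.

```python
# Sample line taken from the question
line = "X-DSPAM-Confidence: 0.8475"

# find() is a string method; it returns the index where the match starts
pos = line.find(":")

# Skip the colon itself; float() ignores the leading space
value = float(line[pos + 1:])

print(pos, value)
```

Searching for ':' rather than for one specific value keeps the same two lines working for every X-DSPAM-Confidence line in the file.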
4
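It sounds like they are asking you to compute the average of all the 'X-DSPAM-Confidence' numbers, not to find the specific number 0.750718518519.
Personally, I would find the lines you need, extract the numbers, put them in a list, and then compute the average at the end.
Something like this: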
# Get the filename from the user
filename = raw_input("Enter file name: ")
# An empty list to contain all our floats
spamflts = []
# Open the file to read ('r'), and loop through each line
for line in open(filename, 'r'):
    # If the line starts with the text we want (with all whitespace stripped)
    if line.strip().startswith('X-DSPAM-Confidence'):
        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())
        print flt
        # We then add the number to our list
        spamflts.append(flt)
print spamflts
# At the end of the loop, we work out the average - the sum divided by the length
average = sum(spamflts)/len(spamflts)
print average
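The exercise statement asks for a count and a running total rather than a list, so here is a hedged variant along those lines, in Python 3 syntax. The function name average_confidence, the PREFIX constant, and the choice to return None when no line matches are assumptions made for this sketch, not part of the original answer. It accepts any iterable of lines so it can be tried without a file.

```python
PREFIX = "X-DSPAM-Confidence:"  # hypothetical constant name

def average_confidence(lines):
    """Return the mean of the values on lines starting with PREFIX.

    Returns None if no line matches, to avoid dividing by zero.
    """
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(PREFIX):
            # Everything after the prefix is the number; float() ignores
            # the surrounding space and trailing newline
            total += float(line[len(PREFIX):])
            count += 1
    if count == 0:
        return None
    return total / count

# Small in-memory sample in place of a real mbox file
sample = [
    "X-DSPAM-Confidence: 1\n",
    "X-DSPAM-Confidence: 5\n",
    "Nothing on this line\n",
    "X-DSPAM-Confidence: 4\n",
]
print(average_confidence(sample))
```

With a real file, the same function can be called on the open file object, for example inside `with open(fname) as fh:`, because iterating over a file yields its lines one at a time.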
>>> lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""
>>> for line in lines.splitlines():
        print line
X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4
Using find:
>>> for line in lines.splitlines():
        pos = line.find('X-DSPAM-Confidence:')
        print pos
0
0
-1
0
We can see that find() only gives us the position of 'X-DSPAM-Confidence:' within each line, not the position of the number that follows it.
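One consequence worth spelling out: find() returns -1 when the text is absent, and -1 is also a valid slice index, so slicing with it does not raise an error. The short Python 3 sketch below illustrates this with two sample lines; the variable names are illustrative only.

```python
lines = ["X-DSPAM-Confidence: 0.8475", "Nothing on this line"]

# Position of the header text in each line; -1 means "not found"
positions = [line.find("X-DSPAM-Confidence:") for line in lines]

# Slicing from -1 does not fail; it silently yields the last character
tail = lines[1][positions[1]:]

print(positions, repr(tail))
```

This is why the result of find() should be checked against -1 before it is used as an index.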
It is easier to test whether a line starts with 'X-DSPAM-Confidence:' using startswith(), and then extract just the number, like this:
>>> for line in lines.splitlines():
        print line.startswith('X-DSPAM-Confidence')
True
True
False
True
>>> for line in lines.splitlines():
        if line.startswith('X-DSPAM-Confidence'):
            print line.split(':')
['X-DSPAM-Confidence', ' 1']
['X-DSPAM-Confidence', ' 5']
['X-DSPAM-Confidence', ' 4']
>>> for line in lines.splitlines():
        if line.startswith('X-DSPAM-Confidence'):
            print float(line.split(':')[1])
1.0
5.0
4.0