How do I parse a file line by line with a for loop in Python?

0 votes
2 answers
4548 views
Asked 2025-04-17 13:29

Write a program that prompts the user for a file name, then reads the file and looks for lines of this form:

X-DSPAM-Confidence: 0.8475

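When you encounter a line that starts with "X-DSPAM-Confidence:", pull the line apart to extract the floating-point number on it. Count these lines and compute the total of their spam confidence values. When you reach the end of the file, print the average spam confidence.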

Enter file name: mbox.txt
Average spam confidence: 0.894128046745

Enter file name: mbox-short.txt
Average spam confidence: 0.750718518519

You can test your program with the files mbox.txt and mbox-short.txt.

This is the code I have written so far:

 fname = raw_input("Enter file name: ")
 fh = open(fname)
 for line in fh:
     pos  = fh.find(':0.750718518519')
     x = float(fh[pos:])
     print x

What is wrong with this code?

2 Answers

-1

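line.find #..... find() searches within a single line of text, so call it on line, not on fh

print pos # prints the position, which helps you debug your program ;)

float(line[pos+1:]) # the index you get back is that of the colon itself, so you need to move one position further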
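
As a minimal sketch of this suggestion, the snippet below calls find() on one line of text instead of on the file object. The sample line is the one from the question; searching for ':' alone rather than for a specific number is an assumption about what the exercise needs.

```python
# Sample line from the question; a line read from a real file also ends in '\n'.
line = "X-DSPAM-Confidence: 0.8475\n"

# find() is a string method, so it is called on the line, not on the file object.
pos = line.find(':')

# pos is the index of the colon itself, so the number starts at pos + 1.
# float() ignores the leading space and the trailing newline.
value = float(line[pos + 1:])
print(value)
```

Note that find() returns -1 when the text is not present, so a real loop would need to check for that before slicing.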

4

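It sounds like they are asking you to compute the average of all the 'X-DSPAM-Confidence' numbers, not to find the specific number 0.750718518519.

Personally, I would find the lines you need, extract the number from each one, put those numbers in a list, and compute the average at the end.

Something like this: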

# Get the filename from the user
filename = raw_input("Enter file name: ")

# An empty list to contain all our floats
spamflts = []

# Open the file to read ('r'), and loop through each line
for line in open(filename, 'r'):

    # If the line starts with the text we want (with all whitespace stripped)
    if line.strip().startswith('X-DSPAM-Confidence'):

        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())

        print flt

        # We then add the number to our list
        spamflts.append(flt)

print spamflts
# At the end of the loop, we work out the average - the sum divided by the length
average = sum(spamflts)/len(spamflts)

print average
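
The exercise itself asks for a count and a running total rather than a list, so here is a rough sketch of that variant. The function name `average_confidence` and the sample values are hypothetical, chosen only for illustration, and the code is written so that it should also run on Python 3. It returns None when no matching line is found, which avoids dividing by zero.

```python
def average_confidence(lines):
    """Return the mean of all X-DSPAM-Confidence values, or None if there are none."""
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith('X-DSPAM-Confidence:'):
            # Everything after the first colon is the number; float() ignores whitespace.
            total += float(line.split(':')[1])
            count += 1
    if count == 0:
        return None
    return total / count


# Made-up sample data standing in for the lines of a file.
sample = [
    "X-DSPAM-Confidence: 0.8475\n",
    "Nothing on this line\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print(average_confidence(sample))
```

Because an open file can be iterated line by line, passing the file object itself (for example inside `with open(filename) as fh:`) should work the same way as passing the list.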

>>> lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

>>> for line in lines.splitlines():
    print line


X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4

Using find:

>>> for line in lines.splitlines():
    pos = line.find('X-DSPAM-Confidence:')
    print pos

0
0
-1
0

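We can see that find() only gives us the position of 'X-DSPAM-Confidence:' within each line (or -1 when it is absent), not the position of the number that follows it.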
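
To reach the number with find() anyway, one option is to add the length of the search text to the position it returns. The following is only a sketch under that assumption; the names `prefix`, `sample_lines` and `values` are made up for the example.

```python
prefix = 'X-DSPAM-Confidence:'
sample_lines = [
    "X-DSPAM-Confidence: 1",
    "Nothing on this line",
    "X-DSPAM-Confidence: 4",
]

values = []
for line in sample_lines:
    pos = line.find(prefix)
    # -1 means the prefix is absent; 0 means the line starts with it.
    if pos == 0:
        # Skip past the prefix to reach the text that holds the number.
        values.append(float(line[pos + len(prefix):]))

print(values)
```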

It is easier to check whether a line starts with 'X-DSPAM-Confidence:' and then extract the number like this:

>>> for line in lines.splitlines():
    print line.startswith('X-DSPAM-Confidence')


True
True
False
True

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print line.split(':')


['X-DSPAM-Confidence', ' 1']
['X-DSPAM-Confidence', ' 5']
['X-DSPAM-Confidence', ' 4']

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print float(line.split(':')[1])


1.0
5.0
4.0
