How do I parse a file line by line with a for loop in Python?

Asked 2025-04-17 13:29 by 数据工程师

Write a program that prompts the user for a file name, then reads the file and looks for lines of the form:

X-DSPAM-Confidence: 0.8475

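When you encounter a line that starts with "X-DSPAM-Confidence:", pull the line apart to extract the floating-point number on it. Count these lines and compute the total of their spam confidence values. When you reach the end of the file, print the average spam confidence.

Enter the file name: mbox.txt
Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519

Test your program on the mbox.txt and mbox-short.txt files.

The code I have so far is: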

 fname = raw_input("Enter file name: ")
 fh = open(fname)
 for line in fh:
     pos  = fh.find(':0.750718518519')
     x = float(fh[pos:])
     print x

What is wrong with this code?


2 Answers


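line.find  # ..... find() searches inside a single line of text, so call it on line, not on fh

print pos  # prints the position, which helps you debug your program ;)

float(line[pos+1:])  # the index you get back is the colon's, so you need to move one position further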
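
To make those three hints concrete, here is a minimal sketch. It is written in Python 3 syntax (the question uses Python 2), and the helper name number_after_colon is made up for illustration. It calls find() on the line rather than on the file handle, and it skips past the colon before converting.

```python
def number_after_colon(line):
    """Return the float that follows the first ':' in line, or None if there is no colon."""
    pos = line.find(':')           # index of the colon, or -1 if it is absent
    if pos == -1:
        return None
    return float(line[pos + 1:])   # float() ignores surrounding whitespace and the newline


print(number_after_colon("X-DSPAM-Confidence: 0.8475\n"))  # 0.8475
```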

Answered 2025-04-17 by Python大师


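It sounds like they are asking you to compute the average of all the 'X-DSPAM-Confidence' numbers, not to find the specific number 0.750718518519.

Personally, I would find the keyword you need, extract the number, collect these numbers in a list, and compute the average at the end.

Something like this: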

# Get the filename from the user
filename = raw_input("Enter file name: ")

# An empty list to contain all our floats
spamflts = []

# Open the file to read ('r'), and loop through each line
for line in open(filename, 'r'):

    # If the line starts with the text we want (with all whitespace stripped)
    if line.strip().startswith('X-DSPAM-Confidence'):

        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())

        print flt

        # We then add the number to our list
        spamflts.append(flt)

print spamflts
# At the end of the loop, we work out the average - the sum divided by the length
average = sum(spamflts)/len(spamflts)

print average
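
The exercise itself asks for a running count and a running total rather than a list, so here is a sketch of that variant. It assumes Python 3 syntax and assumes that matching lines begin at the very first character; the function name average_confidence and its prefix parameter are illustrative choices, not part of the exercise. It returns None when no line matches, which avoids dividing by zero.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the floats on lines that start with prefix; None if there are none."""
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(prefix):
            # Everything after the prefix is the number (plus whitespace)
            total += float(line[len(prefix):])
            count += 1
    if count == 0:
        return None
    return total / count


sample = [
    "X-DSPAM-Confidence: 0.5\n",
    "Subject: not a confidence line\n",
    "X-DSPAM-Confidence: 0.25\n",
]
print(average_confidence(sample))  # 0.375
```

Because a file object yields its lines when iterated, the same function should work on a real file, for example inside `with open(fname) as fh:` by passing `fh` as the argument.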

>>> lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

>>> for line in lines.splitlines():
    print line


X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4

Using find:

>>> for line in lines.splitlines():
    pos = line.find('X-DSPAM-Confidence:')
    print pos

0
0
-1
0

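We can see that find() only gives us the position of 'X-DSPAM-Confidence:' within each line (or -1 when it is absent), not the position of the number that follows it.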
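
If you do want to stay with find(), one way to reach the number is to add the length of the search text to the position it returns, and to skip lines where the result is -1. A small sketch in Python 3 syntax, where the variable names are only illustrative:

```python
prefix = 'X-DSPAM-Confidence:'
lines = ["X-DSPAM-Confidence: 1", "Nothing on this line", "X-DSPAM-Confidence: 4"]

values = []
for line in lines:
    pos = line.find(prefix)
    if pos == -1:
        continue                    # prefix not found on this line
    start = pos + len(prefix)       # first character after the prefix
    values.append(float(line[start:]))

print(values)  # [1.0, 4.0]
```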

It is easier to test whether a line starts with 'X-DSPAM-Confidence:' and then extract the number like this:

>>> for line in lines.splitlines():
    print line.startswith('X-DSPAM-Confidence')


True
True
False
True

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print line.split(':')


['X-DSPAM-Confidence', ' 1']
['X-DSPAM-Confidence', ' 5']
['X-DSPAM-Confidence', ' 4']

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print float(line.split(':')[1])


1.0
5.0
4.0

Answered 2025-04-17 by Python大师
