Unable to sum integers extracted from an HTML file

Posted 2024-04-20 03:39:16


I am having trouble adding up (summing) the numbers in the linked HTML file.

I currently get this error:

Line 26: b = sum(y)  TypeError: unsupported operand type(s) for +: 'int' and 'str'

Here is my code:

import urllib
from BeautifulSoup import *
import re

counter = 0
added = 0


url = "http://python-data.dr-chuck.net/comments_42.html"
html = urllib.urlopen(url).read()

soup = BeautifulSoup(html)

# Retrieve all of the span tags
spans = soup('span')

for comments in spans:
    print comments
    counter +=1
    #y = re.findall('(\d+)', comments)  -- didnt work 
    #print y
    #added += y
y = re.findall('(\d+)', str(soup))
print y
b = sum(y)
print b

print "Count", counter
print "Sum", added

The output I want looks like this:

Count: 50
Sum: 2482

As you can see where I commented out part of my code, I originally tried to add the numbers up this way. I am not sure why that does not work.

#y = re.findall('(\d+)', comments)  -- didnt work 
    #print y
    #added += y

I also do not understand why this puts the numbers it finds into a list:

y = re.findall('(\d+)', str(soup))

2 Answers

You are trying to sum strings. Convert the strings to integers before summing, as Pynchia said, and then print b as the Sum.

...
b = sum(map(int, y))
...
print "Count", counter
print "Sum", b
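For anyone on Python 3, here is a minimal sketch of the same fix that runs without the network. The SAMPLE_HTML string and the sum_digit_runs helper are made up for illustration; they are not the real page or part of any library. Note that a regex over the whole document counts every run of digits, including any that appear outside the span tags.

```python
import re

# Hypothetical sample standing in for the downloaded page (not the real file).
SAMPLE_HTML = """
<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">5</span></td></tr>
<tr><td>Carol</td><td><span class="comments">40</span></td></tr>
</table>
"""


def sum_digit_runs(text):
    """Return (count, total) for every run of digits found in text."""
    digit_strings = re.findall(r'\d+', text)    # a list of str, not int
    numbers = [int(s) for s in digit_strings]   # convert before summing
    return len(numbers), sum(numbers)


count, total = sum_digit_runs(SAMPLE_HTML)
print("Count", count)
print("Sum", total)
```

With the sample above this prints Count 3 and Sum 142.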

If you want to fix the commented-out part instead, use:

...
y = re.findall('(\d+)', str(comments))
print y
added += sum(map(int, y))
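A rough Python 3 sketch of the per-span approach follows. To keep it self-contained it swaps BeautifulSoup for the standard library's html.parser; the SpanTextCollector class and the sample string are assumptions made for illustration, and nested span tags are not handled. The running total has to be accumulated with += inside the loop, otherwise only the last span's value survives.

```python
import re
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside each <span> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.span_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True
            self.span_texts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current span.
        if self.in_span:
            self.span_texts[-1] += data


# Hypothetical sample, not the real page.
sample = ('<span class="comments">97</span>'
          '<span class="comments">5</span>'
          '<span class="comments">40</span>')

collector = SpanTextCollector()
collector.feed(sample)
collector.close()

counter = 0
added = 0
for text in collector.span_texts:
    counter += 1
    y = re.findall(r'\d+', text)   # list of digit strings for this span
    added += sum(map(int, y))      # accumulate, do not overwrite

print("Count", counter)
print("Sum", added)
```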

Quoting the Python documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
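A small illustration of the quoted behaviour, using a made-up input string: with no group or one group you get a flat list of strings, and with two groups you get a list of tuples.

```python
import re

text = "width=640 height=480"  # made-up input for illustration

no_group = re.findall(r'\d+', text)            # whole matches
one_group = re.findall(r'(\d+)', text)         # contents of the single group
two_groups = re.findall(r'(\w+)=(\d+)', text)  # one tuple per match

print(no_group)     # ['640', '480']
print(one_group)    # ['640', '480']
print(two_groups)   # [('width', '640'), ('height', '480')]
```

In every case the items are strings, so they still need int() before any arithmetic.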

This expression:

y = re.findall('(\d+)', str(soup)) returns a list of every string that matches the pattern (\d+), that is, the runs of digits. So you have a list of strings.

Then

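b = sum(y) tries to add strings rather than integers. sum starts from the integer 0 and adds each item in turn, so its first step is 0 plus a string, which is why you get the error message.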
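The failure is easy to reproduce in isolation. This is a small Python 3 sketch with a hard-coded list standing in for the findall result; the variable names are only for illustration.

```python
y = ['31', '18']  # what re.findall returns: strings, not integers

try:
    sum(y)  # sum starts at 0, so the first step is 0 + '31'
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

joined = ''.join(y)             # string concatenation: '3118'
total = sum(int(s) for s in y)  # numeric sum after conversion: 49
print(joined, total)
```

The join line is there only to show that strings concatenate rather than add, so the conversion with int() is needed either way.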

Try:

b = sum(map(int, y)), which converts each digit string in y to an integer and then adds them up.

Demo:

>>> import re
>>> s = 'Today is 31st, December, Temperature is 18 degC'
>>> y = re.findall('(\d+)', s)
>>> y
['31', '18']
>>> b = sum(map(int, y))
>>> b
49
