Regular expression returns no match, but a match clearly exists

def scrubdividata(ticker):
    sleep(1.0)  # Time in seconds.
    f = urllib2.urlopen('the url')
    lines = f.readlines()
    for i in range(0, len(lines)):
        line = lines[i]
        if "Annual Dividend:" in line:
            print 'for ticker %s, annual dividend is in line' % (ticker)
            s = str(lines[i+1])
            print s
            start = '>$'
            end = '</td>'
            AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)

for ticker A, annual dividend is in line
<td class="number">$0.48</td>
Traceback (most recent call last):
  File "test.py", line 115, in <module>
    scrubdividata(ticker)
  File "test.py", line 34, in scrubdividata
    LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

2 Answers

User

#1 · edited 2024-06-16 09:50:08

You need to escape the dollar sign:

start = '>\$'
end = '</td>'
AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)

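The reason is that $ is a special character in regular expressions. (It matches at the end of the string, or just before the newline at the end of the string.) Unescaped, the pattern '>$(.*)</td>' asks for more text after the end of the string, so re.search finds nothing and returns None, which is why .group(1) raises the AttributeError.

With the escape, AnnualDiv is set to the string '0.48'. If you want to keep the $ in the result, you can use: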

AnnualDiv = "$%s" % re.search('%s(.*)%s' % (start, end), s).group(1)
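
As a minimal sketch of the same fix, the version below builds the pattern with re.escape instead of a hand-written backslash and checks for None before calling .group(). The helper name extract_dividend and the sample strings are assumptions made for illustration, and the non-greedy (.*?) is a small change from the original (.*). It is written with print() so it runs on Python 3, although the question uses Python 2.

```python
import re

def extract_dividend(cell_html):
    """Return the text between '>$' and '</td>', or None if it is absent.

    Hypothetical helper, not part of the original script.
    """
    start = re.escape('>$')    # escapes '$' so it matches a literal dollar sign
    end = re.escape('</td>')
    match = re.search('%s(.*?)%s' % (start, end), cell_html)
    # re.search returns None when nothing matches, so check before .group()
    return match.group(1) if match else None

sample = '<td class="number">$0.48</td>'

# The unescaped pattern from the question never matches this line.
print(re.search('>$(.*)</td>', sample))

# The escaped pattern captures the amount.
print(extract_dividend(sample))
```

Returning None instead of raising lets the caller decide what to do with a line that has no dollar amount.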

User

#2 · edited 2024-06-16 09:50:08

From the documentation:

"$" Matches the end of the string or just before the newline at the end of the string.

So you probably want to escape the dollar sign on this line, like so:

start = '>\$'

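If you plan to do more searching through HTML in the future, I suggest taking a look at the Beautiful Soup module. It is a bit more forgiving than regular expressions.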
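
A rough sketch of what the Beautiful Soup approach might look like is shown below. The page's real markup is not given in the question, so the table structure here is an assumption modeled on the single line of output shown above. It needs the third-party beautifulsoup4 package (version 4.4 or later for the string argument).

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup; only the second <td> appears in the question's output.
html = '''
<table>
  <tr>
    <td>Annual Dividend:</td>
    <td class="number">$0.48</td>
  </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Find the label cell, then take the next <td> in the same row.
label = soup.find('td', string='Annual Dividend:')
value_cell = label.find_next_sibling('td') if label else None

# Strip surrounding whitespace and the leading dollar sign.
annual_div = value_cell.get_text(strip=True).lstrip('$') if value_cell else None
print(annual_div)
```

Because the lookup follows the document structure rather than line positions, it does not rely on the value sitting on the line right after the label.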
