How do I search for a specific string in a urlopen response?

Posted 2024-06-08 20:10:59

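I want to check Troy Hunt's website "https://haveibeenpwned.com/Passwords" to see whether he has published a new password file. To do that, I read the page and want to search it for a string that gives the current version of the file. The files are always named following the pattern …v5.7z, where the v stands for the version.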

# -*- coding: utf-8 -*-
# Python 2 (urllib2)

from urllib2 import Request, urlopen, URLError, HTTPError

someurl = 'https://haveibeenpwned.com/Passwords'
# Send a browser-like User-Agent header with the request
req = Request(someurl, headers={'User-Agent': 'Mozilla/5.0'})
try:
    response = urlopen(req)
except HTTPError as e:
    # The server answered, but with an error status
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError as e:
    # The server could not be reached at all
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print "everything is fine"
    # Reuse the response opened above instead of requesting the page a second time
    the_page = response.read()
    print(the_page)


"the_page" holds the entire HTML code of the page. How can I search it?

I am not allowed to use BeautifulSoup or any other parser.
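
One possible approach that stays within the standard library is a regular expression from the "re" module. The sketch below is only an illustration under assumptions: it assumes the version is the integer between "v" and ".7z" (as in …v5.7z), the helper name find_versions is made up for this example, and the sample HTML with its file names is hypothetical, not taken from the real page.

```python
import re

# Assumption: versioned file names end in "v<number>.7z", as in "...v5.7z".
VERSION_PATTERN = re.compile(r'v(\d+)\.7z')


def find_versions(page):
    """Return the sorted, de-duplicated version numbers found in page.

    find_versions is a hypothetical helper name. page may be the raw
    bytes returned by response.read() or an already decoded string.
    """
    if isinstance(page, bytes):
        # Decode leniently so an odd byte does not abort the search
        page = page.decode('utf-8', 'replace')
    return sorted(set(int(v) for v in VERSION_PATTERN.findall(page)))


# Hypothetical HTML snippet standing in for the_page
sample = ('<a href="/files/example-v4.7z">old</a> '
          '<a href="/files/example-v5.7z">new</a>')

versions = find_versions(sample)
latest = max(versions) if versions else None
print(latest)
```

With the real page, find_versions(the_page) would take the place of the sample string. If only a yes/no answer for one known version is needed, a plain substring test such as 'v5.7z' in the_page may already be enough.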

