How to log in to a website, click a button, and then get the page source with Python

2 answers

User

#1 · Edited 2024-04-20 08:57:46

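First, create a soup variable from the page's HTML (page is assumed to hold the HTML you already downloaded):

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, 'html.parser')

Then create another variable and use soup.find to look up the value of the input element: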

val = soup.find('input', {'id': 'myInput2'}).get('value')
print(val)
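
If BeautifulSoup is not installed, roughly the same lookup can be sketched with the standard library's html.parser module. This is only an illustration, not part of the original answer: the InputValueFinder class, the find_input_value helper, and the sample_page string are hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Record the value attribute of the first <input> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        is_target = tag == "input" and attr_map.get("id") == self.target_id
        if is_target and self.value is None:
            self.value = attr_map.get("value")


def find_input_value(html, target_id):
    """Return the value of the input with this id, or None if it is absent."""
    finder = InputValueFinder(target_id)
    finder.feed(html)
    return finder.value


# Hypothetical page content, for illustration only.
sample_page = (
    '<form><input id="myInput1" value="a">'
    '<input id="myInput2" value="42"></form>'
)
print(find_input_value(sample_page, "myInput2"))  # prints 42
```

For real pages BeautifulSoup is usually more forgiving of messy markup; the sketch only shows what the soup.find call is doing conceptually.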

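User

#2 · Edited 2024-04-20 08:57:46

The response from the login page should contain some value (probably a cookie, but possibly something else) that identifies the login, and that value needs to be passed along with the request to the referral page.

So your code needs to handle cookies.

Take a look at this gist, where I tried to get contact information from my friends' profiles (the code is old and may no longer work on the current Facebook; I have not tried it in a long time; the original source is here):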

https://gist.github.com/kutschkem/7690411#file-infb-py-L83

import http.cookiejar
import sys
import urllib.parse
import urllib.request

# FormScraper, user, passw and friends are assumed to be defined elsewhere
# (see the linked gist); they are not part of this excerpt.

# Build an opener that stores cookies and sends them back on later requests
CHandler = urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())
browser = urllib.request.build_opener(CHandler)
urllib.request.install_opener(browser)

# Retrieve login form data and initialize the cookies
res = browser.open('https://www.facebook.com/login.php')

# Determine string encoding (fall back to UTF-8 if no charset is declared)
encoding = res.headers.get_content_charset() or 'utf-8'
html = res.read().decode(encoding)
res.close()

# Scrape form for hidden inputs, add email and password to values
form_scraper = FormScraper()
form_scraper.feed(html)
form_data = form_scraper.values
form_data.extend([('email', user), ('pass', passw)])
# In Python 3, urlencode accepts str pairs directly; pass the page encoding
# instead of encoding every field by hand.
data = urllib.parse.urlencode(form_data, encoding=encoding)

# Login: passing a request body makes this a POST
print('Logging in to account ' + user)
res = browser.open('https://login.facebook.com/login.php?login_attempt=1', data.encode('ascii'))
rcode = res.code
print(rcode)
print(res.url)
# Plain substring test: in a regex, the unescaped '?' would not match literally
if '/login.php?login_attempt=1' in res.url:
    print('Login Failed')
    sys.exit(2)
res.close()

# Get Emails and Phone Numbers
print("Getting Info..\n")
for friend in friends['data']:
    print(friend)
    prof = 'http://facebook.com/' + str(friend['id'])
    res = browser.open(prof)  # the opener sends the login cookies along
    # do stuff with the response
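
The login code above relies on a FormScraper class that is not shown in this excerpt. As a rough idea of what such a helper might look like, here is a minimal sketch built on the standard library's HTMLParser. It is an assumption for illustration, not the gist's actual implementation; the field name csrf_token and the sample credentials are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormScraper(HTMLParser):
    """Collect (name, value) pairs from hidden <input> fields (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        input_type = (attr_map.get("type") or "").lower()
        if input_type == "hidden" and attr_map.get("name"):
            self.values.append((attr_map["name"], attr_map.get("value") or ""))


# Hypothetical login form, for illustration only.
sample_form = (
    '<form method="post">'
    '<input type="hidden" name="csrf_token" value="abc123">'
    '<input type="text" name="email">'
    '</form>'
)

scraper = FormScraper()
scraper.feed(sample_form)

# Add the credentials to the hidden fields and build the POST body.
form_data = scraper.values + [("email", "user@example.com"), ("pass", "secret")]
body = urlencode(form_data).encode("ascii")
print(body)
```

Sending the hidden fields back matters because many login forms include a per-session token there, and the server may reject a POST that omits it.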

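The key point here is to use an object from urllib to handle the cookies. The way you are doing it now, there is no connection between the login attempt and reading the referral page; they are just separate requests. From the site's point of view, these are requests made by two different users. To connect the dots, you need cookie handling.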
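
To see the difference cookie handling makes without touching a real site, here is a small self-contained sketch that runs a throwaway server on localhost. Everything about the server (the /login and /profile paths, the session=abc123 cookie) is a hypothetical stand-in for a real login flow; only the urllib.request and http.cookiejar calls are the actual technique.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: /login sets a session cookie, other paths need it."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            body = b"logged in"
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        elif "session=abc123" in self.headers.get("Cookie", ""):
            body = b"profile for the logged-in user"
        else:
            body = b"please log in"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Open url with the given opener and return the body as text."""
    with opener.open(url) as res:
        return res.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# ProxyHandler({}) keeps the local demo from going through a system proxy.
no_proxy = urllib.request.ProxyHandler({})

# Opener with a cookie jar: the cookie set by /login is sent to /profile.
jar = http.cookiejar.CookieJar()
with_cookies = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(jar)
)
fetch(with_cookies, base + "/login")
with_result = fetch(with_cookies, base + "/profile")

# Opener without cookie handling: every request looks like a new visitor.
without_cookies = urllib.request.build_opener(no_proxy)
fetch(without_cookies, base + "/login")
without_result = fetch(without_cookies, base + "/profile")

server.shutdown()
server.server_close()

print(with_result)
print(without_result)
```

The first opener gets the profile page because it replays the session cookie; the second is asked to log in again even though it requested /login a moment earlier, which is the situation described above.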
