Logging in to a web page with Python for scraping

<div id="stubPage">
  <div class="container">
    <h1 id="stubPageTitle">LOGIN</h1>
    <div id="loginForm">
      <form action="/do/login" method="post">
        <legend>MechWarrior Online <a href="/signup" class="btn btn-warning pull-right">REGISTER</a></legend>
        <label>Email Address:</label>
        <div class="input-prepend"><span class="add-on textColorBlack textPlain">@</span><input id="email" name="email" class="span4" size="16" type="text" placeholder="user@example.org"></div>
        <label>Password:</label>
        <div class="input-prepend"><span class="add-on"><span class="icon-lock"></span></span><input id="password" name="password" class="span4" size="16" type="password"></div>
        <br>
        <button type="submit" class="btn btn-large btn-block btn-primary">LOGIN</button>
        <br>
        <span class="pull-right">[ <a href="#" id="forgotLink">Forgot Your Password?</a> ]</span>
        <br>
        <input type="hidden" name="return" value="/profile/stats?type=mech">
      </form>
    </div>
  </div>
</div>

1 Answer

User

#1 · Posted 2024-04-20 13:54:57

The Requests documentation on submitting form data is short and easy to follow. Read through the section: More Complicated POST requests
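
Before building the POST payload, it helps to confirm which field names the form actually expects. The following is a minimal sketch using only the standard library's html.parser; the class name FormFieldParser and the trimmed FORM_HTML string are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Record the first form's action/method and every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and attrs.get("name"):
            # Visible inputs have no value yet; hidden ones carry a default.
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Trimmed copy of the login form from the question.
FORM_HTML = """
<form action="/do/login" method="post">
  <input id="email" name="email" type="text">
  <input id="password" name="password" type="password">
  <input type="hidden" name="return" value="/profile/stats?type=mech">
</form>
"""

parser = FormFieldParser()
parser.feed(FORM_HTML)
print(parser.action, parser.method)
print(parser.fields)
```

For this form the parser reports the fields email, password and return, so those are the keys the login payload should use.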

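Logging in usually boils down to saving the cookies you receive and sending them along with future requests.

After POSTing to the login page with requests.post(), take the cookies from the returned response object and send them again with your later requests. Here is one way: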

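import requests

# login_url, stats_url, username and password must be defined beforehand.
post_headers = {'content-type': 'application/x-www-form-urlencoded'}
# The form in the question names its fields "email" and "password".
payload = {'email': username, 'password': password}
login_request = requests.post(login_url, data=payload, headers=post_headers)
cookie_dict = login_request.cookies.get_dict()
stats_request = requests.get(stats_url, cookies=cookie_dict)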

If you still have problems, check the status code of the request with login_request.status_code, or look for errors in the page content with login_request.text.
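
As an alternative to passing a cookie dictionary by hand, a requests.Session stores cookies from each response and sends them on later requests made through the same session. The sketch below assumes the paths shown in the question's form; the helper names, the base_url parameter and the main() wrapper are hypothetical and only there for illustration.

```python
LOGIN_PATH = "/do/login"                   # the form's action attribute
STATS_PATH = "/profile/stats?type=mech"    # the form's hidden "return" value


def build_login_payload(email, password):
    """Form fields as named in the question's HTML form."""
    return {"email": email, "password": password, "return": STATS_PATH}


def login_and_fetch_stats(session, base_url, email, password):
    """Log in, then request the stats page through the same session.

    `session` is assumed to be a requests.Session, which keeps the cookies
    set by the login response and sends them with the second request.
    """
    login_response = session.post(
        base_url + LOGIN_PATH, data=build_login_payload(email, password)
    )
    login_response.raise_for_status()  # raises on 4xx/5xx status codes
    stats_response = session.get(base_url + STATS_PATH)
    stats_response.raise_for_status()
    return stats_response


def main(base_url, email, password):
    # Third-party import kept local so the helpers above have no dependencies.
    import requests

    with requests.Session() as session:
        return login_and_fetch_stats(session, base_url, email, password).text
```

Note that raise_for_status() only catches HTTP error codes. A site may answer a failed login with a 200 page containing an error message, so inspecting the response text as described above is still worthwhile.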

Edit:

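Some sites redirect you several times when you make a request. Be sure to check the request.history object to see what happened and why you were bounced; I often find a chain of several redirects there.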

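Each item in history is a separate response, one per redirect hop. You can inspect them like any other response object, for example request.history[0].url, and you can disable redirects by adding allow_redirects=False to the request parameters: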

login_request = requests.post(login_url, data=payload, headers=post_headers, allow_redirects=False)
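
To see the whole redirect chain at a glance, the hops can be listed together with the final response. This is a small sketch; describe_redirects is a hypothetical helper name, and it assumes it is given a requests response object, of which it only reads the history, status_code, url and headers attributes.

```python
def describe_redirects(response):
    """Return one (status_code, url, location) tuple per hop, oldest first.

    The final response is included as the last entry; its location is None
    unless the server sent a Location header that was not followed.
    """
    hops = []
    for hop in list(response.history) + [response]:
        hops.append((hop.status_code, hop.url, hop.headers.get("Location")))
    return hops
```

Calling it on the result of a request that followed redirects and printing each tuple shows which URL issued which redirect, which usually makes it clear where a login attempt was bounced.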

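In some cases I had to disable redirects and add new cookies before I reached the right page. Try something like this to keep the existing cookies and add the new ones to them: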

# Python 3: dict views cannot be joined with +, so merge by unpacking.
cookie_dict = {**cookie_dict, **new_request.cookies.get_dict()}

Doing this after every request keeps your cookies up to date for the next request, much like a browser does.
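
Putting the two ideas together, redirects can be followed one hop at a time while the cookie dictionary is updated after every response. The sketch below is one possible arrangement, not the answer's own code: get_with_manual_redirects and its fetch parameter are hypothetical, and fetch is assumed to behave like requests.get.

```python
from urllib.parse import urljoin


def get_with_manual_redirects(fetch, url, cookies=None, max_hops=10):
    """Follow redirects one hop at a time, merging cookies after each hop.

    `fetch` is assumed to behave like requests.get: it accepts `cookies` and
    `allow_redirects` keyword arguments and returns a response object.
    """
    cookie_dict = dict(cookies or {})
    for _ in range(max_hops):
        response = fetch(url, cookies=cookie_dict, allow_redirects=False)
        # Keep existing cookies; newer values win when names collide.
        cookie_dict.update(response.cookies.get_dict())
        if not response.is_redirect:
            return response, cookie_dict
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, response.headers["Location"])
    raise RuntimeError(f"gave up after {max_hops} redirects")
```

With Requests installed, passing requests.get as fetch returns the final response plus the accumulated cookies. One caveat: a plain dictionary drops the domain and path each cookie was scoped to, so every cookie is sent to every hop, whereas a Session's cookie jar keeps that scoping.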
