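Fill form values in a web page via a Python script (not testing)
I need to fill in form values on a target page and then click a button, all from Python. I have looked at Selenium and Windmill, but these are testing frameworks, and I am not testing. I am trying to log in to a third-party website programmatically, then download and parse a file that we need to insert into our database. The problem with the testing frameworks is that they launch browser instances; I just want a script I can schedule to run daily to retrieve the page I want. Is there any way to do this?
5 Answers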
8
You can do this with the standard urllib library, like so:
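import urllib
# Python 2 API: passing form data makes urlretrieve send a POST instead of a GET.
form_data = urllib.urlencode({"username": "xxx", "password": "pass"})
urllib.urlretrieve("http://www.google.com/", "somefile.html", lambda x, y, z: 0, form_data)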
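On Python 3 the same functions live in urllib.request and urllib.parse, and a login normally needs cookies to persist between the login request and the download. Here is a minimal sketch of that, assuming the site uses a plain form POST and cookie-based sessions. The helper names (make_opener, build_login_request, fetch_file) are made up for illustration, and the URLs and field names are whatever the real form uses.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Build an opener that keeps cookies, so the login carries over to later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(url, fields):
    """URL-encode the form fields; a Request with data is sent as a POST."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)


def fetch_file(login_url, fields, file_url, dest_path):
    """Log in, then download file_url to dest_path using the same cookie session."""
    opener = make_opener()
    with opener.open(build_login_request(login_url, fields)) as response:
        response.read()  # discard the login page; we only need the cookies
    with opener.open(file_url) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

A daily cron job could then call something like fetch_file(login_url, {"username": "xxx", "password": "pass"}, file_url, "somefile.html") and parse the saved file afterwards.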
20
Take a look at this example using Mechanize; it will give you the basic idea:
#!/usr/bin/python
import re
from mechanize import Browser
br = Browser()
# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
# Retrieve the Google home page, saving the response
br.open( "http://google.com" )
# Select the search box and search for 'foo'
br.select_form( 'f' )
br.form[ 'q' ] = 'foo'
# Get the search results
br.submit()
# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
    siteMatch = re.compile( r'www\.foofighters\.com' ).search( link.url )
    if siteMatch:
        resp = br.follow_link( link )
        break
# Print the site, if a matching link was found
if resp is not None:
    content = resp.get_data()
    print( content )
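Conceptually, selecting a form and assigning to one of its fields comes down to reading the form's inputs from the HTML and overriding some values before submitting. Below is a rough stdlib-only sketch of that idea. It is not how mechanize is implemented, it only handles input elements (no select or textarea), and the FormFieldParser class, the form_fields helper and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute default to an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_fields(html, form_name):
    """Return the input fields of the form called form_name as a dict."""
    parser = FormFieldParser(form_name)
    parser.feed(html)
    return parser.fields


# Made-up sample page with a hidden field and a search box.
sample = ('<form name="f" action="/search">'
          '<input type="hidden" name="hl" value="en"><input name="q"></form>')
fields = form_fields(sample, "f")
fields["q"] = "foo"  # roughly what br.form['q'] = 'foo' expresses
print(fields)
```

Keeping the hidden inputs matters in practice, since many login forms carry tokens there that the server expects back.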
33
You are looking for Mechanize.
Here is a form-submission example:
import re
from mechanize import Browser
br = Browser()
br.open("http://www.example.com/")
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)
response = br.submit() # submit current form
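A list assigned to a control such as cheeses is normally sent as one name=value pair per selected item. Here is a small standard-library illustration of what the encoded body would look like, assuming an ordinary URL-encoded form. This is not taken from mechanize's source.

```python
from urllib.parse import urlencode

# doseq=True expands a list value into one key=value pair per item.
payload = urlencode({"cheeses": ["mozzarella", "caerphilly"]}, doseq=True)
print(payload)
```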