How do I submit a query to http://www.ratsit.se/BC/Search.aspx? I wrote a script, but "clicking" the button seems to fail.
import urllib2, cookielib
import ClientForm
from BeautifulSoup import BeautifulSoup
first_name = "Mona"
last_name = "Sahlin"
url = 'http://www.ratsit.se/BC/Search.aspx'
cookiejar = cookielib.LWPCookieJar()
cookiejar = urllib2.HTTPCookieProcessor(cookiejar)
opener = urllib2.build_opener(cookiejar)
urllib2.install_opener(opener)
response = urllib2.urlopen(url)
forms = ClientForm.ParseResponse(response, backwards_compat=False)
#Use to print out forms if website design changes
for x in forms:
    print x
'''
forms print result:
<aspnetForm POST http://www.ratsit.se/BC/Search.aspx application/x-www-form-urlencoded <HiddenControl(__VIEWSTATE=/wEPDwULLTExMzU2NTM0MzcPZBYCZg9kFgICAxBkZBYGAgoPDxYCHghJbWFnZVVy....E1haW4kZ3J2U2VhcmNoUmVzdWx0D2dkBRdjdGwwMCRtdndVc2VyTG9naW5MZXZlbA8PZGZkle2yQ/dc9eIGMaQPJ/EEJs899xE=) (readonly)>
<TextControl(ctl00$cphMain$txtFirstName=)>
<TextControl(ctl00$cphMain$txtLastName=)>
<TextControl(ctl00$cphMain$txtBirthDate=)>
<TextControl(ctl00$cphMain$txtAddress=)>
<TextControl(ctl00$cphMain$txtZipCode=)>
<TextControl(ctl00$cphMain$txtCity=)>
<TextControl(ctl00$cphMain$txtKommun=)>
<CheckboxControl(ctl00$cphMain$chkExaktStavning=[on])>
<ImageControl(ctl00$cphMain$cmdButton=)>
>
'''
#Confirm correct form
form = forms[0]
print form.__dict__
#print form.__dict__.get('controls')
controls = form.__dict__.get('controls')
print "------------------------------------------------------------"
try:
    controls[1] = first_name
    controls[2] = last_name
    page = urllib2.urlopen(form.click('ctl00$cphMain$cmdButton')).read()
    ''' give error here: The following error occured: "'str' object has no attribute 'name'" '''
    # print controls[9]
    print '----------here-------'
    soup = BeautifulSoup(''.join(page))
    soup = soup.prettify()
1 Answer
Here is a working version:
import urllib2, cookielib
import ClientForm
from BeautifulSoup import BeautifulSoup
first_name = "Mona"
last_name = "Sahlin"
url = 'http://www.ratsit.se/BC/Search.aspx'
cookiejar = cookielib.LWPCookieJar()
cookiejar = urllib2.HTTPCookieProcessor(cookiejar)
opener = urllib2.build_opener(cookiejar)
urllib2.install_opener(opener)
response = urllib2.urlopen(url)
forms = ClientForm.ParseResponse(response, backwards_compat=False)
# Use to print out forms to check if website design changes
for i, x in enumerate(forms):
    print 'Form[%d]: %r, %d controls' % (i, x.name, len(x.controls))
    for j, c in enumerate(x.controls):
        print '  ', j, c.__class__.__name__,
        try: n = c.name
        except AttributeError: n = 'NO NAME'
        print repr(n)
#Confirm correct form
form = forms[0]
controls = form.__dict__.get('controls')
print controls, form.controls
print "------------------------------------------------------------"
try:
    controls[1].value = first_name
    controls[2].value = last_name
    p = form.click('ctl00$cphMain$cmdButton')
    print 'p is', repr(p)
    page = urllib2.urlopen(p).read()
    ''' give error here: The following error occured: "'str' object has no attribute 'name'" '''
    # print controls[9]
    print '----------here-------'
    soup = BeautifulSoup(''.join(page))
    soup = soup.prettify()
finally:
    print 'ciao!'
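The core bug fix (apart from completing the try statement that you probably truncated, which fixes the syntax error) is to use
controls[1].value = first_name
controls[2].value = last_name
instead of your faulty code, which assigned directly to controls[1] and controls[2]. That mistake put strings into the controls list in place of the actual control objects, which is why the lookup by name inside form.click failed.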
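To see the failure without touching the network, here is a minimal sketch of the same mistake. It does not use ClientForm: `TextControl`, `find_control` and `make_controls` are hypothetical stand-ins I made up, assuming only that a lookup by name reads `.name` on each entry of the controls list, as the error message suggests.

```python
class TextControl(object):
    """Hypothetical stand-in for a form control: it has a name and a value."""
    def __init__(self, name, value=""):
        self.name = name
        self.value = value


def find_control(controls, name):
    """Look a control up by name, reading .name on every list entry."""
    for control in controls:
        if control.name == name:
            return control
    raise LookupError("no control named %r" % name)


def make_controls():
    """Build a small made-up controls list; the names are illustrative only."""
    return [TextControl("viewstate"), TextControl("first"), TextControl("button")]


# Wrong: this rebinds the list slot, so the control object is replaced by a str.
broken = make_controls()
broken[1] = "Mona"
try:
    find_control(broken, "button")
    broken_error = None
except AttributeError as exc:
    # The lookup reaches the str in slot 1 and fails on its missing .name.
    broken_error = str(exc)

# Right: the control stays in the list and only its value changes.
fixed = make_controls()
fixed[1].value = "Mona"
button = find_control(fixed, "button")
```

In the broken case the lookup never reaches the button, because it trips over the plain string sitting in slot 1. Setting `.value` leaves every entry a control, so the lookup succeeds.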