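I want to write a program that fetches my attendance information from my university's website. To do that, I wrote a script that logs in to the site, which takes me to my dashboard. It then goes to the Attendance tab, gets its href, and appends it to the university site's URL. The tag for the Attendance item looks like this: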
<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>
When I click the Attendance link, the browser's address bar shows a URL like this:
http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=
So I need to append the href to 'http://erp.college_name.edu', which I did like this:
L = 'http://erp.college_name.edu' + str(I.findAll('li')[4].a.get('href').replace('.', ''))
The problem is that the href I fetch is not the one shown in the tag above; it keeps changing. When I print L, this is what I get instead of the URL I expected:
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=
The href I get differs from the real URL, and it changes every time I run the program. This is what I got on the second run:
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=
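Why am I getting this? Also, when I click another link on my dashboard page and then click the Attendance tab again, the href value in the URL in the address bar changes again.
After that, when I do this: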
opens = requests.get(L)
soup_2 = BeautifulSoup(opens.text, 'lxml')
print(soup_2)
I get this:
C:\Users\HUNTER\AppData\Local\Programs\Python\Python35-32\python.exe
C:/Users/HUNTER/PycharmProjects/dictionary/erp_1.py
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>The page cannot be found</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
BODY { font: 8pt/12pt verdana }
H1 { font: 13pt/15pt verdana }
H2 { font: 8pt/12pt verdana }
A:link { color: red }
A:visited { color: maroon }
</style>
</head><body><table border="0" cellspacing="10" width="500"><tr><td>
<h1>The page cannot be found</h1>
The page you are looking for might have been removed, had its name
changed, or is temporarily unavailable.
<hr/>
<p>Please try the following:</p>
<ul>
<li>Make sure that the Web site address displayed in the address bar of
your browser is spelled and formatted correctly.</li>
<li>If you reached this page by clicking a link, contact
the Web site administrator to alert them that the link is incorrectly
formatted.
</li>
<li>Click the <a href="javascript:history.back(1)">Back</a> button to
try
another link.</li>
</ul>
<h2>HTTP Error 404 - File or directory not found.<br/>Internet
Information
Services (IIS)</h2>
<hr/>
<p>Technical Information (for support personnel)</p>
<ul>
<li>Go to <a href="http://go.microsoft.com/fwlink/?
linkid=8180">Microsoft
Product Support Services</a> and perform a title search for the words
<b>HTTP</b> and <b>404</b>.</li>
<li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
and search for topics titled <b>Web Site Setup</b>, <b>Common
Administrative
Tasks</b>, and <b>About Custom Error Messages</b>.</li>
</ul>
</td></tr></table></body></html>
Process finished with exit code 0
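Update
I replaced the .replace('.', '') call with [2:], because replace also removed the . in .aspx from the href. Now the problem has become this.
But how does the href value keep changing, and how can I fetch that page?
Any help?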
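On building the URL: instead of stripping characters from the href, the relative path can be resolved against the URL of the page it was scraped from. Below is a minimal sketch using `urllib.parse.urljoin` from the standard library. `DASHBOARD_URL` and its path are assumptions, since the question does not give the dashboard's address, and `resolve_href` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumed address of the dashboard page (hypothetical; not given in the question).
DASHBOARD_URL = "http://erp.college_name.edu/Student/Dashboard.aspx"


def resolve_href(page_url, href):
    """Resolve a relative href against the URL of the page it appeared on."""
    return urljoin(page_url, href)


# The href from the tag shown in the question.
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="
print(resolve_href(DASHBOARD_URL, href))
```

This handles the leading `..` the way a browser would and leaves the `.` in `.aspx` untouched, so neither `.replace('.', '')` nor `[2:]` is needed.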