How do I get the exact, actual value of an "href"?

Posted 2024-03-29 02:28:46


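I want to write a program that fetches my attendance information from my university's website. To do this, I wrote a script that logs in to the site, which takes me to my dashboard, then goes to the Attendance tab, gets the href, and appends it to the university site's URL.
The anchor tag for the Attendance tab looks like this: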

<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>

When I click the attendance link in the browser, the address bar shows a URL like this:

http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=

So I should append the href to 'http://erp.college_name.edu'. I did that like this:

 L = 'http://erp.college_name.edu' + str(I.findAll('li')[4].a.get('href').replace('.', ''))

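But the problem is that the href I fetch is not what appears in the tag; it keeps changing. I expected to get the URL shown above, but when I print L, I get this instead: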

http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=

The href I get differs from the real URL, and it changes every time I run the program. This is what I got the second time:

http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=

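Why do I get this? Also, when I click another link on my dashboard page and then click the Attendance tab again, the href value in the address bar URL changes once more.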
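To make the observation above concrete, here is a small sketch that splits the two printed URLs into their path and SID parts. It only uses the two URLs quoted in this post; `split_link` is a helper name made up for illustration. It suggests that the page path stays the same between runs and only the SID token differs.

```python
from urllib.parse import urlsplit, parse_qs

# The two URLs printed on consecutive runs, copied from the post.
first = ("http://erp.college_name.edu/Student/StudentAttendanceViewaspx"
         "?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=")
second = ("http://erp.college_name.edu/Student/StudentAttendanceViewaspx"
          "?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=")


def split_link(url):
    """Return (path, SID value) for a URL; hypothetical helper."""
    parts = urlsplit(url)
    sid = parse_qs(parts.query).get("SID", [None])[0]
    return parts.path, sid


path_1, sid_1 = split_link(first)
path_2, sid_2 = split_link(second)

print(path_1 == path_2)  # same page path on both runs
print(sid_1 != sid_2)    # only the SID token differs
```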

After that, when I do this:

opens = requests.get(L)
soup_2 = BeautifulSoup(opens.text, 'lxml')
print(L)  

I get this:

    C:\Users\HUNTER\AppData\Local\Programs\Python\Python35-32\python.exe 
    C:/Users/HUNTER/PycharmProjects/dictionary/erp_1.py
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
   "http://www.w3.org/TR/html4/strict.dtd">
  <html><head><title>The page cannot be found</title>
   <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <style type="text/css">
    BODY { font: 8pt/12pt verdana }
    H1 { font: 13pt/15pt verdana }
    H2 { font: 8pt/12pt verdana }
   A:link { color: red }
    A:visited { color: maroon }
 </style>
 </head><body><table border="0" cellspacing="10" width="500"><tr><td>
  <h1>The page cannot be found</h1>
  The page you are looking for might have been removed, had its name 
 changed, or is temporarily unavailable.
 <hr/>
 <p>Please try the following:</p>
 <ul>
  <li>Make sure that the Web site address displayed in the address bar of 
your browser is spelled and formatted correctly.</li>
  <li>If you reached this page by clicking a link, contact
    the Web site administrator to alert them that the link is incorrectly 
   formatted.
    </li>
    <li>Click the <a href="javascript:history.back(1)">Back</a> button to 
 try 
   another link.</li>
     </ul>
       <h2>HTTP Error 404 - File or directory not found.<br/>Internet 
    Information 
   Services (IIS)</h2>
<hr/>
 <p>Technical Information (for support personnel)</p>
 <ul>
     <li>Go to <a href="http://go.microsoft.com/fwlink/?
     linkid=8180">Microsoft 
       Product Support Services</a> and perform a title search for the words 
    <b>HTTP</b> and <b>404</b>.</li>
  <li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
  and search for topics titled <b>Web Site Setup</b>, <b>Common 
   Administrative 
  Tasks</b>, and <b>About Custom Error Messages</b>.</li>
   </ul>
    </td></tr></table></body></html>


  Process finished with exit code 0

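Update

I replaced the .replace('.', '') call with [2:], because replace also removed the . from .aspx in the href. Now the problem has become this.

But why does the href value keep changing, and how can I fetch that page?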
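As a side note on building the URL, a relative href such as `../Student/...` can be resolved with the standard library's `urllib.parse.urljoin` instead of string slicing. The sketch below compares the three approaches. The dashboard address `page_url` is an assumption (the post does not give it); the href is the one from the tag shown earlier.

```python
from urllib.parse import urljoin

# Assumed address of the dashboard page the link was scraped from.
page_url = "http://erp.college_name.edu/Student/Dashboard.aspx"
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="

# Original approach: removes every dot, including the one in ".aspx".
broken = "http://erp.college_name.edu" + href.replace(".", "")

# Updated approach: drops only the leading "..".
sliced = "http://erp.college_name.edu" + href[2:]

# urljoin resolves the relative path against the page it came from.
joined = urljoin(page_url, href)

print(broken)
print(sliced)
print(joined)
```

Under the assumed `page_url`, `sliced` and `joined` are identical, but `urljoin` keeps working if the href later contains a different number of `../` segments.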

Any help?
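One possible explanation, which nothing in this post confirms, is that the SID token is generated per request and is only valid together with the cookies of the logged-in session. If so, a plain `requests.get(L)` would arrive without those cookies. The sketch below shows the idea of reading the link and following it through the same session object. `fetch_attendance` and `LinkByIdParser` are names invented here, the dashboard URL is whatever page holds the link, and the standard library's `html.parser` is swapped in for BeautifulSoup so the sketch has no extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkByIdParser(HTMLParser):
    """Remember the href of the first <a> tag with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("id") == self.target_id and self.href is None:
            self.href = attrs.get("href")


def fetch_attendance(session, dashboard_url):
    """Load the dashboard and follow the attendance link with the same session.

    `session` is any object with a requests-style get(url) method whose
    response has a .text attribute, e.g. a logged-in requests.Session.
    """
    dashboard = session.get(dashboard_url)
    parser = LinkByIdParser("aAttandance")  # id taken from the tag in the post
    parser.feed(dashboard.text)
    if parser.href is None:
        raise ValueError("attendance link not found; is the session logged in?")
    # Use the freshly scraped href right away, with the same cookies.
    attendance_url = urljoin(dashboard_url, parser.href)
    return session.get(attendance_url)
```

With the requests library, this would be called with a `requests.Session()` that has already performed the login, so that the login request, the dashboard request, and the attendance request all share one set of cookies.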

