<p>Because none of these <code><br></code> tags has a matching closing tag, Beautiful Soup adds the closing tags automatically, producing the following HTML:</p>
<pre><code>In [23]: soup = BeautifulSoup(html)
In [24]: soup.br
Out[24]:
<br>
Master of Science (Computer Science), Government College University Lahore
<br>
Master of Science ( Computer Science ), University of Agriculture Faisalabad
<br>
Bachelor of Science (Hons) ( Agriculture ),University of Agriculture Faisalabad
<br/></br></br></br>
</code></pre>
<p>When you call <code>Tag.extract</code> on the first <code><br></code> tag, you remove all of its descendants along with the strings those descendants contain.</p>
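<p>The effect of <code>Tag.extract</code> on nested content can be reproduced with a small sketch. The markup below is hypothetical: it uses explicitly nested <code>div</code> elements rather than unclosed <code><br></code> tags, because whether unclosed <code><br></code> tags end up nested depends on the parser and library version. The <code>html.parser</code> argument is likewise an assumption, not something stated above.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the nesting is written out explicitly, so the result
# does not depend on how a given parser handles unclosed <br> tags.
html = "<span>kept<div>outer<div>inner</div></div></span>"
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the first <div> from the tree and returns it,
# together with everything nested inside it.
removed = soup.div.extract()

remaining_text = soup.span.get_text()
removed_text = removed.get_text()

print(remaining_text)  # kept
print(removed_text)    # outerinner
```

<p>Only the string that sat outside the extracted tag survives in the tree, which is why extracting an outer tag can appear to delete far more text than expected.</p>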
<p>It looks like you only need to extract all of the text from the <code>span</code> element. If so, don't bother removing anything:</p>
<pre><code>In [28]: soup.span.text
Out[28]: '\nDoctor of Philosophy ( Software Engineering ), Universiti Teknologi Petronas\n\nMaster of Science (Computer Science), Government College University Lahore\n\nMaster of Science ( Computer Science ), University of Agriculture Faisalabad\n\nBachelor of Science (Hons) ( Agriculture ),University of Agriculture Faisalabad\n'
</code></pre>
<p>The <code>Tag.text</code> attribute extracts all of the strings from the given tag.</p>
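<p>If one string per entry is more convenient than a single block of text, the <code>stripped_strings</code> generator may be an option. The following is a minimal sketch, assuming a shortened stand-in for the markup in the question, with the <code><br></code> tags written in self-closing form so that the parser choice does not matter:</p>

```python
from bs4 import BeautifulSoup

# Assumed, shortened stand-in for the question's markup; <br/> is written
# self-closing so no parser nests the following text inside it.
html = """<span>
Doctor of Philosophy ( Software Engineering ), Universiti Teknologi Petronas
<br/>
Master of Science (Computer Science), Government College University Lahore
<br/>
</span>"""
soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields each string with surrounding whitespace removed
# and skips strings that consist only of whitespace.
degrees = list(soup.span.stripped_strings)

for degree in degrees:
    print(degree)
```

<p>Each element of <code>degrees</code> then holds one line of text without the surrounding newlines that <code>Tag.text</code> leaves in place.</p>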