<p>Latest answer:</p>
<pre><code>
from bs4 import BeautifulSoup
html_str = """<div class="page"><p />
<p></p>
<p>First line required
</p>
<p>Second line required
</p>
<p>Third line required
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
</p>
<p></p>
</div>
<div class="page"><p />
<p>line required 1
</p>
<p></p>
<p>line required 2
</p>
<p>line required 3
</p>
<p></p>
<p>line required 4
</p>
<p>line required 5
</p>
<p>line required 6
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
<p />
</div>"""
# Load the HTML string into a BeautifulSoup object
soup = BeautifulSoup(html_str, 'lxml')
# Remove empty tags (except <br>); this also drops the empty <p> tags
for tag in soup.find_all(lambda tag: not tag.contents and tag.name != 'br'):
    tag.decompose()
# Collect every div with class 'page'
items = soup.find_all('div', class_='page')
final_html = ''
# Remove the last two <p> tags from every div (as requested)
for item in items:
    last_item = str(item.find_all('p')[-1])
    second_last_item = str(item.find_all('p')[-2])
    current_item = str(item)
    current_item = current_item.replace(last_item, '')
    current_item = current_item.replace(second_last_item, '')
    final_html = final_html + current_item
# Re-parse the trimmed HTML and extract its text
final_soup = BeautifulSoup(final_html, 'lxml')
final_str = final_soup.text
print(final_str)
</code></pre>
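<p>The same idea can also be sketched without string replacement, by collecting the text of the non-empty paragraphs in each page and slicing off the trailing ones. This is only a minimal sketch: the helper name <code>required_lines</code>, its <code>drop_last</code> parameter, and the shortened sample input are assumptions made for illustration and are not part of the original answer. It uses the built-in <code>html.parser</code>, so lxml is not needed here.</p>
<pre><code>
from bs4 import BeautifulSoup

def required_lines(html, drop_last=2, parser="html.parser"):
    """Return the text of the non-empty <p> tags in each div.page,
    leaving out the last `drop_last` paragraphs of every page."""
    soup = BeautifulSoup(html, parser)
    lines = []
    for page in soup.find_all("div", class_="page"):
        # Text of every paragraph, with surrounding whitespace stripped
        texts = [p.get_text(strip=True) for p in page.find_all("p")]
        # Ignore paragraphs that contain no visible text
        texts = [t for t in texts if t]
        # Drop the trailing paragraphs that are not required
        lines.extend(texts[:-drop_last] if drop_last else texts)
    return lines

# Hypothetical shortened sample, modelled on the input above
sample = """<div class="page"><p></p>
<p>First line required
</p>
<p>Line 1 not required
</p>
<p>Line 2 not required
</p>
<p></p>
</div>"""

print("\n".join(required_lines(sample)))
</code></pre>
<p>Slicing a list of paragraph texts avoids one pitfall of <code>str.replace</code>, which removes every occurrence of the matched markup rather than only the last one.</p>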
<p><strong>Output:</strong></p>
<pre><code>
First line required
Second line required
Third line required
line required 1
line required 2
line required 3
line required 4
line required 5
line required 6
</code></pre>