<p>Regex is not the best tool for parsing HTML or XML data; BeautifulSoup is simpler and more effective for that job. You can do:</p>
<pre><code>from bs4 import BeautifulSoup
...
soup = BeautifulSoup(page.decode(), 'html.parser')
# Collect the src attribute of every img tag, skipping tags that have none
files = [img.get("src") for img in soup.find_all('img') if img.get("src")]
files.sort()
print(f'\n [+] {len(files)} IMAGES FOUND:\n')
for file in files:
    print(file)
</code></pre>
<p>This way the HTML is actually parsed, and only real tags are returned.</p>
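<p>To illustrate what "only real tags" means, here is a minimal, self-contained sketch. The sample markup and the helper names <code>regex_sources</code> and <code>image_sources</code> are hypothetical and were made up for this comparison; the regex is just one plausible pattern, not the one from the question.</p>

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page: one img is commented out, one has no src attribute.
SAMPLE_HTML = (
    '<!-- <img src="old.png"> -->'
    '<img src="b.png">'
    '<img alt="logo">'
    '<img src="a.png">'
)

def regex_sources(html):
    """Naive regex approach: matches anything that looks like an img tag."""
    return re.findall(r'<img[^>]*src="([^"]+)"', html)

def image_sources(html):
    """Parser approach: only img elements that exist in the parsed tree."""
    soup = BeautifulSoup(html, 'html.parser')
    # Skip img tags without a src so that sorting never sees None
    return sorted(img.get("src") for img in soup.find_all('img') if img.get("src"))

print(regex_sources(SAMPLE_HTML))   # also picks up the commented-out image
print(image_sources(SAMPLE_HTML))   # only the images actually in the document
```

<p>The regex version reports <code>old.png</code> even though that tag sits inside an HTML comment, while the parsed version ignores it because a comment is not an element in the tree.</p>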