Problem displaying website images using Regex

2 Answers

User

#1 · edited 2024-05-15 15:59:42

Regex is not the best tool for parsing HTML or XML. BeautifulSoup is simpler and more reliable for this job. You can do:

from bs4 import BeautifulSoup

...
soup = BeautifulSoup(page.decode(), 'html.parser')
# Collect the src attribute of every <img> tag that actually has one;
# src=True skips tags without src, so sort() never sees None.
files = [img["src"] for img in soup.find_all('img', src=True)]
files.sort()
print(f'\n [+] {len(files)} IMAGES FOUND:\n')
for file in files:
    print(file)

This way the HTML is actually parsed, and only real tags are returned.
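To make "only real tags" concrete, here is a minimal sketch that uses the standard library's `html.parser` instead of BeautifulSoup, so it runs without third-party packages. The names `ImgSrcCollector`, `extract_img_srcs`, and `SAMPLE_HTML` are made up for illustration and are not part of the original answer. It compares a parser with the simple regex from this thread on the same input.

```python
import re
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every real <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that have no src attribute
                self.sources.append(src)


def extract_img_srcs(html):
    """Parse html and return the src values of its <img> tags, in order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Illustrative input: a commented-out tag, a normal tag,
# an upper-case tag with single quotes, and a tag without src.
SAMPLE_HTML = """
<!-- <img src="old.jpg"> -->
<img src="a.jpg" height=12>
<IMG alt='logo' SRC='b.png'>
<img alt="no source">
"""

parsed = extract_img_srcs(SAMPLE_HTML)
naive = re.findall(r'<img[^>]*src="([^"]*)"', SAMPLE_HTML)
print(parsed)
print(naive)
```

On this sample the parser reports a.jpg and b.png, while the regex picks up old.jpg from inside the comment and misses the upper-case, single-quoted tag.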

User

#2 · edited 2024-05-15 15:59:42

You can extract the image src directly:

>>> import re
>>> images = ['<img src="demo.jpg" height=12>', '<img src="demo2.jpg" height=500>']
>>> for image in images:
...     print(re.search(r'<img[^>]*src="([^"]*)"', image).group(1))
...
demo.jpg
demo2.jpg

If your input is a single string, you can use findall and then iterate over the result:

>>> images = '''<img src="demo.jpg" height=12> <img src="demo2.jpg" height=500>'''
>>> res = re.findall(r'<img[^>]*src="([^"]*)"', images)
>>> for img in res:
...     print(img)
...
demo.jpg
demo2.jpg
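The pattern above only matches lower-case `<img` tags with a double-quoted src. If you stay with regex, a slightly more tolerant variant might look like the sketch below. `IMG_SRC_RE` and `find_img_srcs` are hypothetical names, not from the original answer. This is still a heuristic: it will match tags inside HTML comments and can be fooled by attributes such as `data-src`.

```python
import re

# Hypothetical pattern: accepts single or double quotes and any letter case.
# Group 1 is the opening quote, group 2 is the src value.
IMG_SRC_RE = re.compile(
    r"""<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def find_img_srcs(html):
    """Return the src value of each <img> tag matched by the regex."""
    return [match.group(2) for match in IMG_SRC_RE.finditer(html)]


html = '''<img src="demo.jpg" height=12> <IMG height=500 SRC='demo2.jpg'>'''
print(find_img_srcs(html))
```

The backreference `\1` makes the closing quote match whichever quote opened the value, so a single-quoted src is handled the same way as a double-quoted one.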
