Regular expression to match any image URL except those that are commented out

Posted 2024-04-16 07:52:21

This is my regular expression for Python:

^(?<!(<!--.))(http(s?):)?([\/|\.|\w|\s|-])*\.(?:jpg|gif|png)$

The current expression matches this:

/images/lol/hallo.png

But I need it to match this image URL:

/images/lol/hallo.png

as well as the image URL alone, without its surrounding markup, in:

<img src="/images/lol/hallo.png" />

but not those that are commented out:

<!-- /images/lol/hallo.png -->
<!-- <img src="/images/lol/hallo.png" /> -->
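
To reproduce the behaviour described above, here is a minimal sketch that runs the pattern from the question against the three kinds of input. The variable names (`question_re`, `bare`, `in_tag`, `in_comment`) and the use of `re.search` are assumptions made for illustration. Because the pattern is anchored with `^` and `$`, it can only match when the whole string is a URL, and the lookbehind is evaluated at the start of the string, where nothing precedes it, so it never rejects anything.

```python
import re

# The pattern from the question, unchanged.
question_re = re.compile(
    r'^(?<!(<!--.))(http(s?):)?([\/|\.|\w|\s|-])*\.(?:jpg|gif|png)$'
)

# Illustrative inputs taken from the question.
bare = '/images/lol/hallo.png'
in_tag = '<img src="/images/lol/hallo.png" />'
in_comment = '<!-- /images/lol/hallo.png -->'

for s in (bare, in_tag, in_comment):
    # Only the bare URL matches; the anchors rule out the other two.
    print(repr(s), bool(question_re.search(s)))
```

So the commented-out URL is rejected only as a side effect of the anchors, which also reject the URL inside the `<img>` tag.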

1 answer
User
#1 · Posted 2024-04-16 07:52:21

This should work:

<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))

The first alternative consumes each HTML comment as a whole, so a URL inside a comment is never captured. Only matches from the second alternative fill the url group.

Test string:

<img src="/images/lol/hallo.png" />
    /images/lol/hallo.png
    /images/lol/hallo.png
    //example.com/images/lol/hallo.png
    http://example.com/images/lol/hallo.png
    https://example.com/images/lol/hallo.png
    <!-- /images/lol/commented.png -->
    <!-- <img src="/images/lol/commented2.png" /> -->
    images/ui/paper-icon-1.png


/images/lol/hallo.png and more here /images/lol/hallo.png

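Python code:

import re

# Sample input: bare URLs, a URL inside markup, and two commented-out URLs.
x = '''
    <img src="/images/lol/hallo.png" />
    /images/lol/hallo.png
    /images/lol/hallo.png
    //example.com/images/lol/hallo.png
    http://example.com/images/lol/hallo.png
    https://example.com/images/lol/hallo.png
    <!-- /images/lol/commented.png -->
    <!-- <img src="/images/lol/commented2.png" /> -->
    images/ui/paper-icon-1.png


/images/lol/hallo.png and more here /images/lol/hallo.png
'''
# The first alternative consumes HTML comments; only the second fills the "url" group.
regexp = r'<!--[\s\S]*?-->|(?P<url>(http(s?):)?\/?\/?[^,;" \n\t>]+?\.(jpg|gif|png))'
# findall returns one tuple of groups per match; item[0] is the "url" group,
# which is empty when the match was a comment.
result = [item[0] for item in re.findall(regexp, x) if item[0]]
for item in result:
    print(item)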
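
The same idea can also be wrapped in a small helper that reads the named group directly instead of relying on tuple positions. This is only a sketch: the names `IMAGE_URL_RE` and `find_image_urls` are made up for illustration, and the inner groups are written as non-capturing, which is a small variation on the pattern above rather than the exact expression.

```python
import re

# Hypothetical names, for illustration only.
IMAGE_URL_RE = re.compile(
    r'<!--[\s\S]*?-->'  # branch 1: a whole HTML comment (matched, then ignored)
    r'|(?P<url>(?:https?:)?/?/?[^,;" \n\t>]+?\.(?:jpg|gif|png))'  # branch 2: image URL
)


def find_image_urls(text):
    """Return image URLs found in text, skipping any inside HTML comments."""
    # The "url" group is None whenever the comment branch matched.
    return [m.group('url') for m in IMAGE_URL_RE.finditer(text) if m.group('url')]


sample = '<img src="/a/b.png" /> <!-- /a/hidden.png --> https://example.com/c.jpg'
print(find_image_urls(sample))
```

Using `finditer` with `m.group('url')` keeps the code independent of how many other groups the pattern contains.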

Demo: https://regex101.com/r/YmXo2Q/4
