Extracting clean URLs with Python regular expressions

2 answers

User

Answer 1 · edited 2024-05-23 14:25:59

You can use the re.findall function to pull out the URL:

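import re

# Print the href value of every link whose tag mentions one of the keywords
with open("/Users/shannonmcgregor/Desktop/npr.txt", "r") as file:
    for line in file:
        if re.search(r'<a href=[^>]*(islamic|praying|marines|comets|dyslexics)', line):
            # Assumes the first double-quoted string on the line is the href value
            print(re.findall(r'(?<=")[^"]*(?=")', line)[0])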

This prints the href value of each matching link, one URL per line.
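To see what the lookaround pattern captures without the original npr.txt, here is a minimal sketch on made-up sample lines; the helper name first_quoted_href and the HTML strings are hypothetical. (?<=") and (?=") only assert that a quote sits on each side, so findall returns the text between quotes without the quotes themselves. The sketch also shows the catch with [0]: it is simply the first quoted string on the line, which is the URL only if no other quoted attribute appears earlier.

```python
import re

# Keywords from the question; only lines with an <a href=...> tag mentioning one are used
KEYWORDS = r'(islamic|praying|marines|comets|dyslexics)'

def first_quoted_href(line):
    """Hypothetical helper: first double-quoted string on a matching line, else None."""
    if not re.search(r'<a href=[^>]*' + KEYWORDS, line):
        return None
    # (?<=") and (?=") are zero-width: they require a quote on each side
    # of the match but do not include the quotes in the result
    quoted = re.findall(r'(?<=")[^"]*(?=")', line)
    return quoted[0] if quoted else None

# Hypothetical sample lines
sample = '<li><a href="http://www.npr.org/news/comets-return">Comets</a></li>'
tricky = '<li class="story"><a href="http://www.npr.org/marines">Marines</a></li>'

print(first_quoted_href(sample))  # http://www.npr.org/news/comets-return
print(first_quoted_href(tricky))  # story  (an earlier quoted attribute wins)
```

If your lines can carry other quoted attributes before the link, anchoring the capture on <a href=" avoids picking up the wrong value.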

User

Answer 2 · edited 2024-05-23 14:25:59

Regex is not the right tool for parsing HTML files, but since that is what you asked for, here is a regex solution:

>>> import re
>>> with open("/Users/shannonmcgregor/Desktop/npr.txt", "r") as file:
...     for line in file:
...         if re.search(r'<a href="[^>"]*(islamic|praying|marines|comets|dyslexics)', line):
...             # Replace the whole line with just the captured href; strip() drops the newline
...             print(re.sub(r'^.*?<a href="([^"]*)".*', r'\1', line).strip())
...
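The re.sub call works by matching the entire line and replacing it with only the first capture group (\1). Here is a self-contained sketch of the same idea on sample lines, so it can be tried without the file; the helper name extract_href and the sample HTML are hypothetical.

```python
import re

# Keywords from the question
KEYWORDS = r'(islamic|praying|marines|comets|dyslexics)'

def extract_href(line):
    """Hypothetical helper: if a link on the line mentions a keyword, return the first link's href."""
    if not re.search(r'<a href="[^>"]*' + KEYWORDS, line):
        return None
    # ^.*? lazily skips everything before the first <a href=", ([^"]*) captures the URL,
    # and ".* consumes the rest of the line (but not the newline); the match becomes \1
    return re.sub(r'^.*?<a href="([^"]*)".*', r'\1', line).strip()

# Hypothetical sample lines, as they would come from iterating over a file
lines = [
    '<li class="story"><a href="http://www.npr.org/marines">Marines</a></li>\n',
    '<a href="http://www.npr.org/sports">Sports</a>\n',
]
for line in lines:
    print(extract_href(line))
# http://www.npr.org/marines
# None
```

Because the capture is anchored on <a href=", an earlier quoted attribute such as class="story" is skipped, unlike taking findall(...)[0] in the first answer.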

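Since this answer opens by saying regex is the wrong tool for HTML, here is a minimal sketch of a parser-based alternative using the standard library's html.parser.HTMLParser. The class name KeywordLinkParser and the sample markup are hypothetical, and it assumes the file holds ordinary HTML text.

```python
import re
from html.parser import HTMLParser

# Keywords from the question, matched anywhere in the href value
KEYWORDS = re.compile(r'islamic|praying|marines|comets|dyslexics')

class KeywordLinkParser(HTMLParser):
    """Hypothetical parser: collects href values of <a> tags that contain a keyword."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs,
        # so attribute order and quote style no longer matter
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and KEYWORDS.search(href):
            self.urls.append(href)

# Hypothetical markup; for the question's file you would feed f.read() instead
parser = KeywordLinkParser()
parser.feed('<li class="story"><a title=\'x\' href="http://www.npr.org/comets">Comets</a></li>\n'
            '<a href="http://www.npr.org/sports">Sports</a>')
parser.close()
print(parser.urls)  # ['http://www.npr.org/comets']
```

Feeding the whole document at once also copes with tags that span several lines, which a line-by-line regex loop cannot see.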
