Parsing parts of a URL with regex in Python

User

Post #1 · edited 2024-05-15 23:03:15

Since urlparse alone didn't get you there, the best way to extract the information you need seems to be a regular expression:

# Python 3: the old urlparse module now lives in urllib.parse
from urllib.parse import urlparse, parse_qs
import re

urls = [
    "http://hostname.com/as/ck$st=fa+gw+hw+ek+ei/",
    "http://hostname.com/wqs/ck$st=fasd+/",
    "http://hostname.com/as/ck$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav"
]

for myurl in urls:
    parsed = urlparse(myurl)

    print('scheme  :', parsed.scheme)
    print('netloc  :', parsed.netloc)
    print('path    :', parsed.path)
    print('params  :', parsed.params)
    print('query   :', parsed.query)
    print('fragment:', parsed.fragment)
    print('username:', parsed.username)
    print('password:', parsed.password)
    print('hostname:', parsed.hostname, '(netloc in lower case)')
    print('port    :', parsed.port)

    # Empty for these URLs: none of them contains a '?', so there is no query
    print(parse_qs(parsed.query))

    # Runs of word characters and '+' that contain at least one '+',
    # followed by a non-[\w+] character or the end of the path
    print(re.findall(r'([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)', parsed.path))
    print('-' * 80)
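A minimal Python 3 sketch of the same idea without the diagnostic prints. plus_groups and PLUS_RUN are hypothetical names chosen for illustration, not part of any library. It also hints at one plausible reason urlparse alone did not help: none of these URLs contains a '?', so the query component is empty and everything after the host, including the $st=... part, ends up in the path.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the code above: a run of word characters and '+' that
# contains at least one '+', followed by a non-[\w+] character or the end.
PLUS_RUN = re.compile(r'([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)')


def plus_groups(url):
    """Return the '+'-joined runs found in the path of url (hypothetical helper)."""
    parts = urlsplit(url)
    # With no '?' in the URL, urlsplit leaves the query empty and keeps
    # everything after the host in the path, so the regex runs on the path.
    return PLUS_RUN.findall(parts.path)


urls = [
    "http://hostname.com/as/ck$st=fa+gw+hw+ek+ei/",
    "http://hostname.com/wqs/ck$st=fasd+/",
    "http://hostname.com/as/ck$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav",
]

for url in urls:
    # Prints the (empty) query next to the runs found in the path
    print(repr(urlsplit(url).query), plus_groups(url))
```

For these three URLs the query is '' every time, and the runs found are ['fa+gw+hw+ek+ei'], ['fasd+'] and ['fa+gq+hf+kg+is', 'gl+jh+ke+oj+kp'].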

User

Post #2 · edited 2024-05-15 23:03:15

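If you change [^\w\+ ]([\w\+ ]+\+[\w\+ ]+)(?:[^\w\+ ]|$) to [^\w\+ ]([\w\+ ]+\+[\w\+ ]*)(?:[^\w\+ ]|$), it will also match the second URL. The only change is the last quantifier (* instead of +), so nothing is required after the final '+'.

The match will include the trailing '+', which isn't in your desired output, but it does seem to fit the criteria you described. If you don't want a trailing '+', the pattern will need some adjustment.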
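A small check of the quantifier change described above, using the question's pattern as quoted in this answer and the second URL. The variable names are illustrative, and stripping the trailing '+' afterwards with str.rstrip is just one possible post-processing step, not part of the suggested regex.

```python
import re

# Pattern as quoted from the question: at least one character after the last '+'
strict = re.compile(r'[^\w\+ ]([\w\+ ]+\+[\w\+ ]+)(?:[^\w\+ ]|$)')
# Suggested change: '*' allows nothing after the last '+'
relaxed = re.compile(r'[^\w\+ ]([\w\+ ]+\+[\w\+ ]*)(?:[^\w\+ ]|$)')

url = "http://hostname.com/wqs/ck$st=fasd+/"

strict_matches = strict.findall(url)    # 'fasd+' has nothing after its '+', so no match
relaxed_matches = relaxed.findall(url)  # matches, trailing '+' included

# One way to drop the trailing '+' afterwards (post-processing, not regex)
cleaned = [m.rstrip('+') for m in relaxed_matches]

print(strict_matches, relaxed_matches, cleaned)
```

Both patterns still find fa+gw+hw+ek+ei in the first URL; only the relaxed one picks up fasd+ in the second.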

User

Post #3 · edited 2024-05-15 23:03:15

I came up with this using regexr (regexr link):

([\w\+]*\+[\w\+]*)(?:[^\w\+]|$)

Matches:

fa+gw+hw+ek+ei fasd+ fa+gq+hf+kg+is gl+jh+ke+oj+kp
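To reproduce the regexr matches listed above in Python, the answer's pattern can be run over all three URLs with re.findall. This is a sketch; joining the results with spaces only mirrors how the matches are listed above. Note that the first quantifier in this pattern is *, so a run is also allowed to start with '+'.

```python
import re

# Pattern from this answer; the trailing (?:...) group consumes the delimiter
# but is non-capturing, so findall returns only the '+'-joined run.
pattern = re.compile(r'([\w\+]*\+[\w\+]*)(?:[^\w\+]|$)')

urls = [
    "http://hostname.com/as/ck$st=fa+gw+hw+ek+ei/",
    "http://hostname.com/wqs/ck$st=fasd+/",
    "http://hostname.com/as/ck$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav",
]

# Collect every match from every URL, in order
matches = [m for url in urls for m in pattern.findall(url)]
print(' '.join(matches))
```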

Edit: try re.findall instead of re.search:

>>> import re
>>> s = "http://hostname.com/as/ck$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav"
>>> re.findall(r"([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)", s)
['fa+gq+hf+kg+is', 'gl+jh+ke+oj+kp']
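The reason for the edit above: re.search returns only the first match, while re.findall returns every match (and, with a single capturing group, just that group's text). re.finditer is shown as a third option for when match positions are also needed; it is a standard re function, but it was not mentioned in the answer.

```python
import re

pattern = re.compile(r'([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)')
s = "http://hostname.com/as/ck$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav"

# re.search stops at the first match; group(1) holds the captured run
first = pattern.search(s).group(1)

# findall returns all matches; with one capturing group it returns that group's text
all_groups = pattern.findall(s)

# finditer yields match objects, so the position of each run is available too
spans = [(m.group(1), m.span(1)) for m in pattern.finditer(s)]

print(first)
print(all_groups)
print(spans)
```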
