Iterating over URL parameters with the urlparse library

Asked 2025-04-18 07:58

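I need some help with something I'm trying to do:

I want to change the value of each key in a dictionary (in this case, a URL parameter), but only one parameter at a time. The new value comes from self.fooz, and this has to happen on every iteration of the loop.

For example:

Given the URL somesite.com?id=6&name=Bill, it should first become somesite.com?id=<self.fooz>&name=Bill (once for each individual fooz), and then somesite.com?id=6&name=<self.fooz> (again once for each individual fooz).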

Finally, build the full_param_vector value as described below.

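Can anyone help me?

Here is what I have done so far:

  1. Imported a set of original paths through self.path_object.
  2. Parsed the part of each path after the ? to get all the original parameter key/value pairs (via parse_after).

I wrote some pseudocode that describes what I want to achieve: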

if self.path_object is not None:
    dictpath = {}
    for path in self.path_object:
        # path.pathToScan returns a full URL, e.g. somesite.com?id=6&name=Bill
        # parse_after is a dict containing only the parameters, like: {u'id': [u'2'], u'name': [u'Dog']}
        parse_after = urlparse.parse_qs(path.pathToScan[path.pathToScan.find('?') + 1:], keep_blank_values=0, strict_parsing=0)
        # for each param in parse_after:
            # replace that one key's value with a value from self.foozs
            # repeat for this single key with every value in self.foozs, then move on to the next param and do the same
            # create the full_param_vector variable from these new values
            # construct full_path as: <part of path.pathToScan before '?'> + "?" + full_param_vector
            # add every full_path to a dictionary named dictpath
        # print dictpath

Any help is very welcome. Thanks!

1 Answer

Something like this might do the trick, although I still don't fully understand what you are asking.

# Python 2 modules; in Python 3 the same functions live in urllib.parse
import urllib
import urlparse

# parse the URL into its components
parsed = urlparse.urlparse('http://somesite.com/blog/posting/?id=6&name=Bill')

# parse the query string into a dictionary of lists: {'id': ['6'], 'name': ['Bill']}
qs = urlparse.parse_qs(parsed.query, keep_blank_values=0, strict_parsing=0)

# build a new dictionary with the same keys, but every value changed to "foobar"
foozsified = {i: 'foobar' for i in qs}

# encode it back into a query string: id=foobar&name=foobar
quoted = urllib.urlencode(foozsified, doseq=True)

# the parsed result is a named tuple and cannot be modified in place,
# so convert it to a list
parsed = list(parsed)

# replace the element at index 4 (the query string) with the new one
parsed[4] = quoted

# and unparse it back into a full URL
print(urlparse.urlunparse(parsed))

This code prints

http://somesite.com/blog/posting/?id=foobar&name=foobar

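So you can make whatever changes you need to the qs dictionary at this point, re-encode it with urlencode, and then use urlunparse to turn it back into a full URL.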
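If the goal is to replace only one parameter at a time, as the question describes, the same building blocks can be arranged roughly as follows. This is only a sketch under a few assumptions: it is written for Python 3 (where these functions live in urllib.parse), the helper name substitute_one_param is made up for illustration, and the foozs list is a hypothetical stand-in for self.foozs. It uses parse_qsl instead of parse_qs so that the original parameter order is preserved. Note that urlencode percent-encodes the inserted values.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def substitute_one_param(url, replacements):
    """Yield copies of url in which exactly one query value has been replaced."""
    parts = urlsplit(url)
    # parse_qsl returns (key, value) pairs in their original order
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for index, (key, _old_value) in enumerate(pairs):
        for value in replacements:
            changed = list(pairs)           # copy, so other params keep their values
            changed[index] = (key, value)   # swap in one replacement value
            # SplitResult is a named tuple, so _replace returns a modified copy
            yield urlunsplit(parts._replace(query=urlencode(changed)))


foozs = ["FOO", "BAR"]  # hypothetical stand-in for self.foozs
for full_path in substitute_one_param("http://somesite.com/?id=6&name=Bill", foozs):
    print(full_path)
```

Each yielded URL could then be stored in a dictionary such as dictpath, keyed however suits the rest of the scanner.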
