Python sorting problem: given lists of ['url', 'tag1', 'tag2', …] and a search specification ['tag3', 'tag1', …], return the relevant url list

# Given a list of saved urls each with a corresponding user-generated taglist
# (ordered by relevance), the user enters a "search" list-of-tags, and is
# returned a sorted list of urls.

# Generate sample "content" linked-list-dictionary. The rationale is to
# be able to add things like 'title' etc at later stages and to
# treat each url/note as an independent entity. But a single dictionary
# approach like "note['url1']=['b','a','c','d']" might work better?

content = []
note = {'url':'url1', 'taglist':['b','a','c','d']}
content.append(note)
note = {'url':'url2', 'taglist':['c','a','b','d']}
content.append(note)
note = {'url':'url3', 'taglist':['a','b','c','d']}
content.append(note)
note = {'url':'url4', 'taglist':['a','b','d','c']}
content.append(note)
note = {'url':'url5', 'taglist':['d','a','c','b']}
content.append(note)

# An example search term of tags, ordered by importance
# I'm using a dictionary with an ordinal number system
# This seems clumsy
search = {'d':1,'a':2,'b':3}

# Create a tagCloud with one entry for each tag that occurs
tagCloud = []
for note in content:
    for tag in note['taglist']:
        if tagCloud.count(tag) == 0:
            tagCloud.append(tag)

# Create a dictionary that associates an integer value denoting
# relevance (1 is most relevant etc) for each existing tag
d={}
for tag in tagCloud:
    try:
        d[tag]=search[tag]
    except KeyError:
        d[tag]=100

# Create a [[relevance, tag],[],[],...] result list & sort
result=[]
for note in content:
    resultNote=[]
    for tag in note['taglist']:
        resultNote.append([d[tag],tag])
    resultNote.append(note['url'])
    result.append(resultNote)
result.sort()

# Remove the relevance values & recreate a list containing
# the url string followed by corresponding tags.
# It's so hacky I've forgotten how it works!
# It's mostly for display, but suggestions on "best-practice"
# intermediate-form data storage?
finalResult=[]
for note in result:
    temp=[]
    temp.append(note.pop())
    for tag in note:
        temp.append(tag[1])
    finalResult.append(temp)

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)

2 Answers

User

Answer 1 · edited 2024-04-26 14:08:36

1) Is there a much more elegant/efficient way of doing this (embarrass me!)

Certainly. The basic idea: stop trying to tell Python what to do, and just ask it for what you want.

content = [
    {'url':'url1', 'taglist':['b','a','c','d']},
    {'url':'url2', 'taglist':['c','a','b','d']},
    {'url':'url3', 'taglist':['a','b','c','d']},
    {'url':'url4', 'taglist':['a','b','d','c']},
    {'url':'url5', 'taglist':['d','a','c','b']}
]

search = {'d' : 1, 'a' : 2, 'b' : 3}

# We can create the tag cloud like this:
# tagCloud = set(sum((note['taglist'] for note in content), []))
# But we don't actually need it: instead, we'll just use a default value
# when looking things up in the 'search' dict.

# Create a [[relevance, tag],[],[],...] result list & sort 
result = sorted(
    [
        [search.get(tag, 100), tag]
        for tag in note['taglist']
    ] + [[note['url']]]
    # The result will look like [ [relevance, tag],... , [url] ]
    # Note that the url is wrapped in a list too. This makes the
    # last processing step easier: we just take the last element of
    # each nested list.
    for note in content
)

# Remove the relevance values & recreate a list containing
# the corresponding tags followed by the url string (the url comes last here).
finalResult = [
    [x[-1] for x in note]
    for note in result
]

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)
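
As a further sketch along the same lines (assuming Python 3), the ranking can be handed to `sorted` as a key function, so no intermediate [relevance, tag] pairs need to be built and stripped again. The helper names `relevance_key` and `rank_urls` are illustrative and not part of the answer above; the default of 100 for tags missing from `search` is carried over from it.

```python
content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
    {'url': 'url3', 'taglist': ['a', 'b', 'c', 'd']},
    {'url': 'url4', 'taglist': ['a', 'b', 'd', 'c']},
    {'url': 'url5', 'taglist': ['d', 'a', 'c', 'b']},
]

search = {'d': 1, 'a': 2, 'b': 3}


def relevance_key(note, search, default=100):
    """Hypothetical helper: relevance of each tag, in taglist order."""
    return [search.get(tag, default) for tag in note['taglist']]


def rank_urls(content, search):
    """Hypothetical helper: urls sorted by the relevance of their tags."""
    ranked = sorted(content, key=lambda note: relevance_key(note, search))
    return [note['url'] for note in ranked]


ranked_urls = rank_urls(content, search)
print(ranked_urls)
```

Unlike the version above, this key compares relevance values only, so notes whose keys are equal keep their original order instead of being ordered by tag name.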

User

Answer 2 · edited 2024-04-26 14:08:36

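I suggest you also give each tag a weight that depends on how rare it is¹ (for example, a "tarantula" tag would weigh more than a "nature" tag). For a given URL, rare tags that it shares with another URL should indicate stronger relatedness, while its commonly used tags should count for less when judging how related the two URLs are.

It is easy to turn the rules described above into a numerical relatedness score computed against every other URL.

¹ Unless all your URLs are about "tarantulas", of course :)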
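
One possible reading of these rules, as a minimal Python 3 sketch rather than the answerer's actual method: weight each tag by the logarithm of (number of URLs / number of URLs carrying the tag), and score two URLs by summing the weights of the tags they share. The sample data and the names `tag_weights` and `relatedness` are assumptions made for illustration. A tag present on every URL gets weight 0, which matches the footnote.

```python
import math
from collections import Counter

# Hypothetical sample data: url -> tags (not taken from the question).
notes = {
    'url1': ['tarantula', 'nature'],
    'url2': ['tarantula', 'nature', 'pets'],
    'url3': ['nature', 'hiking'],
    'url4': ['nature', 'pets'],
}


def tag_weights(notes):
    """Rarer tags weigh more; a tag on every url weighs 0."""
    total = len(notes)
    url_count = Counter(tag for tags in notes.values() for tag in set(tags))
    return {tag: math.log(total / count) for tag, count in url_count.items()}


def relatedness(url_a, url_b, notes, weights):
    """Sum of the rarity weights of the tags both urls share."""
    shared = set(notes[url_a]) & set(notes[url_b])
    return sum(weights[tag] for tag in shared)


weights = tag_weights(notes)

# Rank the other urls by how related they are to 'url1'.
related_to_url1 = sorted(
    (url for url in notes if url != 'url1'),
    key=lambda url: relatedness('url1', url, notes, weights),
    reverse=True,
)
print(related_to_url1)
```

This sketch only rewards shared rare tags; any penalty for tags that the two URLs do not share is left out and would need its own term.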
