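Python shortest distance algorithm
I want to write a simple breadth-first search algorithm that finds the shortest path.
Here is an actor info dictionary that maps each actor to the list of movies they appeared in: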
actor_info = { "act1" : ["movieC", "movieA"], "act2" : ["movieA", "movieB"],
"act3" :["movieA", "movieB"], "act4" : ["movieC", "movieD"],
"act5" : ["movieD", "movieB"], "act6" : ["movieE"],
"act7" : ["movieG", "movieE"], "act8" : ["movieD", "movieF"],
"KevinBacon" : ["movieF"], "act10" : ["movieG"], "act11" : ["movieG"] }
And the inverse of this dictionary maps each movie to the list of actors who appeared in it:
movie_info = {'movieB': ['act2', 'act3', 'act5'], 'movieC': ['act1', 'act4'],
'movieA': ['act1', 'act2', 'act3'], 'movieF': ['KevinBacon', 'act8'],
'movieG': ['act7', 'act10', 'act11'], 'movieD': ['act8', 'act4', 'act5'],
'movieE': ['act6', 'act7']}
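The inverse mapping does not have to be typed by hand; it can be derived from actor_info. The following is only a minimal sketch, assuming Python 3. The helper name invert_mapping and the small sample dictionary are hypothetical and not part of the original post.

```python
from collections import defaultdict

def invert_mapping(actor_info):
    """Build {movie: [actors]} from {actor: [movies]} (hypothetical helper)."""
    movie_info = defaultdict(list)
    for actor, movies in actor_info.items():
        for movie in movies:
            movie_info[movie].append(actor)
    # Return a plain dict so that a missing movie raises KeyError as usual.
    return dict(movie_info)

# A small subset of the data above, used for illustration only.
sample = {
    "act1": ["movieC", "movieA"],
    "act4": ["movieC", "movieD"],
    "act8": ["movieD", "movieF"],
    "KevinBacon": ["movieF"],
}
inverted = invert_mapping(sample)
print(sorted(inverted["movieC"]))
```

The order of actors inside each list follows the iteration order of actor_info, so it may differ from the hand-written movie_info shown above.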
So when I call shortest_distance("act1", "KevinBacon", actor_info, movie_info), I should get 3, because act1 appears in movieC with act4, act4 appears in movieD with act8, and act8 appears in movieF with KevinBacon. So the shortest distance is 3.
So far I have this:
def shortest_distance(actA, actB, actor_info, movie_info):
    '''Return the number of movies required to connect actA and actB.
    If there's no connection return -1.'''
    # So we keep 2 lists of actors:
    #   1. The actors that we have already investigated.
    #   2. The actors that need to be investigated because we have found a
    #      connection beginning at actA. This list must be
    #      ordered, since we want to investigate actors in the order we
    #      discover them.
    #      -- Each time we put an actor in this list, we also store
    #         her distance from actA.
    investigated = []
    to_investigate = [actA]
    distance = 0
    while actB not in to_investigate and to_investigate != []:
        for actor in to_investigate:
            to_investigate.remove(actor)
            investigated.append(actor)
            for movie in actor_info[actor]:
                for co_star in movie_info[movie]:
                    if co_star not in investigated and co_star not in to_investigate:
                        to_investigate.append(co_star)
    ....
    ....
    return distance
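I can't figure out how to properly keep track of the distances discovered on each iteration. The code also seems very inefficient in terms of running time.
2 Answers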
1
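This seems to work. It keeps track of a current set of movies. At each step, it looks at all the movies one step away that have not already been considered (that is, movies not yet in "seen").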
actor_info = { "act1" : ["movieC", "movieA"], "act2" : ["movieA", "movieB"],
"act3" :["movieA", "movieB"], "act4" : ["movieC", "movieD"],
"act5" : ["movieD", "movieB"], "act6" : ["movieE"],
"act7" : ["movieG", "movieE"], "act8" : ["movieD", "movieF"],
"KevinBacon" : ["movieF"], "act10" : ["movieG"], "act11" : ["movieG"] }
movie_info = {'movieB': ['act2', 'act3', 'act5'], 'movieC': ['act1', 'act4'],
'movieA': ['act1', 'act2', 'act3'], 'movieF': ['KevinBacon', 'act8'],
'movieG': ['act7', 'act10', 'act11'], 'movieD': ['act8', 'act4', 'act5'],
'movieE': ['act6', 'act7']}
def shortest_distance(actA, actB, actor_info, movie_info):
    if actA not in actor_info:
        return -1  # "infinity"
    if actB not in actor_info:
        return -1  # "infinity"
    if actA == actB:
        return 0

    dist = 1
    movies = set(actor_info[actA])
    end_movies = set(actor_info[actB])
    if movies & end_movies:
        return dist

    seen = movies.copy()
    print "All movies with", actA, seen
    while 1:
        dist += 1
        next_step = set()
        for movie in movies:
            for actor in movie_info[movie]:
                next_step.update(actor_info[actor])
        print "Movies with actors from those movies", next_step
        movies = next_step - seen
        print "New movies with actors from those movies", movies
        if not movies:
            return -1  # "Infinity"

        # Has actorB been in any of those movies?
        if movies & end_movies:
            return dist

        # Update the set of seen movies, so I don't visit them again
        seen.update(movies)

if __name__ == "__main__":
    print shortest_distance("act1", "KevinBacon", actor_info, movie_info)
The output is
All movies with act1 set(['movieC', 'movieA'])
Movies with actors from those movies set(['movieB', 'movieC', 'movieA', 'movieD'])
New movies with actors from those movies set(['movieB', 'movieD'])
Movies with actors from those movies set(['movieB', 'movieC', 'movieA', 'movieF', 'movieD'])
New movies with actors from those movies set(['movieF'])
3
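Here is a version that returns a list of movies making up the minimal connection (None if there is no connection, and an empty list if actA and actB are the same actor).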
def connect(links, movie):
    # Follow the links from this movie back to one of actB's movies
    chain = []
    while movie is not None:
        chain.append(movie)
        movie = links[movie]
    return chain

def shortest_distance(actA, actB, actor_info, movie_info):
    if actA not in actor_info:
        return None  # "infinity"
    if actB not in actor_info:
        return None  # "infinity"
    if actA == actB:
        return []

    # {x: y} means that x is one link outwards from y
    links = {}

    # Start from the destination and work backward
    for movie in actor_info[actB]:
        links[movie] = None

    dist = 1
    # Take a copy of the keys, since links is modified inside the loop
    movies = list(links)

    while 1:
        new_movies = []
        for movie in movies:
            for actor in movie_info[movie]:
                if actor == actA:
                    return connect(links, movie)
                for other_movie in actor_info[actor]:
                    if other_movie not in links:
                        links[other_movie] = movie
                        new_movies.append(other_movie)
        if not new_movies:
            return None  # Infinity
        movies = new_movies

if __name__ == "__main__":
    dist = shortest_distance("act1", "KevinBacon", actor_info, movie_info)
    if dist is None:
        print "Not connected"
    else:
        print "The Kevin Bacon Number for act1 is", len(dist)
        print "Movies are:", ", ".join(dist)
This is the output:
The Kevin Bacon Number for act1 is 3
Movies are: movieC, movieD, movieF
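The question's original idea (an ordered list of actors to investigate, each stored together with its distance from actA) can also be expressed directly. What follows is a minimal sketch under stated assumptions, not the approach used above: it assumes Python 3, uses collections.deque as the ordered list, and the name bfs_distance is hypothetical. A dictionary of distances doubles as the "already investigated" record, so membership checks take constant time on average.

```python
from collections import deque

actor_info = {
    "act1": ["movieC", "movieA"], "act2": ["movieA", "movieB"],
    "act3": ["movieA", "movieB"], "act4": ["movieC", "movieD"],
    "act5": ["movieD", "movieB"], "act6": ["movieE"],
    "act7": ["movieG", "movieE"], "act8": ["movieD", "movieF"],
    "KevinBacon": ["movieF"], "act10": ["movieG"], "act11": ["movieG"],
}
movie_info = {
    "movieB": ["act2", "act3", "act5"], "movieC": ["act1", "act4"],
    "movieA": ["act1", "act2", "act3"], "movieF": ["KevinBacon", "act8"],
    "movieG": ["act7", "act10", "act11"], "movieD": ["act8", "act4", "act5"],
    "movieE": ["act6", "act7"],
}

def bfs_distance(act_a, act_b, actor_info, movie_info):
    """Return the number of movies linking act_a to act_b, or -1 if none."""
    if act_a not in actor_info or act_b not in actor_info:
        return -1
    # distance[x] is the number of movies between act_a and x; an actor
    # appears here as soon as it is discovered, so it is never queued twice.
    distance = {act_a: 0}
    queue = deque([act_a])
    while queue:
        actor = queue.popleft()
        if actor == act_b:
            return distance[actor]
        for movie in actor_info[actor]:
            for co_star in movie_info[movie]:
                if co_star not in distance:
                    distance[co_star] = distance[actor] + 1
                    queue.append(co_star)
    return -1

print(bfs_distance("act1", "KevinBacon", actor_info, movie_info))
```

Because actors leave the queue in the order they were discovered, the first time act_b is dequeued its recorded distance is the shortest one.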
2
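First, combine the actor and movie nodes into a single graph, then run the shortest-path code (there may well be more efficient graph libraries for this, but the approach below is elegant too). Then pick out all the movie names from the shortest path.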
# Merge the movie nodes into actor_info so it holds the whole graph
for i in movie_info:
    actor_info[i] = movie_info[i]

def find_shortest_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    shortest = None
    for node in graph[start]:
        if node not in path:
            newpath = find_shortest_path(graph, node, end, path)
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest = newpath
    return shortest

L = find_shortest_path(actor_info, 'act1', 'act2')
# Count only the movie nodes on the path
print len([i for i in L if i in movie_info])
find_shortest_path source: http://www.python.org/doc/essays/graphs/
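The recursive search above tries every simple path from start to end, which can get slow as the graph grows. A breadth-first variant on the same combined graph visits each node at most once. This is only a sketch, assuming Python 3; the name bfs_shortest_path and the parent dictionary are hypothetical and do not come from the linked essay.

```python
from collections import deque

actor_info = {
    "act1": ["movieC", "movieA"], "act2": ["movieA", "movieB"],
    "act3": ["movieA", "movieB"], "act4": ["movieC", "movieD"],
    "act5": ["movieD", "movieB"], "act6": ["movieE"],
    "act7": ["movieG", "movieE"], "act8": ["movieD", "movieF"],
    "KevinBacon": ["movieF"], "act10": ["movieG"], "act11": ["movieG"],
}
movie_info = {
    "movieB": ["act2", "act3", "act5"], "movieC": ["act1", "act4"],
    "movieA": ["act1", "act2", "act3"], "movieF": ["KevinBacon", "act8"],
    "movieG": ["act7", "act10", "act11"], "movieD": ["act8", "act4", "act5"],
    "movieE": ["act6", "act7"],
}

# One graph holding both kinds of node, without modifying actor_info.
graph = {**actor_info, **movie_info}

def bfs_shortest_path(graph, start, end):
    """Return a shortest list of nodes from start to end, or None."""
    if start not in graph:
        return None
    parent = {start: None}  # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the parent links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

path = bfs_shortest_path(graph, "act1", "KevinBacon")
movies = [node for node in path if node in movie_info]
print(len(movies), movies)
```

Counting only the movie nodes on the returned path gives the same kind of distance as the snippet above.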