Python for loop to save keys and values that contain a specific value

Posted 2024-04-27 11:12:15


Suppose I have a Python list of dictionaries structured like this:

[ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
  {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
  {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
  {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

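I am struggling to find the most efficient way to:

(i) loop only over keys equal to 'href', and only over 'href' values that contain 'https://www.simplyrecipes.com/recipes/', and identify the values ('http...') that contain 'recipes/cuisine', 'recipes/season', or 'recipes/ingredient';
(ii) save each full URL value into a separate, appropriately named list, depending on which 'recipes/...' condition it satisfies.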

Expected result:

cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
type = ['https://www.simplyrecipes.com/recipes/type/condiment/']
ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']

Skip any keys and/or values that do not meet the conditions above.

Any pointers would be greatly appreciated.


3 Answers

So, roughly:

from itertools import groupby
import re

lst = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
  {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
  {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
  {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

def f(i):
    # Return the category segment that follows "recipes/", or "" if the URL does not match.
    m = re.match(r"https://www\.simplyrecipes\.com/recipes/([^/ ]+)/[^/ ]+", i["href"])
    return m.group(1) if m else ""

# groupby only merges consecutive items, so sort by the same key first.
r = filter(lambda i: i[0] in ('cuisine', 'season', 'ingredient'), groupby(sorted(lst, key=f), f))
for i in r:
    print(f"{i[0]} = {list(map(lambda j: j['href'], i[1]))}")

# result (in sorted key order):
# cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
# ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']
# season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
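
One caveat with itertools.groupby is that it only merges adjacent items whose keys are equal, so input that is not ordered by the grouping key can yield the same key more than once. A minimal sketch of that behavior, using made-up keys rather than the URLs from the question:

```python
from itertools import groupby

# Hypothetical category keys in arrival order; not taken from the question.
keys = ["cuisine", "season", "cuisine"]

# Without sorting, the two 'cuisine' entries land in separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(keys)]

# Sorting first puts equal keys next to each other, so each key appears once.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(keys))]

print(unsorted_groups)  # [('cuisine', 1), ('season', 1), ('cuisine', 1)]
print(sorted_groups)    # [('cuisine', 2), ('season', 1)]
```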

This assumes the URLs follow the same format as in the question. A better approach is to build a dictionary keyed by recipe category:

In [50]: from collections import defaultdict

In [51]: sep_data = defaultdict(list)

In [52]: lst = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

In [59]: for i in lst: sep_data[i["href"].split("/")[-3]].append(i["href"])

In [60]: sep_data
Out[60]:
defaultdict(list,
            {'cuisine': ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/'],
             'season': ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'],
             'type': ['https://www.simplyrecipes.com/recipes/type/condiment/'],
             'ingredient': ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']})
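
Indexing with [-3] relies on every URL ending in a slash and having exactly one segment after the category. If that cannot be guaranteed, one possible variant is to parse the path and take the segment right after "recipes", while also skipping entries that do not meet the question's conditions. The sketch below is only an illustration: the names PREFIX, WANTED, and group_by_category are made up here, the four categories are taken from the question's expected result, and the last two list entries are invented to show the skipping.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumptions for this sketch: prefix and categories as they appear in the question.
PREFIX = "https://www.simplyrecipes.com/recipes/"
WANTED = {"cuisine", "season", "type", "ingredient"}

def group_by_category(items, wanted=WANTED):
    """Group 'href' URLs by the path segment that follows 'recipes'."""
    grouped = defaultdict(list)
    for item in items:
        url = item.get("href")
        # Skip dicts without an 'href' key and URLs outside the recipes section.
        if not isinstance(url, str) or not url.startswith(PREFIX):
            continue
        # The path looks like /recipes/<category>/<name>/; drop empty segments.
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) >= 2 and parts[1] in wanted:
            grouped[parts[1]].append(url)
    return dict(grouped)

lst = [
    {"href": "https://www.simplyrecipes.com/recipes/cuisine/portuguese/"},
    {"href": "https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/"},
    {"href": "https://www.simplyrecipes.com/recipes/type/condiment/"},
    {"href": "https://www.simplyrecipes.com/recipes/ingredient/adobado/"},
    {"href": "https://www.example.com/about/"},  # invented entry: wrong prefix, skipped
    {"title": "no href here"},                   # invented entry: no 'href' key, skipped
]

result = group_by_category(lst)
cuisine = result.get("cuisine", [])
season = result.get("season", [])
print(cuisine)
print(season)
```

Keeping the lists in one dictionary and pulling out individual names only where needed avoids creating variables dynamically, and it also avoids shadowing the built-in type with a variable of the same name.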

Here is a simple example; I hope it helps:

import re

trash = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
          {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
          {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
          {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

for x in trash:
    for y in x.values():
        # Capture the path segment right after "recipes/" as the category name.
        m = re.search(r"recipes/([^/]+)", y)
        if m:
            print({m.group(1): y})

Output:

{'cuisine': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'}
{'season': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'}
{'type': 'https://www.simplyrecipes.com/recipes/type/condiment/'}
{'ingredient': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}
