Applying a function to a specific subkey of every key in a dictionary

# imports
import nltk  # the Brown corpus must be available, e.g. via nltk.download('brown')

# function to remove non-English words
words = set(nltk.corpus.brown.words())

def strip_non_en(string, words):
    # Return the joined result; otherwise the input comes back unchanged.
    return " ".join(w for w in nltk.wordpunct_tokenize(string)
                    if w.lower() in words or not w.isalpha())

# dict example:
meta_data = {
    '12345.xml': {'author': ['Presley'], 'date': 1956, 'doi': None,
                  'title': 'Heartbreak Hotel'},
    '67890.xml': {'author': ['Iglesias'], 'date': 1972, 'doi': None,
                  'title': 'For a little bit of your love Por Un Poco De Tu Amor'}
}

4 Answers

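User

Answer #1 · edited 2024-05-16 05:17:47

Here is one way to structure the logic. It is similar to Ajax1234's answer, but I added an extra optional parameter, key, to strip_non_en, so the function itself decides whether a value should be filtered.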

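import nltk

word_set = set(nltk.corpus.brown.words())

def strip_non_en(string, words=word_set, key=None):
    # Filter only when no key is given or the key is 'title'.
    if key in (None, 'title'):
        string = ' '.join(w for w in nltk.wordpunct_tokenize(string)
                          if w.lower() in words or not w.isalpha())
    return string

# meta_data is a dict of dicts, so iterate over the inner items as well.
new_dict = {fname: {k: strip_non_en(v, key=k) for k, v in fields.items()}
            for fname, fields in meta_data.items()}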
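
For anyone who wants to try the key-parameter idea without downloading the Brown corpus, here is a minimal sketch. SAMPLE_WORDS and tokenize are hypothetical stand-ins (a tiny hand-picked vocabulary and a regex that roughly mimics nltk.wordpunct_tokenize), so results with the real corpus may differ.

```python
import re

# Hypothetical stand-in for set(nltk.corpus.brown.words()); not the real corpus.
SAMPLE_WORDS = {"heartbreak", "hotel", "for", "a", "little", "bit",
                "of", "your", "love"}

def tokenize(text):
    # Assumption: approximates nltk.wordpunct_tokenize by splitting into
    # runs of word characters and runs of punctuation.
    return re.findall(r"\w+|[^\w\s]+", text)

def strip_non_en(string, words=SAMPLE_WORDS, key=None):
    # Filter only when no key is given or the key is 'title';
    # every other value is returned untouched.
    if key in (None, "title"):
        string = " ".join(w for w in tokenize(string)
                          if w.lower() in words or not w.isalpha())
    return string

meta_data = {
    "12345.xml": {"author": ["Presley"], "date": 1956, "doi": None,
                  "title": "Heartbreak Hotel"},
    "67890.xml": {"author": ["Iglesias"], "date": 1972, "doi": None,
                  "title": "For a little bit of your love Por Un Poco De Tu Amor"},
}

# The outer loop walks the file names, the inner loop walks each record's fields.
new_dict = {
    fname: {k: strip_non_en(v, key=k) for k, v in fields.items()}
    for fname, fields in meta_data.items()
}

print(new_dict["67890.xml"]["title"])  # For a little bit of your love
```

The comprehension builds new inner dicts, so meta_data itself is left unchanged.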

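User

Answer #2 · edited 2024-05-16 05:17:47

You can check whether the current key is 'title' and, if it is, call the function on the current value; otherwise keep the value as it is: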

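# meta_data is a dict of dicts, so the 'title' check belongs in the inner comprehension.
new_dict = {fname: {a: strip_non_en(b, words) if a == 'title' else b for a, b in fields.items()}
            for fname, fields in meta_data.items()}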
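
The same "check the key, then call the function" pattern can be wrapped in a small helper. This is only a sketch: map_subkey is a hypothetical name, and str.upper stands in for strip_non_en so the example runs without NLTK.

```python
def map_subkey(records, subkey, func):
    """Return a new dict of dicts with func applied to each record's subkey."""
    return {
        outer: {k: func(v) if k == subkey else v for k, v in inner.items()}
        for outer, inner in records.items()
    }

sample = {
    "12345.xml": {"author": ["Presley"], "date": 1956, "title": "Heartbreak Hotel"},
    "67890.xml": {"author": ["Iglesias"], "date": 1972, "title": "Por Un Poco De Tu Amor"},
}

# str.upper is a placeholder for any one-argument function.
upper_titles = map_subkey(sample, "title", str.upper)
print(upper_titles["12345.xml"]["title"])  # HEARTBREAK HOTEL
```

Passing the function as an argument keeps the comprehension reusable for other subkeys, such as 'author'.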

Alternatively, you can change strip_non_en slightly so that the words parameter is optional. That way you do not have to pass words on every call:

def strip_non_en(string, words=words):
    # Return the joined result; discarding it would hand back the original string.
    return " ".join(w for w in nltk.wordpunct_tokenize(string)
                    if w.lower() in words or not w.isalpha())

new_dict = {fname: {a: strip_non_en(b) if a == 'title' else b for a, b in fields.items()}
            for fname, fields in meta_data.items()}
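
One thing to keep in mind with a default such as words=words: Python evaluates default arguments once, when the function is defined. The sketch below uses hypothetical names (vocab, in_vocab) and a toy vocabulary to show what that means in practice.

```python
vocab = {"hotel"}

def in_vocab(word, words=vocab):
    # words defaults to the set object that vocab referred to at definition time.
    return word.lower() in words

vocab.add("love")          # mutating that same set is visible through the default
first = in_vocab("love")

vocab = {"amor"}           # rebinding the name does not change the default
second = in_vocab("amor")
third = in_vocab("amor", words=vocab)  # passing the new set explicitly works

print(first, second, third)  # True False True
```

So if the word set is rebuilt later, for example from a different corpus, pass it explicitly or redefine the function.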
