Answers to the question: Flat dictionary from structured text using a dictionary comprehension

I'm trying to build a dictionary from a piece of structured text, but I can't work out the right syntax:

<pre><code>text = 'english (fluently), spanish (poorly)'
# desired output: {'english': 'fluently', 'spanish': 'poorly'}

# one of my many attempts:
dict((language, proficiency.strip('\(\)')) for language, proficiency in lp.split(' ') for lp in text.split(', '))

# but resulting error:
# NameError: name 'lp' is not defined
</code></pre>

I guess lp in lp.split(' ') is not defined, but I don't know how to change the syntax to get the desired result.

In reality the situation is more complex. I have a DataFrame, and my end goal is a function that tidies this data into one column per language plus one column per language for the corresponding proficiency, as shown below (though it can probably be done more efficiently):

<pre><code># pandas dataframe
pd.DataFrame({'language': ['english, spanish (poorly)', 'turkish']})

# desired output:
pd.DataFrame({'Language: English': [True, False],
              'Language proficiency: English': ['average', pd.NA],
              'Language: Spanish': [True, False],
              'Language proficiency: Spanish': ['poorly', pd.NA],
              'Language: Turkish': [False, True],
              'Language proficiency: Turkish': [pd.NA, 'average']})

# my attempt
def tidy(content):
    if pd.isna(content):
        pass
    else:
        dict((language, proficiency.strip('\(\)')) for language, proficiency in lp.split(' ') for lp in text.split(', '))

def tidy_language(language, content):
    if pd.isna(content):
        return pd.NA
    else:
        if language in content.keys():
            return True
        else:
            return False

def tidy_proficiency(language, content):
    if pd.isna(content):
        return pd.NA
    else:
        if language in content.keys():
            return content.language
        else:
            return pd.NA

languages = ['english', 'spanish', 'turkish']
df['language'] = df['language'].map(lambda x: tidy(x))
for language in languages:
    df['Language: {}'.format(language.capitalize())] = df['language'].map(lambda x: tidy_language(language, content))
    df['Language proficiency: {}'.format(language.capitalize())] = df['language'].map(lambda x: tidy_proficiency(language, content))
</code></pre>
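For the first snippet, the NameError comes from the order of the for clauses: a comprehension evaluates its for clauses left to right, like nested loops, so lp has to be bound by the outer clause before lp.split(' ') can use it. A second problem is that `for language, proficiency in lp.split(' ')` iterates over the individual strings of the split result instead of unpacking the pair. A minimal sketch of one way to fix both (not taken from the linked answers) wraps the split result in a one-element list so the inner clause can unpack it:

```python
text = 'english (fluently), spanish (poorly)'

# The outer clause comes first, so `lp` is bound before `lp.split(' ', 1)` runs.
# Wrapping the split result in a one-element list makes the inner clause
# unpack it into (language, proficiency) instead of looping over its items.
result = {
    language: proficiency.strip('()')
    for lp in text.split(', ')
    for language, proficiency in [lp.split(' ', 1)]
}

print(result)  # {'english': 'fluently', 'spanish': 'poorly'}
```

This assumes every item carries a parenthesized proficiency. An item such as 'turkish' splits into a single element and raises ValueError when unpacked, which is why the DataFrame version has to check for '(' before splitting.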

1 Answer

Anonymous, 1 day ago

Skilled in: python, mysql, java

While <a href="https://stackoverflow.com/users/1044117/fferri">fferri</a> provided some perfectly good solutions to my original question, my final solution in the DataFrame context is closer to <a href="https://stackoverflow.com/users/14385310/supernoob">SuperNoob</a>'s suggestion. My final solution:

<pre><code>import pandas as pd

# Assumes df is an existing DataFrame with a 'Speaks' column, e.g. 'English, Spanish (poorly)'.
# Languages to build columns for (assumed list); capitalized to match the parsed dictionary keys.
languages = ['English', 'Spanish', 'Turkish']

# Create a parser function to form a dictionary of language: proficiency pairs from the values in the 'Speaks' column.
def parse_dictionary(content):
    if pd.isna(content):
        return pd.NA
    d = {}
    for lp in content.split(', '):
        if '(' not in lp:
            l, p = lp, pd.NA
        else:
            l, p = lp.split('(', 1)
            p = p.strip('()')
        # Capitalize every key, with or without a proficiency, so lookups by language match.
        d[l.strip().capitalize()] = p
    return d

# Create a parser function to return the languages from the dictionary in the 'Speaks' column.
def parse_language(language, d):
    if pd.isna(d):
        return pd.NA
    return language in d

# Create a parser function to return the language proficiencies from the dictionary in the 'Speaks' column.
def parse_proficiency(language, d):
    if pd.isna(d):
        return pd.NA
    return d.get(language, pd.NA)

# Parse the values in the 'Speaks' column to create a dictionary of language: proficiency pairs.
df['Speaks'] = df['Speaks'].map(parse_dictionary)

# Parse the values in the 'Speaks' column to create separate 'Language' columns with True/False values.
for language in languages:
    df['Language: {}'.format(language)] = df['Speaks'].apply(lambda d: parse_language(language, d))

# Parse the values in the 'Speaks' column to create separate 'Language proficiency' columns with proficiency values.
for language in languages:
    df['Language proficiency: {}'.format(language)] = df['Speaks'].apply(lambda d: parse_proficiency(language, d))
</code></pre>
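The question notes that the row-by-row approach could probably be done more efficiently. One vectorized alternative, sketched here as an assumption rather than what either answerer proposed, replaces the per-row parsing with pandas `Series.str.extractall` and a regular expression, then spreads the languages into columns with `DataFrame.pivot`. It reuses the question's sample data with the column renamed to Speaks, and the regex assumes comma-separated items, each optionally followed by a parenthesized proficiency.

```python
import pandas as pd

# Sample data from the question, with the column renamed to 'Speaks' as in the answer.
df = pd.DataFrame({'Speaks': ['english, spanish (poorly)', 'turkish']})

# One match per comma-separated item; the parenthesized proficiency group is optional.
pattern = r'\s*(?P<language>[^,(]+?)\s*(?:\((?P<proficiency>[^)]*)\))?\s*(?:,|$)'
items = df['Speaks'].str.extractall(pattern)

# Drop the per-match index level so each item is indexed by its original row.
items = items.reset_index(level='match', drop=True)
items['language'] = items['language'].str.capitalize()

# True where a row mentions a language, False otherwise.
spoken = (
    items.assign(spoken=True)
    .pivot(columns='language', values='spoken')
    .notna()
    .reindex(df.index, fill_value=False)
)

# Proficiency text where given; missing proficiencies stay NaN.
proficiency = items.pivot(columns='language', values='proficiency').reindex(df.index)

result = pd.concat(
    [df, spoken.add_prefix('Language: '), proficiency.add_prefix('Language proficiency: ')],
    axis=1,
)
print(result)
```

Missing proficiencies come out as NaN, in line with the answer's pd.NA rather than the 'average' placeholder in the question's desired output. The pattern does not handle nested or unbalanced parentheses, and pivot will raise if the same language appears twice in one row.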