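<p>Although <a href="https://stackoverflow.com/users/1044117/fferri">fferri</a> provided some perfect solutions to my original question, my final solution in the DataFrame context is closer to the suggestion from <a href="https://stackoverflow.com/users/14385310/supernoob">SuperNoob</a>.</p>
<p>My final solution:</p>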
<pre><code>import pandas as pd

# `df` (a DataFrame with a 'Speaks' column) and `languages` (the list of
# language names to extract) are assumed to be defined earlier.

# Create a parser function to form a dictionary of language: proficiency pairs from the values in the 'Speaks' column.
def parse_dictionary(content):
    if pd.isna(content):
        return None  # keep missing values missing
    d = {}
    lps = content.split(', ')
    for lp in lps:
        if '(' not in lp:
            # No proficiency given: normalise the name the same way as below.
            l = lp.strip().capitalize()
            p = pd.NA
        else:
            l, p = lp.split('(')
            l = l.strip().capitalize()
            p = p.strip('()')
        d[l] = p
    return d

# Create a parser function to return whether a language is in the dictionary in the 'Speaks' column.
def parse_language(language, d):
    if pd.isna(d):
        return None
    return language in d

# Create a parser function to return the language proficiencies from the dictionary in the 'Speaks' column.
def parse_proficiency(language, d):
    if pd.isna(d):
        return None
    return d.get(language, pd.NA)

# Parse the values in the 'Speaks' column to create a dictionary of language: proficiency pairs.
df['Speaks'] = df['Speaks'].map(parse_dictionary)

# Parse the values in the 'Speaks' column to create separate 'Language' columns with True-False values.
for language in languages:
    df['Language: {}'.format(language)] = df['Speaks'].apply(lambda d: parse_language(language, d))

# Parse the values in the 'Speaks' column to create separate 'Language proficiency' columns with proficiency values.
for language in languages:
    df['Language proficiency: {}'.format(language)] = df['Speaks'].apply(lambda d: parse_proficiency(language, d))
</code></pre>
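<p>To make the effect of the parsing step concrete, here is a minimal, self-contained sketch on made-up data. The sample rows and the <code>languages</code> list are hypothetical, since the real contents of the 'Speaks' column are not shown here, and the parser is a condensed variant that uses <code>str.partition</code> instead of <code>split</code>, so treat it as an illustration rather than the exact code above.</p>

```python
import pandas as pd


def parse_dictionary(content):
    """Map 'english (fluently), spanish' to a {language: proficiency} dict."""
    if pd.isna(content):
        return None
    d = {}
    for lp in content.split(', '):
        # partition() returns an empty separator when no '(' is present.
        name, sep, level = lp.partition('(')
        d[name.strip().capitalize()] = level.strip('() ') if sep else pd.NA
    return d


# Hypothetical sample data; the real 'Speaks' column may look different.
df = pd.DataFrame({'Speaks': ['english (fluently), spanish (poorly)', 'english', None]})
languages = ['English', 'Spanish']

parsed = df['Speaks'].map(parse_dictionary)
for language in languages:
    # Rows without a parsed dictionary stay missing in both new columns.
    df['Language: {}'.format(language)] = parsed.map(
        lambda d: language in d if isinstance(d, dict) else None)
    df['Language proficiency: {}'.format(language)] = parsed.map(
        lambda d: d.get(language, pd.NA) if isinstance(d, dict) else None)

print(df.drop(columns='Speaks'))
```

<p>Each language ends up with one True/False column and one proficiency column. A language listed without a level in parentheses gets a missing proficiency, and rows where 'Speaks' is missing stay missing in every new column.</p>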