I have a DataFrame like this (the code below creates a sample DataFrame):
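import numpy as np
import pandas as pd

# Each list column holds a list of languages per row, or NaN when missing
df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,  # np.NaN was removed in NumPy 2.0
                                        ['ruby', 'shell'],
                                        np.nan]})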
df
  language              top_lang_owned top_lang_watched
0     ruby      [ruby, javascript, go]       [ruby, go]
1     ruby        [ruby, coffeescript]     [javascript]
2     ruby  [javascript, coffeescript]              NaN
3      NaN           [ruby, shell, go]    [ruby, shell]
4     ruby                         NaN              NaN
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
language          4 non-null object
top_lang_owned    4 non-null object
dtypes: object(2)
memory usage: 208.0+ bytes
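I want to add a new field that compares the values of two existing fields. In pseudocode:
if "language" is in "top_lang_owned"
then new_field = 1, otherwise new_field = 0.
For example, the desired output is shown below: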
  language              top_lang_owned top_lang_watched  is_owned  is_watched
0     ruby      [ruby, javascript, go]       [ruby, go]         1           1
1     ruby        [ruby, coffeescript]     [javascript]         1           0
2     ruby  [javascript, coffeescript]              NaN         0           0
3      NaN           [ruby, shell, go]    [ruby, shell]       NaN         NaN
4     ruby                         NaN              NaN       NaN         NaN
You can certainly do this. Here is some code you may want to try:
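As a minimal sketch of the check described in the question (not necessarily the code this answer originally showed), the membership test can be written as a small helper and applied row by row. The helper name `contains_language` is hypothetical, and the sketch assumes that a missing `language` or a missing list should yield NaN rather than 0.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan]})


def contains_language(language, languages):
    """Hypothetical helper: 1 if language is in the list, 0 if not,
    NaN when either the language or the list is missing (an assumption)."""
    if not isinstance(language, str) or not isinstance(languages, list):
        return np.nan
    return int(language in languages)


# Pair each language with its list and evaluate the condition per row
df['is_owned'] = [contains_language(lang, owned)
                  for lang, owned in zip(df['language'], df['top_lang_owned'])]
```

Because the new column contains NaN, pandas stores it as floats, so the flags display as 1.0 and 0.0 rather than 1 and 0.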
Edit:
Output:
You can filter out the NA values and then apply the condition:
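One way this filter-then-apply idea might look is sketched below. It rests on an assumption read off the desired output in the question: rows where `language` or `top_lang_owned` is missing keep NaN in both new columns, while a missing `top_lang_watched` list on an otherwise valid row counts as 0. The variable names `mask` and `valid` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,
                                        ['ruby', 'shell'],
                                        np.nan]})

# Keep only rows where both the language and the owned list are present
mask = df['language'].notna() & df['top_lang_owned'].notna()
valid = df[mask]

# Start from NaN everywhere, then fill in the rows that passed the filter
df['is_owned'] = np.nan
df.loc[mask, 'is_owned'] = [
    int(lang in owned)
    for lang, owned in zip(valid['language'], valid['top_lang_owned'])
]

df['is_watched'] = np.nan
df.loc[mask, 'is_watched'] = [
    # A missing watched list (NaN) is treated as "not watched" (assumption)
    int(isinstance(watched, list) and lang in watched)
    for lang, watched in zip(valid['language'], valid['top_lang_watched'])
]
```

On the sample data this gives `is_owned` values of 1, 1, 0 and `is_watched` values of 1, 0, 0 for rows 0 to 2, with NaN in rows 3 and 4. The columns are stored as floats because of the NaN entries; if integer display matters, converting with `astype('Int64')` (the nullable integer dtype) is one option.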