Using a function to apply values to a Dask DataFrame with map_partitions

Posted 2024-04-28 11:48:50

In the Dask code below, I am trying to set the value of a DataFrame column based on the logic in the function apply_masks:

import numpy as np
import pandas as pd
import dask.dataframe as daskDataFrame

def apply_masks(df):
   if df['Age'] > 14:
       df['outcol'] = 6
   else:
       df['outcol'] = 5
   return df

data = [[1,100, 12, 6], [1,200, 18, 5], [1,170, 22, 4]]
df = pd.DataFrame(data, columns = ['outcol', 'Weight', 'Age', 'Height']) 
ddf = daskDataFrame.from_pandas(df, npartitions=100)
ddf = ddf.map_partitions(apply_masks)
print(ddf.compute())

The problem is that it raises an exception:

ValueError: Metadata inference failed in apply_masks.

You have supplied a custom function and Dask is unable to determine the type of output that that function returns.

To resolve this please provide a meta= keyword. The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
ValueError('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().')

How can I fix this?


1 answer
User
#1 · Posted 2024-04-28 11:48:50

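Try assign + np.where. The original function fails because df['Age'] > 14 is a boolean Series, and a Series has no single truth value for an if statement to test, so the condition has to be applied element-wise: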

def apply_masks(df):
    return df.assign(outcol=np.where(df['Age'] > 14, 6, 5))

Result:

   outcol  Weight  Age  Height
0       5     100   12       6
1       6     200   18       5
2       6     170   22       4
