Custom aggregation primitive with additional arguments?

Posted 2024-05-01 21:48:33


Transform primitives work fine with additional arguments. Here is an example:

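# Imports assume a pre-1.0 featuretools release, where these helpers exist.
import featuretools as ft
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Categorical, Numeric


def string_count(column, string=None):
    '''
    .. note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count needs to be defined"
    # One count per row: a transform returns a value for every input element.
    counts = [str(element).lower().count(string) for element in column]
    return counts


def string_count_generate_name(self):
    # Builds a name such as STRING_COUNT(product_id, "5").
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')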


StringCount = make_trans_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")
fm, fd = ft.dfs(
    entityset=es,
    target_entity='transactions',
    max_depth=1,
    features_only=False,
    seed_features=[count_the_feat])

Output:

[output block missing from the source]

However, if I modify it like this to make it an aggregation primitive:

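# Imports assume a pre-1.0 featuretools release, where these helpers exist.
import featuretools as ft
from featuretools.primitives import make_agg_primitive
from featuretools.variable_types import Categorical, Numeric


def string_count(column, string=None):
    '''
    .. note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count needs to be defined"
    counts = [str(element).lower().count(string) for element in column]
    # A single total per group: an aggregation reduces the column to one value.
    return sum(counts)


def string_count_generate_name(self):
    # Builds a name such as STRING_COUNT(product_id, "5").
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')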


StringCount = make_agg_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")

I get the following error:

TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'

Does featuretools support custom aggregation primitives with additional arguments?


1 answer
Anonymous user
Answer #1 · posted 2024-05-01 21:48:33

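The problem here is a missing argument when constructing the seed feature. For an aggregation primitive, you must also specify the entity to aggregate over (the parent entity, which is the `parent_entity` named in the error). In this case, changing the construction of the aggregation seed feature to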

count_the_feat = StringCount(es['transactions']['product_id'], es['sessions'], string="5")

will create the feature

[feature output missing from the source]

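as expected. For each session id, this feature gives the number of times the string "5" appears in the product ids of that session's transactions.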
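
To make the per-session behavior concrete, here is a minimal plain-Python sketch of what the aggregated feature computes. It does not use featuretools at all; the `transactions` rows are hypothetical sample data rather than the contents of the mock customer dataset, and the grouping loop only stands in for the aggregation from `transactions` up to `sessions`.

```python
from collections import defaultdict


def string_count(column, string=None):
    """Total number of occurrences of `string` across all values in `column`."""
    assert string is not None, "string to count needs to be defined"
    return sum(str(element).lower().count(string) for element in column)


# Hypothetical (session_id, product_id) rows standing in for the
# transactions entity; these values are made up for illustration.
transactions = [
    (1, "5"),
    (1, "2"),
    (1, "5"),
    (2, "3"),
    (2, "5"),
    (3, "1"),
]

# Group the child rows by the parent key, mirroring an aggregation
# from transactions up to sessions.
product_ids_by_session = defaultdict(list)
for session_id, product_id in transactions:
    product_ids_by_session[session_id].append(product_id)

# Apply the aggregation function once per session, passing the extra argument.
counts_per_session = {
    session_id: string_count(product_ids, string="5")
    for session_id, product_ids in product_ids_by_session.items()
}

print(counts_per_session)  # {1: 2, 2: 1, 3: 0}
```

The transform version would instead return one count per transaction row. The aggregation returns one value per session, which is why the parent entity has to be supplied when the feature is constructed.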
