SQL-style partitioning and window functions for NumPy
Detailed description of the numpy-partition Python project
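Splits a NumPy array into partitions by one or more columns and applies a window function to each partition.
This module attempts to replicate the select window_function() over (partition by ... order by ...) ... functionality commonly found in SQL databases.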
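To make the SQL analogy concrete, here is a minimal plain-NumPy sketch of what `row_number() over (partition by ... order by ...)` computes. It is not this module's implementation: the names `partition_labels` and `row_number_sketch` are hypothetical, and the 0-based numbering is an assumption chosen to match the output shown in the usage example (SQL's `row_number()` starts at 1).

```python
import numpy as np

def partition_labels(data, key_cols):
    """Return an integer partition id for each row, based on the key columns."""
    keys = data[:, list(key_cols)]
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    # Flatten, since the shape of `inverse` differs between NumPy releases.
    return np.reshape(inverse, -1)

def row_number_sketch(data, key_cols, value_col, descending=False):
    """0-based position of each row within its partition, ordered by value_col."""
    labels = partition_labels(data, key_cols)
    values = -data[:, value_col] if descending else data[:, value_col]
    # lexsort uses the LAST key as the primary one: partition id, then value.
    order = np.lexsort((values, labels))
    sorted_labels = labels[order]
    # True wherever a new partition begins in the sorted order.
    is_start = np.concatenate(([True], sorted_labels[1:] != sorted_labels[:-1]))
    start_idx = np.flatnonzero(is_start)
    partition_no = np.cumsum(is_start) - 1
    # Position inside the partition = global position - start of the partition.
    pos = np.arange(len(order)) - start_idx[partition_no]
    out = np.empty(len(order), dtype=np.int64)
    out[order] = pos  # scatter back to the original row order
    return out

data = np.array([[1, 1, 3], [2, 2, 3], [1, 1, 4]], dtype=np.float32)
print(row_number_sketch(data, key_cols=(0, 1), value_col=2, descending=True))
# prints [1 0 0]
```

Sorting once with `np.lexsort` and scattering the positions back keeps the result aligned with the input rows, which is the defining property of a window function as opposed to a `group by` aggregate.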
The following window functions are available out of the box: row_number(), top(), avg().
Usage example:
>>> import numpy as np
>>> from partition import apply_over_partition
>>> from partition.window import row_number, top, avg
>>> data = np.array([[1,1,3], [2,2,3], [1,1,4]], dtype=np.float32)
>>> partition_by_col_indexes = (0, 1)
>>> value_col_indexes = (2,)
>>> value_ordering = (-1,) # descending order
>>> f = avg
>>> f_kwargs = dict(vcol=2, top_n=2)
>>> apply_over_partition(data=data, partition_by_col_indexes=partition_by_col_indexes, value_col_indexes=value_col_indexes, value_ordering=value_ordering, f=f, f_kwargs=f_kwargs)
array([3.5, 3. , 3.5])
>>> f = avg
>>> f_kwargs = dict(vcol=2, top_n=1)
>>> apply_over_partition(data=data, partition_by_col_indexes=partition_by_col_indexes, value_col_indexes=value_col_indexes, value_ordering=value_ordering, f=f, f_kwargs=f_kwargs)
array([4., 3., 4.])
>>> f = row_number
>>> f_kwargs = dict()
>>> apply_over_partition(data=data, partition_by_col_indexes=partition_by_col_indexes, value_col_indexes=value_col_indexes, value_ordering=value_ordering, f=f, f_kwargs=f_kwargs)
array([1, 0, 0])
>>> f = top
>>> f_kwargs = dict(n=1)
>>> apply_over_partition(data=data, partition_by_col_indexes=partition_by_col_indexes, value_col_indexes=value_col_indexes, value_ordering=value_ordering, f=f, f_kwargs=f_kwargs)
array([False, True, True])
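The results above can be cross-checked with a small, deliberately naive plain-NumPy sketch that loops over partitions. The semantics are inferred from the example outputs only, not from the module's source: `top_n_avg_sketch` and `top_mask_sketch` are hypothetical names, ordering is assumed to be descending on the value column, and how the real `top()` breaks ties is unknown (this sketch keeps the earliest row).

```python
import numpy as np

def top_n_avg_sketch(data, key_cols, value_col, top_n):
    """For each row, the mean of the top_n largest values in its partition."""
    keys = data[:, list(key_cols)]
    values = data[:, value_col]
    out = np.empty(len(data), dtype=np.float64)
    for key in np.unique(keys, axis=0):
        mask = np.all(keys == key, axis=1)            # rows in this partition
        largest = np.sort(values[mask])[::-1][:top_n]  # descending, first top_n
        out[mask] = largest.mean()
    return out

def top_mask_sketch(data, key_cols, value_col, n):
    """Boolean mask marking the n highest-valued rows of each partition."""
    keys = data[:, list(key_cols)]
    values = data[:, value_col]
    out = np.zeros(len(data), dtype=bool)
    for key in np.unique(keys, axis=0):
        idx = np.flatnonzero(np.all(keys == key, axis=1))
        # Stable sort on negated values: descending, ties keep the earlier row.
        best = idx[np.argsort(-values[idx], kind="stable")[:n]]
        out[best] = True
    return out

data = np.array([[1, 1, 3], [2, 2, 3], [1, 1, 4]], dtype=np.float32)
print(top_n_avg_sketch(data, (0, 1), 2, top_n=2))  # prints [3.5 3.  3.5]
print(top_n_avg_sketch(data, (0, 1), 2, top_n=1))  # prints [4. 3. 4.]
print(top_mask_sketch(data, (0, 1), 2, n=1))       # prints [False  True  True]
```

The explicit loop costs one pass over the data per partition, so it is only suitable as a reference for small inputs.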