SQLalchemy: 列值组合的所有排列的分位数

0 投票
1 回答
572 浏览
提问于 2025-04-17 20:59

我们有一个 SQL Server 查询,需要为越来越多的变量生成 ntiles(分位数),这些变量会以不同的组合方式互相结合。下面是一个例子,来说明我想表达的意思:

语句 1:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID 
                    order by Objects_Created) AS Ntile_Mon_Objects_Created,

语句 2:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *Country*
          order by Objects_Created) AS Ntile_Country_Objects_Created

语句 3:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *User*_Type
                 order by Objects_Created) AS Ntile_UT_Objects_Created

你可以看到,这些语句基本上是一样的,除了第二和第三个语句中,斜体的“country”(国家)和“user type”(用户类型)这两列是新加的。所以我们对同一个变量“Objects_Created”(创建的对象)在不同的细节层次上计算 ntiles,同时也需要对这些变量的各种可能组合进行 ntiles 的计算,比如:

语句 4:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *Country, User_Type*
            order by Objects_Created) AS Ntile_Country_UT_Objects_Created

我们可以手动编写这些组合的代码,但如果能用 sqlalchemy 来执行这些变量的所有组合,可能会让事情变得简单一些。有没有人能提供一个我可以改用的例子?

谢谢你的帮助!

1 个回答

0

我不知道 fsi 和其他列有什么关系,但假设所有数据都在一个模型里(用 sqlalchemy 查询很容易扩展),就像下面这样:

class User(Base):
    __tablename__ = 't_users'
    id = Column(Integer, primary_key=True)
    MAUorALL = Column(String)
    User_Type = Column(String)
    Country = Column(String)
    Month_ID = Column(Integer)
    Objects_Created = Column(Integer)

这个任务可以通过简单使用 itertools.permutations(或者 itertools.combinations,具体取决于你想要实现什么)来完成查询。下面的代码会为 User 表生成一个包含各种 ntiles 的查询。我想看代码就能理解发生了什么:

# configuration: {label: Column}
column_labels = {
        'Country': User.Country,
        'UT': User.User_Type,
        }

def get_ntile(additional_columns=None):
    """ @return: sqlalchemy expression for selecting a given ntile() using
    predefined as well as *additional* columns.
    """
    partition_by = [
        User.MAUorALL,
        User.User_Type,
        User.Month_ID,
        ]
    label = "Ntile_Objects_Created"
    if additional_columns:
        lbls = []
        for col_name in additional_columns:
            col = column_labels[col_name]
            partition_by.append(col)
            lbls.append(col_name)
        label = "Ntile_{}_Objects_Created".format("_".join(lbls))
    xprs = over(
            func.ntile(10),
            partition_by = partition_by,
            order_by = User.Objects_Created,
            ).label(label)
    return xprs

def get_query(additional_columns=['UT', 'Country']):
    """ @return: a query object which selects a User with additional ntiles
    for predefined columns (fixed) and all possible permutations of
    *additional_columns*
    """
    from itertools import permutations#, combinations
    tiles = [get_ntile(comb)
            for r in range(len(additional_columns) + 1)
            for comb in permutations(additional_columns, r)
            ]
    q = session.query(User, *tiles)
    return q

q = get_query()
print [_c["name"] for _c in q.column_descriptions]
# >>> ['User', 'Ntile_Objects_Created', 'Ntile_UT_Objects_Created', 'Ntile_Country_Objects_Created', 'Ntile_UT_Country_Objects_Created', 'Ntile_Country_UT_Objects_Created']

for tile in q.all():
    print tile

撰写回答