Getting column names from a patsy DesignMatrix

Asked 2025-04-18 05:57

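Question: Can I avoid specifying the column names through DesignInfo? Doing so makes my code less flexible. Can I instead read the names directly from the DesignMatrix, so that I can put them into a DataFrame without knowing the reference level (control group) in advance?

That is, when I run this code: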

from patsy import dmatrix
from pandas import DataFrame
dta = DataFrame([["lo", 1], ["hi", 2.4], ["lo", 1.2], ["lo", 1.4], ["very_high", 1.8]], columns=["carbs", "score"])
dmatrix("carbs + score", dta)
DesignMatrix with shape (5, 4)
Intercept  carbs[T.lo]  carbs[T.very_high]  score
        1            1                   0    1.0
        1            0                   0    2.4
        1            1                   0    1.2
        1            1                   0    1.4
        1            0                   1    1.8
Terms:
'Intercept' (column 0), 'carbs' (columns 1:3), 'score' (column 3)

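I would like g (below) to be the transformed DataFrame that I can use for logistic modeling, so that I do not have to remember (or hard-code) the column names and their reference levels.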

# How can I get something like this from dmatrix's output without hard-coding?
# names = <obtained from dmatrix's output above>
# This should give names = ['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score']
g = DataFrame(dmatrix("carbs + score", dta), columns=names)

   Intercept  carbs[T.lo]  carbs[T.very_high]  score
0          1            1                   0    1.0
1          1            0                   0    2.4
2          1            1                   0    1.2
3          1            1                   0    1.4
4          1            0                   1    1.8

type(g)=<class 'pandas.core.frame.DataFrame'>

Tags: data processing, DataFrame column names, patsy, designmatrix, logistic modeling, reference level

1 Answer

I think the information you want is in design_info.column_names:

>>> dm = dmatrix("carbs + score", dta)
>>> dm.design_info
DesignInfo(['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score'],
           term_slices=OrderedDict([(Term([]), slice(0, 1, None)), (Term([EvalFactor('carbs')]), slice(1, 3, None)), (Term([EvalFactor('score')]), slice(3, 4, None))]),
           builder=<patsy.build.DesignMatrixBuilder at 0xb03f8cc>)
>>> dm.design_info.column_names
['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score']
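
The DesignInfo repr above also shows which columns each term occupies. As a rough sketch of how that mapping could be read programmatically instead of hard-coded, the snippet below uses `design_info.slice` and `design_info.term_name_slices`; the variable names (`carbs_columns`, `columns_by_term`) are made up for this illustration, and the data is the frame from the question.

```python
import pandas as pd
from patsy import dmatrix

# Same data as in the question.
dta = pd.DataFrame(
    [["lo", 1], ["hi", 2.4], ["lo", 1.2], ["lo", 1.4], ["very_high", 1.8]],
    columns=["carbs", "score"],
)

dm = dmatrix("carbs + score", dta)
info = dm.design_info

# Slice of design-matrix columns generated by the 'carbs' term.
carbs_slice = info.slice("carbs")
carbs_columns = info.column_names[carbs_slice]

# Illustrative mapping: term name -> list of column names it produced.
columns_by_term = {
    term: info.column_names[sl] for term, sl in info.term_name_slices.items()
}

print(carbs_columns)
print(columns_by_term)
```

This way the dummy-column names for a categorical term can be looked up without knowing which level patsy picked as the reference.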

The column names can then be passed straight to the DataFrame constructor:

>>> DataFrame(dm, columns=dm.design_info.column_names)
   Intercept  carbs[T.lo]  carbs[T.very_high]  score
0          1            1                   0    1.0
1          1            0                   0    2.4
2          1            1                   0    1.2
3          1            1                   0    1.4
4          1            0                   1    1.8

[5 rows x 4 columns]
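
For reuse, the same idea can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: `design_frame` is a hypothetical helper name (it is not part of patsy), and converting with `np.asarray` is an assumption made here to hand pandas a plain array. As an alternative, `dmatrix` accepts `return_type="dataframe"`, which returns a pandas DataFrame with the column names already set.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Same data as in the question.
dta = pd.DataFrame(
    [["lo", 1], ["hi", 2.4], ["lo", 1.2], ["lo", 1.4], ["very_high", 1.8]],
    columns=["carbs", "score"],
)


def design_frame(formula, data):
    """Hypothetical helper: build a DataFrame whose columns are named
    after the design matrix columns, with nothing hard-coded."""
    dm = dmatrix(formula, data)
    return pd.DataFrame(np.asarray(dm), columns=dm.design_info.column_names)


# Option 1: read the names from design_info.
g = design_frame("carbs + score", dta)

# Option 2: let patsy return a labelled DataFrame directly.
g2 = dmatrix("carbs + score", dta, return_type="dataframe")

print(list(g.columns))
print(list(g2.columns))
```

Either way the reference level is chosen by patsy, so the calling code never needs to know it.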

Answered 2025-04-18 by Python大师
