Tools that give data scientists convenient access to prepared data.

Detailed description of the odus Python project


# %load_ext autoreload
# %autoreload 2

Introduction

ODUS (Older Drug User Study) contains data and tools for studying drug use among older drug users.

Essentially, these are tools to:

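  • Get prepared data on 119 "trajectories" describing 31 variables (drug use, social, etc.) for 119 different respondents.

  • Visualize these trajectories in various ways

  • Create PDFs of any selection of these trajectories and variables

  • Make count tables for any combination of the variables: the basic step of any Markovian or Bayesian analysis.

  • Make probability tables (joint or conditional) from any combination of the variables

  • Operate on these count and probability tables, which makes inference operations possible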

Installation

You need Python 3.7+ to run this notebook.

You also need odus, which you can get by running

pip install odus

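(And if you don't have pip, then... how to put it... ha ha ha!)

But if you're the type, you can also get the source from https://github.com/thorwhalen/odus.

Oh, and pull requests and the like are welcome!

Stars, likes, references, and coffee are also welcome.

If you want to donate: give to a charity that helps people get informed about substance use and shape policy around it.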

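A simple flowchart of the architecture: Excel files (one per respondent) -> dataframes -> counts for any combination of the variables -> probabilities for any combination of the variables.

Getting some resources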

from matplotlib.pylab import *
from numpy import *
import seaborn as sns
import os
from py2store.stores.local_store import RelativePathFormatStore
from py2store.mixins import ReadOnlyMixin
from py2store.base import Store
from io import BytesIO
from spyn.ppi.pot import Pot, ProbPot
from collections import UserDict, Counter
import numpy as np
import pandas as pd
from ut.ml.feature_extraction.sequential_var_sets import PVar, VarSet, DfData, VarSetFactory
from IPython.display import Image
from odus.analysis_utils import *
from odus.dacc import DfStore, counts_of_kps, Dacc, VarSetCountsStore, \
    mk_pvar_struct, PotStore, _commun_columns_of_dfs, Struct, mk_pvar_str_struct, VarStr
from odus.plot_utils import plot_life_course
from odus import data_dir, data_path_of
survey_dir = data_dir
data_dir
'/D/Dropbox/dev/p3/proj/odus/odus/data'
df_store = DfStore(data_dir + '/{}.xlsx')
len(df_store)
cstore = VarSetCountsStore(df_store)
v = mk_pvar_struct(df_store, only_for_cols_in_all_dfs=True)
s = mk_pvar_str_struct(v)
f, df = cstore.df_store.head()
pstore = PotStore(df_store)

Poking around

df_store

df_store is a key-value store where each key is an xls file and each value is the prepared dataframe for that file
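
To make the "key is a file, value is the loaded data" idea concrete, here is a minimal sketch of a read-only mapping over a folder of files. It is not the DfStore implementation: the class name CsvRowsStore, the use of CSV instead of xlsx, and the demo file contents are all assumptions chosen so the example runs with the standard library only.

```python
import csv
import tempfile
from collections.abc import Mapping
from pathlib import Path


class CsvRowsStore(Mapping):
    """Hypothetical read-only store: file name -> list of row dicts, loaded on access."""

    def __init__(self, rootdir, suffix=".csv"):
        self.rootdir = Path(rootdir)
        self.suffix = suffix

    def __iter__(self):
        # Keys are the matching file names, sorted so iteration is reproducible
        return iter(sorted(p.name for p in self.rootdir.glob(f"*{self.suffix}")))

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, key):
        path = self.rootdir / key
        if not path.is_file():
            raise KeyError(key)
        with open(path, newline="") as f:
            return list(csv.DictReader(f))


def demo():
    # Build two tiny made-up "respondent" files in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        for name, rows in {"A01.csv": "1,0\n1,1\n", "B24.csv": "0,1\n"}.items():
            Path(tmp, name).write_text("ALCOHOL,TOBACCO\n" + rows)
        store = CsvRowsStore(tmp)
        return len(store), list(store), store["B24.csv"]


n_files, keys, rows = demo()
```

Because the class derives from Mapping, len(store), iteration over keys, store.values() and store.items() all come for free, which is the same dict-like surface the notebook relies on when it calls len(df_store) and df_store.values().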

len(df_store)
119
it = iter(df_store.values())
for i in range(5):  # skip five first
    _ = next(it)
df = next(it)  # get the one I want
df.head(3)
print(df.columns.values)
['RURAL' 'SUBURBAN' 'URBAN/CITY' 'HOMELESS' 'INCARCERATION' 'WORK'
 'SON/DAUGHTER' 'SIBLING' 'FATHER/MOTHER' 'SPOUSE'
 'OTHER (WHO?, FILL IN BRACKETS HERE)' 'FRIEND USER' 'FRIEND NON USER'
 'MENTAL ILLNESS' 'PHYSICAL ILLNESS' 'LOSS OF LOVED ONE' 'TOBACCO'
 'MARIJUANA' 'ALCOHOL' 'HAL/LSD/XTC/CLUBDRUG' 'COCAINE/CRACK'
 'METHAMPHETAMINE' 'AS PRESCRIBED OPIOID' 'NOT AS PRESCRIBED OPIOID'
 'HEROIN' 'OTHER OPIOID' 'INJECTED' 'IN TREATMENT' 'Selects States below'
 'Georgia' 'Pennsylvania']
t = df[['ALCOHOL', 'TOBACCO']]
t.head(3)
c = Counter()
for i, r in t.iterrows():
    c.update([tuple(r.to_list())])
c
Counter({(0, 0): 6, (1, 0): 4, (1, 1): 9, (0, 1): 2})
def count_tuples(dataframe):
    c = Counter()
    for i, r in dataframe.iterrows():
        c.update([tuple(r.to_list())])
    return c

fields = ['ALCOHOL', 'TOBACCO']
# do it for every one
c = Counter()
for df in df_store.values():
    c.update(count_tuples(df[fields]))
c
Counter({(0, 1): 903, (1, 1): 1343, (0, 0): 240, (1, 0): 179})
pd.Series(c)
0  1     903
1  1    1343
0  0     240
1  0     179
dtype: int64
# Powerful! You can use that with several pairs and get some nice probabilities. Look up Naive Bayes.
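
As a sketch of how such pair counts turn into probabilities, the snippet below conditions the (ALCOHOL, TOBACCO) counts shown above on the first variable. The helper name conditional_on_first is hypothetical; only the counts come from the notebook.

```python
from collections import Counter

# (ALCOHOL, TOBACCO) counts copied from the output above
counts = Counter({(0, 1): 903, (1, 1): 1343, (0, 0): 240, (1, 0): 179})


def conditional_on_first(counts):
    """Return {(a, t): P(second == t | first == a)} from joint counts."""
    totals = Counter()
    for (a, _), n in counts.items():
        totals[a] += n  # marginal count of the first variable
    return {(a, t): n / totals[a] for (a, t), n in counts.items()}


p_tobacco_given_alcohol = conditional_on_first(counts)
for key in sorted(p_tobacco_given_alcohol):
    print(key, round(p_tobacco_given_alcohol[key], 6))
```

For example, P(TOBACCO = 1 | ALCOHOL = 1) is 1343 / (179 + 1343), about 0.882, which agrees with the conditional table in the pstore section.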

Viewing trajectories

import itertools
from functools import partial
from odus.util import write_images
from odus.plot_utils import plot_life, life_plots, write_trajectories_to_file
ihead = lambda it: itertools.islice(it, 0, 5)

Viewing a single trajectory

k = next(iter(df_store))  # get the first key
print(f"k: {k}")  # print it
plot_life(df_store[k])  # plot the trajectory
k: surveys/B24.xlsx

plot_life(df_store[k], fields=[s.in_treatment, s.injected])  # only want two fields

Flip over all (or some) trajectories

gen=life_plots(df_store)
next(gen)  # launch to get the next trajectory
<matplotlib.axes._subplots.AxesSubplot at 0x12b21f070>

Get the first ten trajectories, but only over four fields.

# fields = [s.in_treatment, s.injected]
fields = [s.physical_illness, s.as_prescribed_opioid, s.heroin, s.other_opioid]
keys = list(df_store)[:10]
# print(f"keys={keys}")
axs = [x for x in life_plots(df_store, fields, keys=keys)];

Making a pdf of trajectories

write_trajectories_to_file(df_store, fp='all_respondents_all_fields.pdf');

Demo s and v

print(list(filter(lambda x: not x.startswith('__'), dir(s))))
['alcohol', 'as_prescribed_opioid', 'cocaine_crack', 'father_mother', 'hal_lsd_xtc_clubdrug', 'heroin', 'homeless', 'in_treatment', 'incarceration', 'injected', 'loss_of_loved_one', 'marijuana', 'mental_illness', 'methamphetamine', 'not_as_prescribed_opioid', 'other_opioid', 'physical_illness', 'rural', 'sibling', 'son_daughter', 'suburban', 'tobacco', 'urban_city', 'work']
s.heroin
'HEROIN'
v.heroin
PVar('HEROIN', 0)
v.heroin - 1
PVar('HEROIN', -1)
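
The `v.heroin - 1` notation reads as "the HEROIN column, one time step earlier". Below is a minimal sketch of how such an offset-carrying variable could be modeled. LaggedVar is a hypothetical stand-in written for illustration, not the PVar class from the ut package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaggedVar:
    """Hypothetical stand-in for a (column name, time offset) pair."""
    name: str
    offset: int = 0

    def __sub__(self, k):
        # var - 1 refers to the same column, one time step earlier
        return LaggedVar(self.name, self.offset - k)

    def __add__(self, k):
        # var + 1 refers to the same column, one time step later
        return LaggedVar(self.name, self.offset + k)


heroin = LaggedVar("HEROIN")
previous_heroin = heroin - 1
print(previous_heroin)  # LaggedVar(name='HEROIN', offset=-1)
```

Making the dataclass frozen keeps instances hashable, so a tuple of such variables can serve as a dictionary key, which is convenient for a store indexed by variable combinations.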

cstore

# cstore[v.alcohol, v.tobacco]
cstore[v.as_prescribed_opioid - 1, v.heroin]
Counter({(0, 0): 1026, (1, 0): 264, (0, 1): 1108, (1, 1): 148})
pd.Series(cstore[v.as_prescribed_opioid - 1, v.heroin])
0  0    1026
1  0     264
0  1    1108
1  1     148
dtype: int64
cstore[v.alcohol, v.tobacco, v.heroin]
Counter({(0, 0, 1): 427,
         (1, 0, 1): 656,
         (1, 1, 1): 687,
         (0, 0, 0): 189,
         (0, 1, 1): 476,
         (0, 1, 0): 51,
         (1, 0, 0): 133,
         (1, 1, 0): 46})
cstore[v.alcohol - 1, v.alcohol]
Counter({(0, 0): 994, (1, 1): 1375, (1, 0): 90, (0, 1): 87})
cstore[v.alcohol - 1, v.alcohol, v.tobacco]
Counter({(0, 0, 1): 807,
         (1, 1, 1): 1220,
         (1, 0, 0): 26,
         (0, 1, 1): 76,
         (0, 0, 0): 187,
         (1, 1, 0): 155,
         (0, 1, 0): 11,
         (1, 0, 1): 64})
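
A count over a lagged variable and its current value amounts to counting (previous, current) pairs along each respondent's sequence. The sketch below shows one way to do that. The trajectories are made up, and whether cstore aligns rows in exactly this way is an assumption.

```python
from collections import Counter


def lagged_pair_counts(sequence):
    """Count (previous value, current value) pairs along one sequence."""
    return Counter(zip(sequence, sequence[1:]))


# Made-up per-period ALCOHOL indicators for two hypothetical respondents
trajectories = [[0, 0, 1, 1, 1], [1, 1, 0, 0]]

total = Counter()
for traj in trajectories:
    # Counting per trajectory means pairs never straddle two respondents
    total.update(lagged_pair_counts(traj))

print(total)
```

Dividing each count by the total for its "previous" value would then give transition probabilities, the building block of a first-order Markov model.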

pstore

t = pstore[s.alcohol - 1, s.alcohol]
t / []
                       pval
ALCOHOL-1 ALCOHOL          
0         0        0.390416
          1        0.034171
1         0        0.035350
          1        0.540063
t[s.alcohol - 1]
           pval
ALCOHOL-1      
0          1081
1          1465
t / t[s.alcohol - 1]
                       pval
ALCOHOL-1 ALCOHOL          
0         0        0.919519
          1        0.080481
1         0        0.061433
          1        0.938567
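
The three tables above (joint, marginal, conditional) can be reproduced from the raw (ALCOHOL-1, ALCOHOL) counts with plain dictionaries. This is a sketch of the arithmetic only, not of how the Pot class is implemented; the function names are hypothetical.

```python
# (ALCOHOL-1, ALCOHOL) counts copied from the cstore section
counts = {(0, 0): 994, (0, 1): 87, (1, 0): 90, (1, 1): 1375}


def project(table, axis):
    """Sum a two-variable table down to the variable at position `axis`."""
    out = {}
    for key, val in table.items():
        out[key[axis]] = out.get(key[axis], 0) + val
    return out


def normalize(table):
    """Divide every entry by the grand total, giving a joint distribution."""
    total = sum(table.values())
    return {key: val / total for key, val in table.items()}


def divide_by_first(table):
    """Divide each entry by the marginal of its first variable."""
    marginal = project(table, 0)
    return {key: val / marginal[key[0]] for key, val in table.items()}


joint = normalize(counts)              # same numbers as the joint table
previous = project(counts, 0)          # same numbers as the marginal table
conditional = divide_by_first(counts)  # same numbers as the conditional table
```

Here joint[(0, 0)] is 994 / 2546, previous is {0: 1081, 1: 1465}, and conditional[(0, 1)] is 87 / 1081, matching the values printed above to six decimals.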
tt = pstore[s.alcohol, s.tobacco]
tt
                 pval
ALCOHOL TOBACCO      
0       0         240
        1         903
1       0         179
        1        1343
tt / tt[s.alcohol]
                     pval
ALCOHOL TOBACCO          
0       0        0.209974
        1        0.790026
1       0        0.117608
        1        0.882392
tt / tt[s.tobacco]
                     pval
ALCOHOL TOBACCO          
0       0        0.572792
1       0        0.427208
0       1        0.402048
1       1        0.597952

Scrap place

t = pstore[s.as_prescribed_opioid - 1, s.heroin - 1, s.heroin]
t
                                        pval
AS PRESCRIBED OPIOID-1 HEROIN-1 HEROIN      
0                      0        0        927
                                1        172
                       1        0         99
                                1        936
1                      0        0        249
                                1         33
                       1        0         15
                                1        115
tt = t / t[s.as_prescribed_opioid - 1, s.heroin - 1]
tt
                                            pval
AS PRESCRIBED OPIOID-1 HEROIN-1 HEROIN          
0                      0        0       0.843494
                                1       0.156506
                       1        0       0.095652
                                1       0.904348
1                      0        0       0.882979
                                1       0.117021
                       1        0       0.115385
                                1       0.884615
tt.tb
AS PRESCRIBED OPIOID-1  HEROIN-1  HEROIN      pval
0                       0         0       0.843494
0                       0         1       0.156506
1                       0         0       0.882979
1                       0         1       0.117021
0.117021 / 0.156506
prob_of_heroin_given_presc_op / prob_of_heroin_given_not_presc_op
0.6918605658949217
prob_of_heroin_given_not_presc_op / prob_of_heroin_given_presc_op
1.4453779407220584
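
A ratio of this kind can be recomputed from the (AS PRESCRIBED OPIOID-1, HEROIN) counts in the cstore section. The sketch below assumes the two probabilities being compared are P(HEROIN = 1) given that an opioid was, or was not, taken as prescribed in the previous period; the helper name is hypothetical.

```python
from collections import Counter

# (AS PRESCRIBED OPIOID-1, HEROIN) counts copied from the cstore section
counts = Counter({(0, 0): 1026, (1, 0): 264, (0, 1): 1108, (1, 1): 148})


def prob_second_is_one(counts, first_value):
    """P(second == 1 | first == first_value) from joint counts."""
    hits = counts[(first_value, 1)]
    total = counts[(first_value, 0)] + hits
    return hits / total


p_given_presc = prob_second_is_one(counts, 1)      # 148 / 412
p_given_not_presc = prob_second_is_one(counts, 0)  # 1108 / 2134
risk_ratio = p_given_presc / p_given_not_presc
print(round(risk_ratio, 5))
```

Under that assumption the ratio comes out at about 0.6919, in line with the value printed above; the trailing digits differ slightly, presumably because the notebook divided rounded probabilities.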

Calculus experiments

# survey_dir = '/D/Dropbox/others/Miriam/python/ProcessedSurveys'
df_store = DfStore(survey_dir + '/{}.xlsx')
len(df_store)
119
pstore[v.homeless - 1, v.incarceration] / []
                              pval
HOMELESS-1 INCARCERATION          
0          0              0.663786
           1              0.226630
1          0              0.075412
           1              0.034171
pstore[v.incarceration]
pstore[v.alcohol - 1, v.loss_of_loved_one]
                             pval
ALCOHOL-1 LOSS OF LOVED ONE      
0         0                   990
          1                    91
1         0                  1321
          1                   144
w / []
(evid_m * mw) / []
                    pval
MARIJUANA WORK          
1         0     0.350603
          1     0.649397
(evid_t * tw) / []
                  pval
TOBACCO WORK          
1       0     0.313001
        1     0.686999
(evid_a * aw) / []
                 pval
ALCOHOL WORK         
1       0     0.29435
        1     0.70565
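
The pattern "(evidence * table) / []" can be read as: weight the joint table by what was observed, then renormalize. The definitions of evid_m, mw and the other names above are not shown, so the sketch below is an assumed reading that uses the (ALCOHOL, TOBACCO) counts from the pstore section and an indicator-style evidence table; apply_evidence is a hypothetical helper.

```python
# (ALCOHOL, TOBACCO) counts copied from the pstore section
joint_counts = {(0, 0): 240, (0, 1): 903, (1, 0): 179, (1, 1): 1343}

# Evidence "ALCOHOL == 1": weight 1 on the observed value, 0 elsewhere
evidence = {0: 0.0, 1: 1.0}


def apply_evidence(table, evidence):
    """Multiply each entry by the evidence weight of its first variable,
    drop entries that end up at zero, and renormalize the rest to sum to 1."""
    weighted = {k: v * evidence[k[0]] for k, v in table.items()}
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items() if v > 0}


posterior = apply_evidence(joint_counts, evidence)
for key in sorted(posterior):
    print(key, round(posterior[key], 6))
```

Only the ALCOHOL = 1 rows survive, and their values (about 0.118 and 0.882) equal the ALCOHOL = 1 rows of the conditional table tt / tt[s.alcohol], which is what conditioning on hard evidence should give.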

Extra scrap

# from graphviz import Digraph
# Digraph(body="""
# raw -> data -> count -> prob
# raw [label="excel files (one per respondent)" shape=folder]
# data [label="dataframes" shape=folder]
# count [label="counts for any combinations of the variables in the data" shape=box3d]
# prob [label="probabilities for any combinations of the variables in the data" shape=box3d]
# """.split('\n'))

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
java如何运行一个在播放歌曲的同时创建和更改UI的方法?   eclipse错误:无法找到或加载主类Java,因为类文件anme和类名不同?   两个数字相加得到一个值的java算法   java我可以更改字符串吗?   java Hibernate 5.2:以编程方式从其他jar加载映射   java如何访问随机跳转到固定位置的二进制文件   java是解析器实现中文档的功能   Javasocket的两端齐平   java查找将两个非常大的整数之和除以相等块的步骤   java如何在Restlet中调用带超时的异步HTTP客户端   java如何从servlet请求将hashmap传递给jsp。塞塔提布特   java Spring MVC HTTP状态500–内部服务器错误,Servlet。servlet[dispatcher]的init()引发异常   java即使没有alpha通道,如何将PNGFiles加载为ARGB_8888?   java将subscribe的返回类型映射到其他类型   javascript如何在安卓 WebView中启用longpress操作下载图像?   java将字符串作为hashmap值的一部分添加到StringList中   JavaSpringAOP:代表类型声明其他方法或字段   Java将二进制序列转换为字符   java使用ApachePOI获取最后一行值   为什么要在FPS(每秒帧数)跟踪器中添加时间?(爪哇)