DataFrame output incorrectly includes the dtype

Posted 2024-06-16 11:01:08


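In an IPython notebook (see below), I am doing some data massaging to extract the data I need from CSV files. I do this by creating new DataFrames, and I have run into a problem I have never seen before: the dtype is included at the end of every new entry in the dictionary I build from those DataFrames.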

import glob

import pandas as pd

df_files = glob.glob('/Users/snplabadmin...')

all_regressors = {'participant': [], 'sooner': [], 'safer': [], 'later': [], 'risky': []}
#output = {}

for df_file in df_files:
    df = pd.read_csv(df_file)

    participant = df['participant'][0]

    #make sure all response keys are coded as strings
    df['choice_key.keys'] = df['choice_key.keys'].astype(str)  # convert every item in df['choice_key.keys'] to a string

    #create new column of coded responses
    df['resp'] = 0  # Initialize to 0 (good for misses, too)
    # Use .loc with a boolean mask to avoid chained assignment on a possible copy
    df.loc[df['choice_key.keys'] == '1', 'resp'] = 1
    df.loc[df['choice_key.keys'] == '1.0', 'resp'] = 1  # Left == Sooner/Safer
    df.loc[df['choice_key.keys'] == '2', 'resp'] = 2
    df.loc[df['choice_key.keys'] == '2.0', 'resp'] = 2  # Right == Later/Riskier

    #create runs
    run_1 = df[0:36]
    run_2 = df[38:73]
    run_3 = df[74:110]
    run_4 = df[111:147]
    run_5 = df[148:184]
    run_6 = df[185:221]
    runs = [run_1, run_2, run_3, run_4, run_5, run_6]

    #define counter for loop
    counter = 1

    for run in runs:
        run_numb = participant + str(counter)
        print(run_numb)
        delays = run[run['delay0_prob1'] == 0] # separate delay trials into dataframe
        probs = run[run['delay0_prob1'] == 1] # separate prob trials into dataframe 

        #parse responses from delay and prob dataframes
        delays_sooner = delays[delays['resp'] == 1]
        #print delays_sooner['ddpd']
        delays_later = delays[delays['resp'] == 2]
        probs_safer = probs[probs['resp'] == 1]
        probs_risky = probs[probs['resp'] == 2]

        sooner = delays_sooner['ResponseTime']
        safer = probs_safer['ResponseTime']
        later = delays_later['ResponseTime']
        risky = probs_risky['ResponseTime']

        all_regressors['sooner'].append(delays_sooner['ResponseTime'])
        all_regressors['safer'].append(probs_safer['ResponseTime'])
        all_regressors['later'].append(delays_later['ResponseTime'])
        all_regressors['risky'].append(probs_risky['ResponseTime'])
        all_regressors['participant'].append(run_numb)

        counter = counter +1

The all_regressors dictionary should contain only lists of numbers, but this is what I see instead:

tdcs_208p1
8     180.00
13     90.00
15      0.25
26     30.00
27     90.00
Name: ddpd, dtype: float64
tdcs_208p2
71    30
Name: ddpd, dtype: float64
tdcs_208p3
Series([], name: ddpd, dtype: float64)
tdcs_208p4
111    180
124    180
127      7
138     90
146    180
Name: ddpd, dtype: float64
tdcs_208p5
153     90
156    180
179     90
Name: ddpd, dtype: float64
tdcs_208p6
210    1
Name: ddpd, dtype: float64

Do you know why I am getting this extra output, and how I can get rid of it? I only want the numbers.

Thanks


1 answer
User
#1 · Posted 2024-06-16 11:01:08

The following simple change (adding .values) fixed the problem, thanks:

    all_regressors['sooner'].append(delays_sooner['ResponseTime'].values)
    all_regressors['safer'].append(probs_safer['ResponseTime'].values)
    all_regressors['later'].append(delays_later['ResponseTime'].values)
    all_regressors['risky'].append(probs_risky['ResponseTime'].values)
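
For anyone wondering why this helps: selecting a single column from a DataFrame gives a pandas Series, and a Series carries its row index, its name, and its dtype along with the numbers, which is what shows up when it is printed. Below is a minimal sketch of the difference. The toy data is made up for illustration; only the column names `resp` and `ResponseTime` are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "resp": [1, 2, 1, 2],
    "ResponseTime": [0.5, 1.25, 0.75, 2.0],
})

# Selecting one column of the filtered frame yields a Series.
sooner = df[df["resp"] == 1]["ResponseTime"]

# The Series keeps the original row labels, the column name, and the dtype,
# so printing it shows all of them next to the numbers.
as_series = sooner

# .values drops the index and name and returns the underlying NumPy array.
as_array = sooner.values

# .tolist() goes one step further and returns a plain Python list of floats.
as_list = sooner.tolist()

print(type(as_series).__name__)  # Series
print(type(as_array).__name__)   # ndarray
print(as_list)                   # [0.5, 0.75]
```

If the goal is a dictionary holding nothing but plain lists of numbers, appending `.tolist()` instead of `.values` may be the closer fit, since `.values` still stores NumPy arrays rather than Python lists.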
