Looping over multiple dataframes to perform the same upsampling task for logistic regression

If I apply it to one dataframe, it looks like this:

import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes
mask = df01.fld == 0
fld_0 = df01[mask]
fld_1 = df01[~mask]

# Upsample minority class
fld_1_upsampled = resample(fld_1,
                           replace=True,      # sample with replacement
                           n_samples=247,     # to match majority class
                           random_state=123)  # reproducible results

# Combine majority class with upsampled minority class
df01_upsampled = pd.concat([fld_0, fld_1_upsampled])

# This is my list of annual data
df_all = [df01, df02, df03, df04, df05, df06, df07, df08, df09,
          df10, df11, df12, df13, df14, df15, df16, df17]

for i in df_all:
    fld_0 = i[mask]
    fld_1 = i[~mask]
    fld_1_upsampled = resample(fld_1,
                               replace=True,          # sample with replacement
                               n_samples=len(fld_0),  # to match majority class
                               random_state=123)      # reproducible results
    i_upsampled = pd.concat([fld_0, fld_1_upsampled])
    return i_upsampled

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-6fd782d4c469> in <module>()
     11                                replace=True, # sample with replacement
     12                                n_samples=247, # to match majority class
---> 13                                random_state=123) # reproducible results
     14     i_upsampled = pd.concat([fld_0, fld_1_upsampled])
     15     return i_upsampled

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py in resample(*arrays, **options)
    259
    260     if replace:
--> 261         indices = random_state.randint(0, n_samples, size=(max_n_samples,))
    262     else:
    263         indices = np.arange(n_samples)

mtrand.pyx in mtrand.RandomState.randint()

ValueError: low >= high

1 answer

Anonymous user

Reply #1 · posted 2024-05-16 16:06:08

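If you use the same mask syntax in the second code block as in the first, then one or more of the DataFrames may have no rows left to pass to resample. With an empty input, the internal call randint(0, n_samples, ...) shown in the traceback has n_samples equal to 0, which raises ValueError: low >= high. For example: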

df=pd.DataFrame({'date':[1,2,3,4,5,6],'ppt':[1.5,0,2.7,4.6,15.5,1.5],'fld':[0,1,0,0,1,1]})

date    ppt     fld
1       1.5     0
2       0.0     1
3       2.7     0
4       4.6     0
5       15.5    1
6       1.5     1

resample(df[df.fld==1], replace=True, n_samples=3, random_state=123)

date    ppt     fld
6       1.5     1
5       15.5    1
6       1.5     1

resample(df[df.fld==2], replace=True, n_samples=3, random_state=123)

"...ValueError: low >= high"
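
One way to avoid this in the loop is to recompute the mask for each dataframe and skip the resampling step when either class is empty. The following is only a sketch under assumptions: the helper name upsample_minority, its label_col parameter, and the small df_a / df_b frames are made up for illustration, and it assumes fld == 0 is always the majority class, as in the question.

```python
import pandas as pd
from sklearn.utils import resample


def upsample_minority(frame, label_col="fld", random_state=123):
    """Resample the label != 0 rows with replacement up to the label == 0 count.

    Hypothetical helper: assumes label 0 is the majority class.
    """
    mask = frame[label_col] == 0  # recomputed for every frame, not reused from df01
    majority = frame[mask]
    minority = frame[~mask]
    if majority.empty or minority.empty:
        # resample cannot draw from an empty frame; return the data unchanged
        return frame.copy()
    minority_upsampled = resample(
        minority,
        replace=True,               # sample with replacement
        n_samples=len(majority),    # match the majority class size
        random_state=random_state,  # reproducible results
    )
    return pd.concat([majority, minority_upsampled])


# Small stand-ins for the yearly frames (illustrative data only)
df_a = pd.DataFrame({"ppt": [1.5, 0.0, 2.7, 4.6, 15.5], "fld": [0, 1, 0, 0, 0]})
df_b = pd.DataFrame({"ppt": [0.3, 1.1, 2.2], "fld": [0, 0, 0]})  # no fld == 1 rows

# Collect the results in a list instead of using return inside a bare loop
upsampled = [upsample_minority(df) for df in (df_a, df_b)]
```

Applied to the question's data, this would be upsampled = [upsample_minority(df) for df in df_all]. Frames without any fld == 1 rows pass through unchanged rather than raising, so you may still want to check which years those are before fitting the model.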
