Applying a user-defined function to a DataFrame

import pandas as pd
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split

def regression():
    # Final1 is read from the enclosing (global) scope
    X=Final1.copy()
    y=Final1[['Sales']].copy()
    X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.2, random_state=0)
    sel=f_classif(X_train, y_train)
    p_values=pd.Series(sel[0], index=X_train.columns)
    p_values=p_values.reset_index()
    pd.options.display.float_format = "{:,.2f}".format
    return p_values

Finals=[]
Finals=pd.DataFrame(Finals)
for group in Final.groupby('Key'):
    # group is a tuple where the first value is the Key and the second is the dataframe
    Final1=group[1]
    Final1=pd.DataFrame(Final1)
    result=regression()
    Finals=pd.concat([Finals, result], axis=1)
    # do xyz with result
print(Finals)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-131-c3a3b53971d5> in <module>
      5     Final1=group[1]
      6     Final1=pd.DataFrame(Final1)
----> 7     result=regression()
      8     Finals=pd.concat([Finals, result], axis=1)
      9

<ipython-input-120-d5c718baaba8> in regression()
      2     X=Final1.iloc[:,7:-1].copy()
      3     y=Final1[['Sale Rate']].copy()
----> 4     X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.2, random_state=0)
      5     sel=f_classif(X_train, y_train)
      6     p_values=pd.Series(sel[0], index=X_train.columns)

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2120     n_samples = _num_samples(arrays[0])
   2121     n_train, n_test = _validate_shuffle_split(n_samples, test_size, train_size,
-> 2122                                               default_test_size=0.25)
   2123
   2124     if shuffle is False:

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _validate_shuffle_split(n_samples, test_size, train_size, default_test_size)
   1803             'resulting train set will be empty. Adjust any of the '
   1804             'aforementioned parameters.'.format(n_samples, test_size,
-> 1805                                                 train_size)
   1806         )
   1807

ValueError: With n_samples=1, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

2 Answers

User

Answer #1 · edited 2024-06-17 08:40:49

A simple solution is to have regression accept the group's DataFrame as a parameter and pass it in directly:

for group in Final.groupby('Key'): 
    # group is a tuple where the first value is the Key and the second is the dataframe
    result = regression(group[1])
    # do xyz with result
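
To make the call above concrete, here is a minimal sketch of a regression function that takes the group as an argument instead of reading the global Final1. The toy data, the feature columns f1 and f2, and the target parameter are assumptions for illustration, not the asker's real dataset. Note that f_classif returns a pair (F statistics, p-values), so the p-values are the second element.

```python
import pandas as pd
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split

def regression(df, target="Sales"):
    """Return per-feature p-values for one group (hypothetical signature)."""
    # Drop the target and the grouping key so they are not used as features
    X = df.drop(columns=[target, "Key"])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    # f_classif returns (F statistics, p-values)
    f_stat, p_val = f_classif(X_train, y_train)
    return pd.Series(p_val, index=X_train.columns)

# Toy data (assumed): two keys with 10 observations each
Final = pd.DataFrame({
    "Key": ["a"] * 10 + ["b"] * 10,
    "f1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 2,
    "f2": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3] * 2,
    "Sales": [0, 1] * 10,
})

# groupby yields (key, DataFrame) tuples, so they can be unpacked directly
results = {key: regression(df) for key, df in Final.groupby("Key")}

# One column of p-values per key, one row per feature
Finals = pd.concat(results, axis=1)
print(Finals)
```

Collecting the results in a dict and concatenating once keeps the key as the column label, which the repeated pd.concat inside the loop does not.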

Edit:

You don't need to convert the group to a DataFrame again. You can use it directly, because it is already a DataFrame.

# this line is not necessary
Final1 = pd.DataFrame(Final1)

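Judging from the error, the group you passed to train_test_split does not have enough records. The message says so explicitly: with n_samples=1 and test_size=0.2, the resulting train set would be empty. You have to handle this kind of error with try/except.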
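
As a rough sketch of that try/except approach, the loop below catches the ValueError and skips the offending key. The toy data and the split_group helper are hypothetical stand-ins for the asker's Final DataFrame and regression function.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (assumed): key "a" has 10 rows, key "b" has only 1
Final = pd.DataFrame({
    "Key": ["a"] * 10 + ["b"],
    "x": range(11),
    "Sales": [0, 1] * 5 + [1],
})

def split_group(df):
    """Hypothetical stand-in for regression(): performs only the split that failed."""
    X = df[["x"]]
    y = df["Sales"]
    return train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
skipped = []
for key, df in Final.groupby("Key"):
    try:
        results[key] = split_group(df)
    except ValueError as err:
        # Raised by train_test_split when the train set would be empty
        skipped.append(key)
        print(f"Skipping key {key!r}: {err}")

print("processed:", list(results))
print("skipped:", skipped)
```

Catching only ValueError around the call keeps unrelated bugs visible instead of silently swallowing them.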

User

Answer #2 · edited 2024-06-17 08:40:49

The code works once I filter out all keys with fewer than 10 observations.
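
One possible way to do that filtering is pandas' groupby(...).filter, which keeps only the rows of groups for which the callback returns True. The toy data and the MIN_OBS name are assumptions for illustration.

```python
import pandas as pd

MIN_OBS = 10  # threshold mentioned in the answer

# Toy data (assumed): "a" has 12 rows, "b" has 3, "c" has exactly 10
Final = pd.DataFrame({
    "Key": ["a"] * 12 + ["b"] * 3 + ["c"] * 10,
    "Sales": range(25),
})

# Keep only the rows whose key has at least MIN_OBS observations
filtered = Final.groupby("Key").filter(lambda df: len(df) >= MIN_OBS)

print(filtered["Key"].value_counts().sort_index())
```

The filtered DataFrame can then be grouped by Key again and passed to the regression loop without hitting the empty train set error.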
