freeze_support error when using scikit-learn in the Anaconda Python distribution?

Asked 2025-04-18 10:45

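I just want to confirm that this problem is not in my code but needs to be fixed in the relevant Python package. (Also, does it look like something I could patch myself before the vendor releases an update?) I was using scikit-learn-0.15b1, which calls these functions. Thanks!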

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

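Update: here is my edited script, but it still fails with exactly the same error after GridSearchCV starts its worker processes. It crashes shortly after the command reports how many folds and fits it will run; beyond that I don't know exactly when it crashes. Do I need to put freeze_support somewhere else?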

import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp

if __name__=='__main__':
    mp.freeze_support()

print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]

print("Data loaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)  # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # apply same transformation to test data

print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)

print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)

print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)

2 Answers


You can find information about multiprocessing here, specifically in section 16.6.2.3.

Here is an example that works:

from multiprocessing import Process, freeze_support

def f():
    print('hello world!')

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
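For the updated script, the important point is that the guard has to wrap all of the work, not just the freeze_support() call. On Windows each worker process re-imports the main script (the traceback shows this: prepare() imports scikit_notreat_predictors.py as '__parents_main__' and reaches gs.fit at line 36), so any module-level gs.fit(...) runs again inside the worker and tries to start new processes. Below is a minimal sketch of that structure, assuming Python 3 and using only the standard library. The function score_params is a hypothetical stand-in for fitting one model per parameter combination, and the Pool plays the role of GridSearchCV(n_jobs=...).

```python
import itertools
import multiprocessing as mp


def score_params(params):
    # Hypothetical stand-in for fitting/scoring one model.
    # Toy score that peaks at alpha=1e-3, l1_ratio=0.5. Replace with real work.
    # It must live at module top level so worker processes can import it.
    alpha, l1_ratio = params
    return -abs(alpha - 1e-3) - abs(l1_ratio - 0.5)


def main():
    # All real work (loading data, building the grid, starting workers)
    # happens inside main(), so re-importing this module in a worker
    # does not start any new processes.
    alphas = [10.0 ** -k for k in range(1, 7)]
    l1_ratios = [0.05, 0.15, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0]
    grid = list(itertools.product(alphas, l1_ratios))

    with mp.Pool(processes=4) as pool:
        scores = pool.map(score_params, grid)

    best_score, best_params = max(zip(scores, grid))
    print("Best parameters:", best_params, "score:", best_score)
    return best_params


if __name__ == "__main__":
    mp.freeze_support()  # no-op unless the program is frozen into a Windows executable
    main()
```

Applied to the script in the question, this would mean moving everything from print("Started.") through the final print(gs.best_params_) into a main() function (or indenting it under the if __name__ == '__main__': block); the imports can stay at the top.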

This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module:

if __name__ == '__main__':
    freeze_support()
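To see why the idiom matters, the sketch below forces the "spawn" start method (the only one available on Windows) so the behaviour can be reproduced on any OS. It assumes Python 3.4+ for multiprocessing.get_context; double is a hypothetical worker function used only for illustration.

```python
import multiprocessing as mp

# This line runs in the parent AND again in every spawned worker,
# because "spawn" re-imports the main script (as "__mp_main__").
print("module level, __name__ =", __name__, flush=True)


def double(x):
    # Hypothetical worker function; must be importable from the main module.
    return 2 * x


if __name__ == "__main__":
    # Only the parent process reaches this block, so the pool is created once.
    ctx = mp.get_context("spawn")  # forced here to mimic Windows
    with ctx.Pool(processes=2) as pool:
        print(pool.map(double, [1, 2, 3]))
```

Run as a script, the module-level line is typically printed once by the parent (with __name__ equal to "__main__") and once per worker (with "__mp_main__"), while the pool itself is created only once. If the pool creation were moved outside the guard, each worker would try to create its own pool while being imported, which is the "bootstrapping phase" RuntimeError from the question.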
