How do I use a random forest with sparse data?

Posted 2024-03-28 12:41:03


I am trying to use scikit-learn's random forest classifier, and I get this error:

ValueError: setting an array element with a sequence.

I can't tell where it comes from. Here is my code:

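import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split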

def get_data_target(lang):
    with open('vect/' + lang + '/vect_'+lang, "rb") as f:
        vect = pickle.load(f)
        data, target = [], []
        for v in vect:
            for x, y in v.items():
                data.append(y)
                target.append(x)
                
        return np.asarray(data), np.asarray(target)

data_fr, target_fr = get_data_target('fr')


X_train, X_test, y_train, y_test = train_test_split(data_fr, target_fr, test_size=0.1, random_state=0)

random_forest = RandomForestClassifier(n_estimators=30, max_depth=10, random_state=1)
random_forest.fit(X_train, y_train)

Edit

print(X_train[:10]) gives:

[<1x567 sparse matrix of type '<class 'numpy.int64'>'
    with 567 stored elements in Compressed Sparse Row format>
 <1x5574 sparse matrix of type '<class 'numpy.int64'>'
    with 5574 stored elements in Compressed Sparse Row format>
 <1x6419 sparse matrix of type '<class 'numpy.int64'>'
    with 6419 stored elements in Compressed Sparse Row format>
 <1x1477 sparse matrix of type '<class 'numpy.int64'>'
    with 1477 stored elements in Compressed Sparse Row format>
 <1x1347 sparse matrix of type '<class 'numpy.int64'>'
    with 1347 stored elements in Compressed Sparse Row format>
 <1x3588 sparse matrix of type '<class 'numpy.int64'>'
    with 3588 stored elements in Compressed Sparse Row format>
 <1x5856 sparse matrix of type '<class 'numpy.int64'>'
    with 5856 stored elements in Compressed Sparse Row format>
 <1x1080 sparse matrix of type '<class 'numpy.int64'>'
    with 1080 stored elements in Compressed Sparse Row format>
 <1x1600 sparse matrix of type '<class 'numpy.int64'>'
    with 1600 stored elements in Compressed Sparse Row format>
 <1x6781 sparse matrix of type '<class 'numpy.int64'>'
    with 6781 stored elements in Compressed Sparse Row format>]
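
One thing the output above suggests is that every row has a different number of columns (567, 5574, 6419, ...), so the rows may not share a single feature space. A minimal sketch for checking this is below. The helper name `row_widths` and the sample `rows` are hypothetical stand-ins, not part of the original post.

```python
import numpy as np
from scipy import sparse


def row_widths(rows):
    """Return the sorted distinct column counts among 1xN sparse rows."""
    return sorted({r.shape[1] for r in rows})


# Hypothetical stand-ins for the pickled rows: 1xN CSR matrices of differing N,
# using the first three widths from the printed output.
rows = [sparse.csr_matrix(np.ones((1, n), dtype=np.int64)) for n in (567, 5574, 6419)]

print(row_widths(rows))  # [567, 5574, 6419]
```

If more than one width comes back, the rows cannot be arranged into one two-dimensional matrix as they are, which is what the classifier's input validation expects.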

The stack trace for the error looks like this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-526b5f56b73c> in <module>
      3 random_forest = RandomForestClassifier(n_estimators=30, max_depth=10, random_state=1)
      4 
----> 5 random_forest.fit(X_train, y_train)

D:\Anaconda\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    247 
    248         # Validate or convert input data
--> 249         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    250         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    251         if sample_weight is not None:

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    494             try:
    495                 warnings.simplefilter('error', ComplexWarning)
--> 496                 array = np.asarray(array, dtype=dtype, order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

D:\Anaconda\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

ValueError: setting an array element with a sequence.
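
The traceback shows `check_array` trying to turn `X` into a single numeric array, which fails when `X` is an object array holding one sparse matrix per sample. One possible approach, assuming all rows have the same number of columns, is to stack them into a single sparse matrix with `scipy.sparse.vstack` before fitting. The helper `stack_rows` and the random sample data below are hypothetical and only illustrate the idea. If the rows have differing widths, as the printed output suggests, the features would first need to be rebuilt against a shared vocabulary.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier


def stack_rows(rows):
    """Stack 1xN sparse rows into one (n_samples, N) CSR matrix."""
    widths = {r.shape[1] for r in rows}
    if len(widths) != 1:
        # Rows of differing width cannot form a single feature matrix.
        raise ValueError(f"rows have differing widths: {sorted(widths)}")
    return sparse.vstack(rows, format="csr")


# Hypothetical data: 20 samples with 8 features each, and two classes.
rng = np.random.RandomState(0)
rows = [sparse.csr_matrix(rng.randint(0, 3, size=(1, 8))) for _ in range(20)]
labels = np.array([i % 2 for i in range(20)])

X = stack_rows(rows)
random_forest = RandomForestClassifier(n_estimators=30, max_depth=10, random_state=1)
random_forest.fit(X, labels)  # sparse input is accepted directly
print(X.shape)  # (20, 8)
```

In the original code this would mean returning the stacked matrix from `get_data_target` instead of `np.asarray(data)`. `train_test_split` can also split a sparse matrix.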
