Searching for the k nearest neighbors in numpy

[('ADT1_YEAST', 0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22, 'MIT') ('ADT2_YEAST', 0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22, 'MIT') ('ADT3_YEAST', 0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22, 'MIT') ..., ('ZNRP_YEAST', 0.67, 0.57, 0.36, 0.19, 0.5, 0.0, 0.56, 0.22, 'ME2') ('ZUO1_YEAST', 0.43, 0.4, 0.6, 0.16, 0.5, 0.0, 0.53, 0.39, 'NUC') ('G6PD_YEAST', 0.65, 0.54, 0.54, 0.13, 0.5, 0.0, 0.53, 0.22, 'CYT')]

2 answers

User

Answer #1 · edited 2024-05-23 20:45:57

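If I understand the question correctly, you are really asking how to encode categorical variables so that a nearest-neighbor algorithm can interpret them correctly. You can do this with sklearn, as described in 4.2.4. Encoding categorical features. If, on the other hand, you have incomplete features, see 4.2.6. Imputation of missing values.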
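To make the two ideas concrete, here is a minimal sketch in plain numpy of what one-hot encoding and mean imputation do. It is not the sklearn API itself; sklearn's `OneHotEncoder` and `SimpleImputer` cover the same ground with more options. The sample labels and the small matrix `X` are made up for illustration.

```python
import numpy as np

# Hypothetical sample of class labels, like the last field of each record in the question.
labels = np.array(["MIT", "MIT", "ME2", "NUC", "CYT"])

# Map each distinct label to an integer code, then expand the codes to one-hot rows.
categories, codes = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(categories))[codes]

# Hypothetical feature matrix with one missing value marked as NaN.
X = np.array([[0.58, 0.61],
              [0.43, np.nan],
              [0.64, 0.62]])

# Mean imputation: replace each NaN with the mean of the known values in its column.
col_means = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_means, X)
```

One-hot columns keep a distance metric from treating the labels as ordered numbers, which is usually what a nearest-neighbor search needs when a categorical field is used as a feature.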

User

Answer #2 · edited 2024-05-23 20:45:57

I think you need to get the data into a matrix properly. I usually do it like this:

import numpy as np

features = [] # list of lists of the feature variables
classes  = [] # list of the target variables
# f is assumed to be an already-open file object for the data file
for line in f:
    line = line.strip().split() # split the line into pieces on any whitespace
    features.append(line[1:-1]) # or whatever indices your features are in
    classes.append(line[-1])    # or whatever index your target variable is in
classes  = np.array(classes)
features = np.array(features, dtype=float) # np.float was removed in NumPy 1.24; the builtin float is equivalent
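Once the data is in a float matrix, the search itself can be sketched as follows. This is one possible brute-force approach using Euclidean distance, not necessarily what the asker had in mind; the helper names `k_nearest` and `knn_predict` and the `query` point are hypothetical, and the six rows are the records visible in the question.

```python
import numpy as np

def k_nearest(features, query, k):
    """Return the indices of the k rows of `features` closest to `query`."""
    # Euclidean distance from the query to every row at once (broadcasting).
    dists = np.linalg.norm(features - query, axis=1)
    # A full sort is fine for small data; a stable sort keeps ties in row order.
    return np.argsort(dists, kind="stable")[:k]

def knn_predict(features, classes, query, k):
    """Majority vote over the class labels of the k nearest rows."""
    idx = k_nearest(features, query, k)
    labels, counts = np.unique(classes[idx], return_counts=True)
    return labels[np.argmax(counts)]

# The six records shown in the question, without the name field.
features = np.array([
    [0.58, 0.61, 0.47, 0.13, 0.5, 0.0, 0.48, 0.22],
    [0.43, 0.67, 0.48, 0.27, 0.5, 0.0, 0.53, 0.22],
    [0.64, 0.62, 0.49, 0.15, 0.5, 0.0, 0.53, 0.22],
    [0.67, 0.57, 0.36, 0.19, 0.5, 0.0, 0.56, 0.22],
    [0.43, 0.40, 0.60, 0.16, 0.5, 0.0, 0.53, 0.39],
    [0.65, 0.54, 0.54, 0.13, 0.5, 0.0, 0.53, 0.22],
])
classes = np.array(["MIT", "MIT", "MIT", "ME2", "NUC", "CYT"])

# Hypothetical query point, chosen only for illustration.
query = np.array([0.60, 0.60, 0.48, 0.14, 0.5, 0.0, 0.50, 0.22])

neighbors = k_nearest(features, query, k=3)
predicted = knn_predict(features, classes, query, k=3)
```

For large arrays, `np.argpartition` avoids the full sort, at the cost of returning the k nearest rows in no particular order.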
