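How to load a dataset in libsvm
I'm trying to figure out how to load my dataset so that the Python implementation of libsvm can read it. My data is a matrix with 250 rows and 500 columns, and the first column holds the labels. I'm using the following code to read the data: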
with open("dataset3.txt") as textFile:
    lines = [line.split() for line in textFile]
Matrix = [[0 for x in range(len(lines[0]))] for x in range(len(lines))]
for y in range(0, len(lines)):
    for x in range(0, len(lines[0])):
        Matrix[y][x] = lines[y][x]
With the code above I read the data into a two-dimensional integer array. How can I use this array to train and evaluate a support vector machine (SVM)?
param = svm_parameter('-t 0 -c 4 -b 1')
m = svm_train(Matrix, param)
Text file:
1 0 9 0 0 0 0 5 2 5 15 2 3 50 0 4 6 27 0 16 34 0 11 30 12 23 41 1 0 2 0 10 67 34 ...
-1 0 10 0 0 0 0 1 0 2 5 1 8 14 0 12 11 4 2 4 22 0 6 40 8 20 47 2 1 0 0 2 1 21 0 1 11 1 ...
...
Matrix = []
with open('dataset3.txt') as f:
    row = []
    for line in f:
        data = line.split()
        target = float(data[0]) # target value
        str1 = str(target)
        for i,j in enumerate(data):
            if i==0:
                continue
            else:
                str1 = str1 + " " + str(i) +":"+ str(j) +" "
        row.append(str1)
1 Answer
Try this code:
with open('dataset3.txt') as f:
    # list(...) keeps each row a real list on Python 3, where map() is lazy
    Matrix = [list(map(float, line.split())) for line in f]
for line in f
reads the file one line at a time.
line.split()
splits each line on whitespace into individual values.
map(float, line.split())
converts those values to floating-point numbers.
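To get from this matrix to training, the labels in the first column need to be separated from the feature columns, because libsvm's Python interface takes them as two arguments, as in svm_train(y, x, options). Below is a minimal sketch of that step. The helper names split_labels_features and train_and_evaluate are made up for illustration, and the import path libsvm.svmutil is an assumption about how libsvm is installed (older installs expose the module as plain svmutil).

```python
def split_labels_features(lines):
    """Parse whitespace-separated rows whose first column is the label."""
    labels, features = [], []
    for line in lines:
        values = [float(v) for v in line.split()]
        if not values:  # skip blank lines
            continue
        labels.append(values[0])
        features.append(values[1:])
    return labels, features


def train_and_evaluate(path="dataset3.txt", options="-t 0 -c 4 -b 1"):
    # Hypothetical helper. The import is done lazily so the parser above
    # works without libsvm; the module path is an assumption about the install.
    from libsvm.svmutil import svm_train, svm_predict

    with open(path) as f:
        y, x = split_labels_features(f)
    model = svm_train(y, x, options)
    # Predicting on the training data only reports training accuracy;
    # hold out part of the data for a real evaluation.
    p_labels, p_acc, p_vals = svm_predict(y, x, model, "-b 1")
    return model, p_acc


# Small in-memory sample in the same layout as the text file.
sample = ["1 0 9 0 5", "-1 0 10 0 1"]
y, x = split_labels_features(sample)
print(y)
print(x)
```

Only the parsing part runs here; train_and_evaluate needs libsvm installed and the real dataset3.txt on disk.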
Update
The asker commented that the input uses a different format (sparse index:value pairs).
Matrix = []
with open('dataset3.txt') as f:
    for line in f:
        data = line.split()
        target = float(data[0]) # target value
        row = []
        for idx, value in (item.split(':') for item in data[1:]):
            # indices are 1-based; pad with zeros up to the slot before idx
            n = int(idx) - len(row) - 1 # num missing
            for _ in range(n):
                row.append(0) # for missing
            row.append(float(value))
        Matrix.append(row)
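If the file is already in this sparse format, padding with zeros may not be needed at all: libsvm's Python interface also accepts each instance as a dictionary mapping feature index to value, and its svm_read_problem function reads such a file directly into labels and instances. The sketch below shows the dictionary route by hand. The function name parse_sparse_line and the sample lines are made up for illustration.

```python
def parse_sparse_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, value = item.split(":")
        features[int(idx)] = float(value)
    return label, features


# Made-up sample lines; missing indices are simply absent from the dict.
sample = ["1 1:5 3:2 4:7", "-1 2:10"]
y, x = [], []
for line in sample:
    label, feats = parse_sparse_line(line)
    y.append(label)
    x.append(feats)

print(y)
print(x)
```

The resulting y and x have the shape that svm_train(y, x, options) expects, with absent features treated as zero.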