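TypeError: Can't convert 'bytes' object to str implicitly when using numpy.genfromtxt
I have a Python project on kaggle.com, and I'm having trouble reading in the dataset. The dataset is a single csv file. I need to read it in and put the target part and the training part into arrays.
Here are the first three rows of the dataset (the target is column 19 and the features are the first 18 columns):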
user gender age how_tall_in_meters weight body_mass_index x1
debora Woman 46 1.62 75 28.6 -3
debora Woman 46 1.62 75 28.6 -3
The target column, which isn't shown here, contains string values.
from pandas import read_csv
import numpy as np
from sklearn.linear_model.stochastic_gradient import SGDClassifier
from sklearn import preprocessing
import sklearn.metrics as metrics
from sklearn.cross_validation import train_test_split
#d = pd.read_csv("data.csv", dtype={'A': np.str(), 'B': np.str(), 'S': np.str()})
dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
target = np.array([x[19] for x in dataset])
train = np.array([x[1:] for x in dataset])
print(target)
The error I get is:
Traceback (most recent call last):
  File "C:\Users\Cameron\Desktop\Project - Machine learning\datafilesforproj\SGD_classifier.py", line 12, in <module>
    dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
  File "C:\Python33\lib\site-packages\numpy\lib\npyio.py", line 1380, in genfromtxt
    first_values = split_line(first_line)
  File "C:\Python33\lib\site-packages\numpy\lib\_iotools.py", line 217, in _delimited_splitter
    line = line.split(self.comments)[0]
TypeError: Can't convert 'bytes' object to str implicitly
5 Answers
-1
Instead of this:
dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
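Try this:
dataset = np.genfromtxt('C:\\..\\..\\train.csv', delimiter=',', dtype=None)[1:]
Note that each '\' in the path needs an extra '\' in front of it to escape it. Also pass dtype=None itself, not the string 'None'.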
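A self-contained sketch of the same idea: pass a file name to genfromtxt rather than a file object opened in text mode, and let `dtype=None` infer a type per column. The sample rows and the temporary file are made up for illustration, `os.path.join` is swapped in so no manual backslash escaping is needed, and `names=True` (take field names from the header instead of slicing it off with `[1:]`) plus `encoding="utf-8"` are my additions; the `encoding` argument assumes NumPy 1.14 or later.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for data.csv: comma-separated, with a header row
sample = (
    "user,gender,age,how_tall_in_meters\n"
    "debora,Woman,46,1.62\n"
    "debora,Woman,46,1.62\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    # os.path.join builds the path, so there are no backslashes to escape by hand
    path = os.path.join(tmpdir, "data.csv")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Pass the file name, not open(path, 'r').
    # dtype=None infers a type for each column (strings stay strings);
    # names=True turns the header into field names of a structured array.
    data = np.genfromtxt(path, delimiter=",", dtype=None,
                         names=True, encoding="utf-8")

print(data["user"])
print(data["age"])
```

Columns are then accessed by header name, so string columns such as the target no longer turn into nan the way they do with `dtype='f8'`.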
0
You need to explicitly convert the bytes object to a str object, as the TypeError says.
# For instance, interpret as UTF-8 (depends on your source)
self.comments = self.comments.decode('utf-8')
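A minimal reproduction of what the traceback shows, assuming (as the failing line `line.split(self.comments)` suggests) that the line read from a text-mode file is a str while the comment marker is bytes. The variable names and values here are illustrative only, not NumPy's internals.

```python
# A line read from a file opened with 'r' is a str
line = "debora,Woman,46\n"
# Assumption: the comment marker is held as bytes, as in the failing NumPy code
comments = b"#"

raised = False
try:
    # Splitting a str by a bytes separator raises TypeError in Python 3
    line.split(comments)
except TypeError:
    raised = True

# Decoding the bytes explicitly makes both operands str, so the split works
fixed = line.split(comments.decode("utf-8"))[0]

print(raised)
print(repr(fixed))
```

The exact error message differs between Python versions, but the exception type is the same.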
1
According to this link https://mail.python.org/pipermail/python-list/2012-April/622487.html, you may need to open the file in binary mode:
import io
import numpy as np
# Open the CSV in binary mode ('rb') so genfromtxt receives bytes rather than str
with io.open('data.csv', 'rb') as inpstream:
    dataset = np.genfromtxt(inpstream, delimiter=',', dtype='f8')[1:]
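In the examples at http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html, the file object is created with the StringIO class. However, going by the function's description, I'd guess that passing a file name should work too.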
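A self-contained variant of the binary-mode approach, using `io.BytesIO` as an in-memory stand-in for `io.open('data.csv', 'rb')`. The sample data is made up, and `skip_header=1` is swapped in for the `[1:]` slice on the assumption that the file has exactly one header line.

```python
import io

import numpy as np

# In-memory binary stream, analogous to io.open('data.csv', 'rb')
raw = b"a,b,c\n1,2,3\n4,5,6\n"
stream = io.BytesIO(raw)

# skip_header=1 drops the header line instead of parsing it as a row of nan
arr = np.genfromtxt(stream, delimiter=",", dtype="f8", skip_header=1)
print(arr)
```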
3
This is actually a bug in numpy; see issue #3184.
I'll just copy over the workaround I posted there:
import functools
import io
import numpy as np
import sys

genfromtxt_old = np.genfromtxt

@functools.wraps(genfromtxt_old)
def genfromtxt_py3_fixed(f, encoding="utf-8", *args, **kwargs):
    if isinstance(f, io.TextIOBase):
        if hasattr(f, "buffer") and hasattr(f.buffer, "raw") and \
                isinstance(f.buffer.raw, io.FileIO):
            # Best case: get underlying FileIO stream (binary!) and use that
            fb = f.buffer.raw
            # Reset cursor on the underlying object to match that on wrapper
            fb.seek(f.tell())
            result = genfromtxt_old(fb, *args, **kwargs)
            # Reset cursor on wrapper to match that of the underlying object
            f.seek(fb.tell())
        else:
            # Not very good but works: Put entire contents into BytesIO object,
            # otherwise same ideas as above
            old_cursor_pos = f.tell()
            fb = io.BytesIO(bytes(f.read(), encoding=encoding))
            result = genfromtxt_old(fb, *args, **kwargs)
            f.seek(old_cursor_pos + fb.tell())
    else:
        result = genfromtxt_old(f, *args, **kwargs)
    return result

if sys.version_info >= (3,):
    np.genfromtxt = genfromtxt_py3_fixed
After adding this to the top of your code, you can use np.genfromtxt again, and it should work fine under Python 3.
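If replacing `np.genfromtxt` globally feels too invasive, the BytesIO fallback branch of this workaround can also be written as a plain helper function. This is a hedged sketch: `genfromtxt_from_text` is a hypothetical name, it always re-encodes the whole text stream (so it skips the FileIO fast path and does not restore the cursor), and its `encoding` parameter only controls that re-encoding; it is not forwarded to NumPy.

```python
import io

import numpy as np


def genfromtxt_from_text(f, encoding="utf-8", **kwargs):
    """Hypothetical helper: hand NumPy a bytes stream even if f is text-mode."""
    if isinstance(f, io.TextIOBase):
        # Re-encode the remaining text into an in-memory binary stream
        f = io.BytesIO(f.read().encode(encoding))
    return np.genfromtxt(f, **kwargs)


# Usage with an in-memory text stream standing in for open('data.csv', 'r')
text_stream = io.StringIO("x,y\n1,2\n3,4\n")
arr = genfromtxt_from_text(text_stream, delimiter=",", skip_header=1)
print(arr)
```

Calling the helper explicitly leaves every other use of np.genfromtxt in the program untouched.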
4
What worked for me was changing this line
dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
to
dataset = np.genfromtxt('data.csv', delimiter=',', dtype='f8')[1:]
(Though I'm not quite sure what the underlying problem is.)