Reading a file with hdfs3 fails

Posted 2024-04-16 04:14:23


I am trying to read an HDFS file from Python using the hdfs3 module.

import hdfs3
hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
hdfs.ls('/projects/samplecsv/part-r-00000')

This produces

[output missing in the source]

So it seems able to access HDFS and read the directory structure. However, reading the file fails.

with hdfs.open('/projects/samplecsv/part-r-00000', 'rb') as f:
    print(f.read(100))

gives

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-94-46f0db8e87dd> in <module>()
      1 with hdfs.open('/projects/samplecsv/part-r-00000', 'rb') as f:
----> 2     print(f.read(100))

/anaconda3/lib/python3.5/site-packages/hdfs3/core.py in read(self, length)
    615                     length -= ret
    616                 else:
--> 617                     raise IOError('Read file %s Failed:' % self.path, -ret)
    618 
    619         return b''.join(buffers)
OSError: [Errno Read file /projects/samplecsv/part-r-00000 Failed:] 1

What could be the problem? I am using Python 3.5.


2 answers

If you want to perform any operation on a file, you must pass the full file path.

import hdfs3
hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
hdfs.ls('/projects/samplecsv/part-r-00000')

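# Upload a local file; the destination must be the full HDFS file path
hdfs.put('local-file.txt', '/projects/samplecsv/part-r-00000/local-file.txt')

# Open it with the same absolute path (note the leading slash)
with hdfs.open('/projects/samplecsv/part-r-00000/local-file.txt', 'rb') as f: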
    print(f.read(100))
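
To avoid mistakes such as a missing leading slash, the full path can be built in one place. This is only a minimal sketch: `full_hdfs_path` is a hypothetical helper, not part of hdfs3, and it assumes HDFS paths should be absolute and use forward slashes.

```python
import posixpath


def full_hdfs_path(directory, filename):
    """Join an HDFS directory and a file name into one absolute path.

    Hypothetical helper for illustration. HDFS paths use forward slashes,
    so posixpath is used instead of os.path, which would insert
    backslashes on Windows.
    """
    path = posixpath.normpath(posixpath.join(directory, filename))
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path, got %r" % path)
    return path


print(full_hdfs_path("/projects/samplecsv/part-r-00000", "local-file.txt"))
```

The resulting string can then be passed to both `hdfs.put` and `hdfs.open`, so the two calls cannot drift apart.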

If you want to read multiple files from an HDFS directory, you can try the following example:

  import os
  import hdfs3
  hdfs = hdfs3.HDFileSystem(host='xxx.xxx.com', port=12345)
  hdfs.ls('/projects/samplecsv/part-r-00000')

  # Upload the file first if it is not already present (full destination path)
  hdfs.put('local-file.txt', '/projects/samplecsv/part-r-00000/local-file.txt')

  # Read the first 100 bytes of every .txt file in the directory
  file_loc = '/projects/samplecsv/part-r-00000'
  for file in hdfs.glob(os.path.join(file_loc, '*.txt')):
      with hdfs.open(file) as f:
          print(f.read(100))
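
The loop above could also be wrapped in a small function so the results can be reused instead of printed. This is a rough sketch, not part of hdfs3: `read_heads` is a hypothetical name, and it assumes only that the object passed in has `glob(path)` and `open(path, mode)` methods like the `hdfs` object used in this answer.

```python
import posixpath


def read_heads(fs, directory, pattern="*.txt", nbytes=100):
    """Return {path: first nbytes bytes} for every file matching pattern.

    Hypothetical helper for illustration. fs is any object that provides
    glob(path) and open(path, mode), for example an HDFileSystem instance.
    """
    heads = {}
    # posixpath keeps forward slashes regardless of the local platform
    for path in sorted(fs.glob(posixpath.join(directory, pattern))):
        with fs.open(path, "rb") as f:
            heads[path] = f.read(nbytes)
    return heads
```

With a live connection it would be called as `read_heads(hdfs, '/projects/samplecsv/part-r-00000')`.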
