Connection error when uploading files with the HDFS Python package

Posted 2024-06-13 22:21:19


I'm trying to create a Python program that connects to the Hadoop file system on a remote machine and uploads files to it and downloads files from it. The program currently looks like this (IP = my remote machine's IP):

from hdfs import InsecureClient

# Connect to the namenode's WebHDFS endpoint (IP = my remote machine's IP).
client = InsecureClient('http://IP:9870', user='hadoop')

# 'storage/' resolves against the user's home directory, i.e. /user/hadoop/storage.
path = client.resolve('storage/')
client.makedirs(path, permission='755')  # octal permission string
client.upload(path, '/home/storage/model1.h5')

client.download('storage/' + 'model1.h5', '../storage/model1.h5')
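As a quick sanity check that the namenode itself is reachable, a metadata-only call that never touches a datanode can be issued with the same client (a minimal sketch; status() is the library's FileStatus wrapper):

# If this call succeeds but upload() fails, the namenode leg is fine and
# the problem lies on the datanode leg of the transfer.
print(client.status('/'))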

I can run the makedirs command successfully, but when uploading the file I get the following error:

(connection error traceback not preserved in the original post)
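Since makedirs only talks to the namenode while a file write must also reach a datanode, it helps to see where the upload is being redirected. Below is a minimal sketch of the first half of the WebHDFS CREATE handshake, assuming the same IP and user as above (requests is the only dependency; the path matches the audit log further down):

import requests

# Step 1 of a WebHDFS write: the namenode does not accept the data itself,
# it answers with a 307 redirect whose Location header names the datanode
# that will. allow_redirects=False lets us inspect that target.
resp = requests.put(
    'http://IP:9870/webhdfs/v1/user/hadoop/storage/model1.h5'
    '?op=CREATE&user.name=hadoop',
    allow_redirects=False,
)
print(resp.status_code)               # expected: 307
print(resp.headers.get('Location'))   # the host the client must reach next

If the Location header points at the datanode container's internal hostname or IP, a client outside the compose network cannot reach it, which would explain why metadata-only operations (mkdirs, listStatus) succeed while the upload fails.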

The logs of the namenode docker container are not very informative either:

2019-08-08 10:18:17 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=mkdirs  src=/user/hadoop/storage    dst=null    perm=hadoop:supergroup:rwxr-xr-x    proto=webhdfs
2019-08-08 10:18:18 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=listStatus  src=/user/hadoop/storage    dst=null    perm=null   proto=webhdfs
2019-08-08 10:18:18 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=delete  src=/user/hadoop/storage/model1.h5  dst=null    perm=null   proto=webhdfs

What am I doing wrong?


The HDFS ecosystem was built with this docker-compose.yaml file:

version: "2"
services:
   namenode:
      image: flokkr/hadoop:latest
      hostname: namenode
      command: ["hdfs","namenode"]
      ports:
         - 50070:50070
         - 9870:9870
      env_file:
        - ./compose-config
      environment:
          NAMENODE_INIT: "hdfs dfs -chmod 777 /"
          ENSURE_NAMENODE_DIR: "/tmp/hadoop-hadoop/dfs/name"
   datanode:
      command: ["hdfs","datanode"]
      image: flokkr/hadoop:latest
      env_file:
        - ./compose-config
   resourcemanager:
      image: flokkr/hadoop:latest
      hostname: resourcemanager
      command: ["yarn", "resourcemanager"]
      ports:
         - 8088:8088
      env_file:
        - ./compose-config
   nodemanager:
      image: flokkr/hadoop-yarn-nodemanager:latest
      command: ["yarn", "nodemanager"]
      env_file:
        - ./compose-config
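
Note that, unlike the namenode, the datanode service publishes no ports and has no fixed hostname, so the address a WebHDFS redirect hands back is only reachable from inside the compose network. A hedged sketch of one way to expose it to the host, assuming Hadoop 3's default datanode HTTP port 9864 (the hostname and port mapping below are illustrative additions, not part of the original setup):

   datanode:
      image: flokkr/hadoop:latest
      command: ["hdfs","datanode"]
      hostname: datanode          # fixed name the host can map in /etc/hosts
      ports:
         - 9864:9864              # default datanode HTTP (WebHDFS) port in Hadoop 3
      env_file:
        - ./compose-config

In some setups dfs.client.use.datanode.hostname=true is also needed so the redirect carries that hostname rather than a container-internal IP.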

The compose-config file looks like this:

CORE-SITE.XML_fs.default.name=hdfs://namenode:9000
CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:9000
HDFS-SITE.XML_dfs.replication=1
LOG4J.PROPERTIES_log4j.rootLogger=INFO, stdout
LOG4J.PROPERTIES_log4j.appender.stdout=org.apache.log4j.ConsoleAppender
LOG4J.PROPERTIES_log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
MAPRED-SITE.XML_mapreduce.framework.name=yarn
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
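
The FILENAME.XML_property=value pattern suggests the flokkr images synthesize the Hadoop XML configuration files from these variables at startup. A rough sketch of that apparent convention, inferred from the names above rather than from the image's actual launcher code:

def env_to_property(line):
    """Split 'CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000' into
    (config file, property name, value)."""
    key, _, value = line.partition('=')
    filename, _, prop = key.partition('_')  # split at the first underscore only
    return filename.lower(), prop, value

print(env_to_property('CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000'))
# -> ('core-site.xml', 'fs.defaultFS', 'hdfs://namenode:9000')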

Tags: ip, hadoop, default, site, storage, root, hdfs, xml