Ray cluster configuration file_mounts section prevents worker nodes from launching

Posted 2024-06-17 09:06:34


I'm using the file_mounts block in the config file to try to distribute a small number of files to every node in a Ray cluster on AWS EC2:

file_mounts: { "~/": "./run_files" }

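The cluster starts with only the head node, and the contents of the run_files directory are copied onto that node correctly. However, the two requested worker nodes are never launched. If I omit the file_mounts section, the workers do launch. The Ray monitor reports a problem finding the file libtcl.so in a matplotlib subdirectory of the Anaconda3 installation. That file exists at the correct path on the head node, so the setup on the worker nodes seems to be what is failing: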

$ ray exec ray_conf.yaml  'tail -n 100 -f /tmp/ray/session_*/logs/monitor*'
2019-05-29 19:36:14,019 INFO updater.py:95 -- NodeUpdater: Waiting for IP of i-073950262949fe9a8...
2019-05-29 19:36:14,019 INFO log_timer.py:21 -- NodeUpdater: i-073950262949fe9a8: Got IP [LogTimer=362ms]
2019-05-29 19:36:14,025 INFO updater.py:272 -- NodeUpdater: Running tail -n 100 -f /tmp/ray/session_*/logs/monitor* on 54.175.173.233...
==> /tmp/ray/session_2019-05-29_23-35-49_842129_4407/logs/monitor.err <==
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/monitor.py", line 376, in <module>
    redis_password=args.redis_password)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/monitor.py", line 54, in __init__
    self.load_metrics)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 349, in __init__
    self.reload_config(errors_fatal=True)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 523, in reload_config
    raise e
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 516, in reload_config
    new_config["worker_start_ray_commands"]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 790, in hash_runtime_conf
    add_content_hashes(local_path)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 778, in add_content_hashes
    add_hash_of_file(fpath)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ray/autoscaler/autoscaler.py", line 764, in add_hash_of_file
    with open(fpath, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './anaconda3/pkgs/matplotlib-2.1.0-py36hba5de38_0/lib/libtcl.so'

==> /tmp/ray/session_2019-05-29_23-35-49_842129_4407/logs/monitor.out <==

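(Note that this question follows on from the question "Workers not being launched on EC2 by ray". I'm continuing in a new question because the source of the error is now more clearly identified.)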


1 Answer
User
#1 · Posted 2024-06-17 09:06:34

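I think the libtcl.so error message is very misleading. The problem is that the file_mounts remote path cannot be the home directory on the worker (neither ./ nor ~/ works); it has to be a subdirectory. So the following succeeds: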

file_mounts: {"~/run_files": "./run_files"}
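
To catch this before launching a cluster, the rule above can be checked with a few lines of Python. This is only a minimal sketch: home_dir_mounts is a hypothetical helper, not part of Ray, and it assumes the file_mounts mapping has already been loaded from the YAML config into a dict of remote path to local path. It flags any remote path that is the home directory itself ("~/" or "./") rather than a subdirectory.

```python
import posixpath

# Hypothetical helper, not part of Ray: spellings of "the home directory
# itself" once a trailing slash has been normalized away.
HOME_ALIASES = {"~", "."}


def home_dir_mounts(file_mounts):
    """Return the remote paths in file_mounts that point at the home directory."""
    flagged = []
    for remote_path in file_mounts:
        # normpath collapses "~/" to "~" and "./" to "."
        normalized = posixpath.normpath(remote_path)
        if normalized in HOME_ALIASES:
            flagged.append(remote_path)
    return flagged


failing = {"~/": "./run_files"}           # remote path is the home directory
working = {"~/run_files": "./run_files"}  # remote path is a subdirectory

print(home_dir_mounts(failing))  # ['~/']
print(home_dir_mounts(working))  # []
```

An empty list means no remote path targets the home directory directly. The check only looks at the literal spelling of the path, so an absolute path such as /home/ubuntu would not be flagged.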
