Unable to train my TensorFlow detector model on Google Cloud - Q&A - Python中文网

Posted 2024-06-16 11:53:09

Male | Just a programmer who loves writing Python code.

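I'm trying to train my own detector model based on the TensorFlow sample and this post. It already works locally on my MacBook Pro. The problem is that I don't have a GPU, and training on the CPU is too slow (about 25 seconds per iteration).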

So I tried to run the tutorial on Google Cloud ML Engine instead, but I can't get it to work.

My folder structure is as follows:

+ data
 - train.record
 - test.record
+ models
 + train
 + eval
+ training
 - ssd_mobilenet_v1_coco

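The steps I took to move from local training to Google Cloud training were:

Created a bucket in Google Cloud Storage and copied my local folder structure, with its files, into it
Edited my pipeline.config file and changed all paths from Users/dev/detector/ to gs://bucketname/
Created a YAML file with the default configuration provided in the tutorial
Ran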
gcloud ml-engine jobs submit training object_detection_{} \
  --job-dir=gs://bucketname/models/train \
  --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
  --module-name object_detection.train \
  --region us-east1 \
  --config /Users/dev/detector/training/cloud.yml \
  -- \
  --train_dir=gs://bucketname/models/train \
  --pipeline_config_path=gs://bucketname/data/pipeline.config
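The path-rewriting step (pointing pipeline.config at the bucket instead of the local disk) can be sketched as a plain string substitution. This is a hypothetical helper, not part of the object detection API; it assumes the local root is /Users/dev/detector/ and the bucket is gs://bucketname/, as in this question, and the sample config excerpt is made up for illustration.

```python
"""Rewrite local paths in a pipeline.config to Cloud Storage paths.

Sketch only: LOCAL_ROOT and BUCKET_ROOT are the placeholders used in this
question; substitute your own local directory and bucket name.
"""

LOCAL_ROOT = '/Users/dev/detector/'
BUCKET_ROOT = 'gs://bucketname/'


def rewrite_paths(config_text, local_root=LOCAL_ROOT, bucket_root=BUCKET_ROOT):
    """Replace every occurrence of the local root with the bucket root."""
    return config_text.replace(local_root, bucket_root)


def leftover_local_paths(config_text, local_root=LOCAL_ROOT):
    """Return the lines that still mention the local root (should be empty)."""
    return [line for line in config_text.splitlines() if local_root in line]


if __name__ == '__main__':
    # Hypothetical pipeline.config excerpt; the value is invented.
    sample = (
        'train_input_reader: {\n'
        '  tf_record_input_reader {\n'
        '    input_path: "/Users/dev/detector/data/train.record"\n'
        '  }\n'
        '}\n'
    )
    rewritten = rewrite_paths(sample)
    print(rewritten)
    print('leftover local paths:', leftover_local_paths(rewritten))
```

Keeping the leading slash in LOCAL_ROOT matters: replacing only Users/dev/detector/ would leave a stray "/" in front of gs://.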

When I do this, MLUnits shows the following error message:

The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 27, in <module>
    from object_detection.builders import preprocessor_builder
  File "/root/.local/lib/python2.7/site-packages/object_detection/builders/preprocessor_builder.py", line 21, in <module>
    from object_detection.protos import preprocessor_pb2
  File "/root/.local/lib/python2.7/site-packages/object_detection/protos/preprocessor_pb2.py", line 71, in <module>
    options=None, file=DESCRIPTOR)
TypeError: __new__() got an unexpected keyword argument 'file'

Thanks in advance.

2 Answers

User

#1 · Edited 2024-06-16 11:53:09

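The problem is the protobuf version. You probably installed the latest protoc via brew, and protobuf added the file field in version 3.5.0 (https://github.com/google/protobuf/blob/9f80df026933901883da1d556b38292e14836612/CHANGES.txt#L74). The *_pb2.py files generated by that newer protoc therefore pass file=DESCRIPTOR, which the older protobuf runtime on the training machines does not accept, hence the TypeError.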
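To make that failure mode concrete, here is a small simulation. The two classes are hypothetical stand-ins, NOT protobuf's real descriptor classes: one has a __new__ that predates the file keyword (like an older runtime), the other accepts it (like protobuf 3.5.0 and later). Calling both the way newly generated code does reproduces the same kind of TypeError seen in the traceback.

```python
"""Simulate why a newly generated *_pb2.py breaks on an older protobuf runtime.

Hypothetical stand-ins only; protobuf's real descriptor classes are more
complex. Only the presence or absence of the `file` keyword is modeled.
"""


class OldRuntimeDescriptor(object):
    """Mimics an older runtime: __new__ has no `file` parameter."""

    def __new__(cls, name, options=None):
        return super(OldRuntimeDescriptor, cls).__new__(cls)


class NewRuntimeDescriptor(object):
    """Mimics protobuf >= 3.5.0: __new__ also accepts `file`."""

    def __new__(cls, name, options=None, file=None):
        return super(NewRuntimeDescriptor, cls).__new__(cls)


def load_generated_code(descriptor_cls):
    """Construct a descriptor the way newer generated code does (with file=)."""
    return descriptor_cls(name='preprocessor', options=None, file='DESCRIPTOR')


def raises_file_type_error(descriptor_cls):
    """True if constructing with file= fails with the 'file' keyword TypeError."""
    try:
        load_generated_code(descriptor_cls)
    except TypeError as err:
        return "unexpected keyword argument 'file'" in str(err)
    return False


if __name__ == '__main__':
    for cls in (OldRuntimeDescriptor, NewRuntimeDescriptor):
        print(cls.__name__, 'fails' if raises_file_type_error(cls) else 'works')
```

Regenerating the *_pb2.py files with an older protoc, or making sure the job installs a newer protobuf runtime, both remove the mismatch; the answer below takes the second route.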

So in setup.py, set the protobuf version in REQUIRED_PACKAGES to 'protobuf>=3.5.1'.
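To confirm that the protobuf the trainer actually imports satisfies that pin, you can compare version tuples. This is a hypothetical helper (parse_version and meets_pin are not part of protobuf or setuptools); it assumes plain numeric versions and simply drops suffixes such as "b2", which is coarser than the real PEP 440 ordering.

```python
"""Check whether a protobuf version string satisfies 'protobuf>=3.5.1'.

Sketch only: pre-release/post-release suffixes are dropped, so '3.5.1b2'
is treated like '3.5.1' (real PEP 440 ordering would put it lower).
"""
import re

MIN_PROTOBUF = (3, 5, 1)  # from the REQUIRED_PACKAGES pin 'protobuf>=3.5.1'


def parse_version(version):
    """Turn '3.5.1' into (3, 5, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if not match:
            break
        parts.append(int(match.group()))
    # Pad so that '3.5' compares like '3.5.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_pin(version, minimum=MIN_PROTOBUF):
    """True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == '__main__':
    try:
        import google.protobuf
        installed = google.protobuf.__version__
    except ImportError:
        installed = None

    if installed is None:
        print('protobuf is not installed')
    elif meets_pin(installed):
        print('protobuf', installed, 'satisfies >=3.5.1')
    else:
        print('protobuf', installed, 'is older than 3.5.1')
```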

User

#2 · Edited 2024-06-16 11:53:09

Check out the solution Anderskog posted here. It worked for me, and I made a patch here. For a manual fix, follow these instructions:

Make sure the runtime version in your YAML file is 1.4, for example:

trainingInput:
  runtimeVersion: "1.4"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard

Change setup.py to the following:

^{pr2}$

In object_detection/utils/visualization_utils.py, at line 24 (before import matplotlib.pyplot as plt), add:

import matplotlib
matplotlib.use('agg')
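If you would rather apply this edit with a script than by hand, a minimal sketch is below. add_agg_backend and the script name are hypothetical; it edits the file as text and assumes the line import matplotlib.pyplot as plt appears at module level.

```python
"""Insert a non-interactive matplotlib backend before pyplot is imported.

Sketch only: text-level edit; assumes the target file contains the line
`import matplotlib.pyplot as plt` at module level (no indentation).
"""

PYPLOT_IMPORT = 'import matplotlib.pyplot as plt'
BACKEND_LINES = "import matplotlib\nmatplotlib.use('agg')\n"


def add_agg_backend(source):
    """Return source with matplotlib.use('agg') placed before the pyplot import."""
    if "matplotlib.use('agg')" in source:
        return source  # already patched; applying twice changes nothing
    if PYPLOT_IMPORT not in source:
        raise ValueError('pyplot import not found; patch the file by hand')
    # Only the first pyplot import is touched.
    return source.replace(PYPLOT_IMPORT, BACKEND_LINES + PYPLOT_IMPORT, 1)


if __name__ == '__main__':
    import sys

    # Hypothetical usage:
    #   python add_agg.py object_detection/utils/visualization_utils.py
    for path in sys.argv[1:]:
        with open(path) as f:
            patched = add_agg_backend(f.read())
        with open(path, 'w') as f:
            f.write(patched)
```

The backend has to be chosen before pyplot is first imported, which is why the lines go in front of that import rather than anywhere else in the file.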

At line 184 of object_detection/evaluator.py, change

tf.train.get_or_create_global_step()

to

tf.contrib.framework.get_or_create_global_step()

Finally, at line 103 of object_detection/builders/optimizer_builder.py, change

tf.train.get_or_create_global_step()

to

tf.contrib.framework.get_or_create_global_step()
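The two global-step edits are the same text substitution, so they can also be scripted. This is a hypothetical sketch (use_contrib_global_step and the script name are made up); it replaces every occurrence in each file you pass, does not check line numbers or TensorFlow versions, and reports how many replacements it made so you can confirm each file had exactly the one call described above.

```python
"""Swap tf.train.get_or_create_global_step() for the tf.contrib variant.

Sketch only: plain text substitution over whatever files are passed in.
"""

OLD_CALL = 'tf.train.get_or_create_global_step()'
NEW_CALL = 'tf.contrib.framework.get_or_create_global_step()'


def use_contrib_global_step(source):
    """Return (patched_source, number_of_replacements)."""
    count = source.count(OLD_CALL)
    return source.replace(OLD_CALL, NEW_CALL), count


if __name__ == '__main__':
    import sys

    # Hypothetical usage, run from the directory that contains object_detection/:
    #   python patch_global_step.py object_detection/evaluator.py \
    #       object_detection/builders/optimizer_builder.py
    for path in sys.argv[1:]:
        with open(path) as f:
            patched, count = use_contrib_global_step(f.read())
        with open(path, 'w') as f:
            f.write(patched)
        print('%s: %d replacement(s)' % (path, count))
```

Because NEW_CALL does not contain OLD_CALL, running the script a second time is harmless and reports zero replacements.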

Hope this helps!
