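I am trying to train fastspeech2 from the TensorflowTTS repo.
Training on a single GPU works fine, but multi-GPU training fails with AttributeError: 'PerReplica' object has no attribute 'numpy'.
The file I am running is the official fastspeech2 training script from that repo, examples/fastspeech2/train_fastspeech2.py.
My command: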
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/fastspeech2/train_fastspeech2.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/fastspeech2/exp/train.fastspeech2.v1/ \
--config ./examples/fastspeech2/conf/fastspeech2.v1.yaml \
--use-norm 1 \
--f0-stat ./dump/stats_f0.npy \
--energy-stat ./dump/stats_energy.npy \
--mixed_precision 1 \
--resume ""
The error output I get is:
Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 421, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 413, in main
    resume=args.resume,
  File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 852, in fit
    self.run()
  File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 127, in _train_epoch
    self._check_eval_interval()
  File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 164, in _check_eval_interval
    self._eval_epoch()
  File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 747, in _eval_epoch
    self.generate_and_save_intermediate_result(batch)
  File "examples/fastspeech2/train_fastspeech2.py", line 150, in generate_and_save_intermediate_result
    utt_ids = batch["utt_ids"].numpy()
AttributeError: 'PerReplica' object has no attribute 'numpy'
Please help. I cannot work out why this error occurs only in multi-GPU training.
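I am currently working with the same repo and ran into this error. Unfortunately I do not have a fix yet, but in the meantime I am using a workaround. The error is raised when training tries to evaluate the network. Evaluation runs every x iterations, where x is the eval_interval_steps value you set in "/examples/fastspeech2/conf/fastspeech2.v1.yaml". If you increase that number to a value greater than train_max_steps, the function that raises the error is never called.
The function that raises the error is generate_and_save_intermediate_result(batch), and as far as I can tell you can train without it.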
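To make the workaround concrete, here is a minimal sketch in plain Python. It assumes the trainer decides when to evaluate with a check of the form `step % eval_interval_steps == 0`; that is my reading of the traceback, not something confirmed against the library source. The helper names `disable_intermediate_eval` and `eval_steps` and the small step counts are hypothetical and only illustrate the arithmetic.

```python
def disable_intermediate_eval(config):
    """Return a copy of config whose eval interval exceeds the training length."""
    patched = dict(config)
    # One more than train_max_steps, so no training step is a multiple of it.
    patched["eval_interval_steps"] = patched["train_max_steps"] + 1
    return patched


def eval_steps(config):
    """List the steps at which a `step % eval_interval_steps == 0` check fires.

    Assumption: this mirrors how the trainer schedules evaluation.
    """
    interval = config["eval_interval_steps"]
    last_step = config["train_max_steps"]
    return [step for step in range(1, last_step + 1) if step % interval == 0]


# Hypothetical toy values, far smaller than a real training config.
config = {"train_max_steps": 20, "eval_interval_steps": 5}

print(eval_steps(config))                             # [5, 10, 15, 20]
print(eval_steps(disable_intermediate_eval(config)))  # []
```

In practice you would simply edit the two values in the YAML file by hand. The trade-off is that evaluation, and with it the intermediate results, is skipped for the whole run.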