InvalidArgumentError when generating a list of dicts in a TensorFlow 2.3.1 Dataset graph

Code

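The main function creates a TF Dataset from a generator that yields the entries of the np.ndarray described earlier, then runs a map function to download and parse the files. Here it is: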

    import numpy as np
    import tensorflow as tf


    def load_dataset(records: np.ndarray) -> tf.data.Dataset:
        """Create a TensorFlow Dataset (generator) from a list of gs:// data URLs.

        Args:
            records (np.ndarray): List of strings, which are
                gs://<foo>/foo<N>/*.jsonl.gz files

        Returns:
            tf.data.Dataset: MapDataset generator which can be used for
                training Keras models.
        """
        dataset = tf.data.Dataset.from_generator(
            lambda: _generator(records), (tf.string, tf.int8)
        )
        return dataset


    def _generator(records):
        # Each record is a (url, line index) pair.
        for r in records:
            yield r[0], r[1]

As you can see, the generator simply iterates over the np.ndarray to get the URL and the 'line index'.
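For reference, here is a minimal sketch of what such a generator-backed dataset yields. The record values are made up for illustration (the bucket name is hypothetical), and the sketch uses tf.int64 for the line index instead of the tf.int8 used in the question, since tf.int8 cannot hold values above 127.

```python
import tensorflow as tf

# Hypothetical (url, line index) records; the real ones come from an np.ndarray.
records = [
    ("gs://example-bucket/foo0/part-0.jsonl.gz", 3),
    ("gs://example-bucket/foo1/part-0.jsonl.gz", 7),
]


def _generator(records):
    # Yield one (url, line index) pair per record.
    for url, line_index in records:
        yield url, line_index


# output_types matches the TF 2.3-era API; newer releases also accept
# output_signature.
ds = tf.data.Dataset.from_generator(
    lambda: _generator(records),
    output_types=(tf.string, tf.int64),
)

# Each element is a tuple of two scalar tensors.
first_url, first_idx = next(iter(ds))
```

Both components arrive as tensors, so any plain-Python processing further down the pipeline has to go through something like tf.py_function.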

Then I have to load and preprocess the file from the URL to obtain a list of json -> Dict objects:

    import zlib
    from typing import Dict, Iterable, List

    import tensorflow as tf


    def _load_and_preprocess(filepath, selected_sample):
        """Read a file from GCS or a local path and process it into a tensor.

        Args:
            filepath (tensor): path string, pointer to GCS or local path
            selected_sample (tensor): indices of the lines to keep

        Returns:
            tensor: processed input
        """
        sample_raw_input = tf.io.read_file(filepath)
        uncompressed_inputs = tf.py_function(
            _get_uncompressed_inputs, [sample_raw_input], tf.string
        )
        sample = tf.py_function(
            _load_sampled_sample,
            [uncompressed_inputs, selected_sample],
            tf.float32,  # This `tf.float32` is definitely wrong
        )
        # This is not a tensor, but a List of Dictionaries which I will process later
        return sample


    def _get_uncompressed_inputs(record):
        # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(record.numpy(), 16 + zlib.MAX_WBITS)


    def _load_sampled_sample(inputs: Iterable, selected_sample: List[int]) -> List[Dict[str, str]]:
        if not tf.executing_eagerly():
            raise RuntimeError("TensorFlow must be executing eagerly.")
        inputs = inputs.numpy()
        selected_sample = selected_sample.numpy()
        sample = _load__sampled_sample_from_jsonl(inputs, selected_sample)
        return sample


    def _load__sampled_sample_from_jsonl(jsonl: bytes, selected_sample: List[int]) -> List[Dict[str, str]]:
        json_lines = _read_jsonl(jsonl).split("\n")
        sample = list()
        for n, sample_json in enumerate(json_lines):
            # _read_json is not shown in the question.
            sample_obj = _read_json(sample_json) if n in selected_sample else None
            if sample_obj:
                sample.append(sample_obj)
        return sample


    def _read_jsonl(jsonl: bytes) -> str:
        return jsonl.decode()
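The decompress-and-select step can be tried on its own, without TensorFlow. The following is only a sketch: it assumes the undefined `_read_json` helper is equivalent to `json.loads`, and the function name `select_jsonl_records` and the demo payload are hypothetical.

```python
import gzip
import json
import zlib
from typing import Any, Dict, Iterable, List


def select_jsonl_records(compressed: bytes, selected: Iterable[int]) -> List[Dict[str, Any]]:
    """Decompress gzip-compressed JSONL and keep only the selected line indices."""
    wanted = {int(i) for i in selected}
    # 16 + zlib.MAX_WBITS makes zlib accept the gzip container format.
    text = zlib.decompress(compressed, 16 + zlib.MAX_WBITS).decode("utf-8")
    return [
        json.loads(line)  # assumption: this is what _read_json does
        for n, line in enumerate(text.splitlines())
        if n in wanted and line.strip()
    ]


# Build a small gzip-compressed JSONL payload in memory for the demo.
demo = gzip.compress(b'{"id": 0}\n{"id": 1}\n{"id": 2}\n')
picked = select_jsonl_records(demo, [0, 2])
```

Here `picked` holds the parsed dicts for lines 0 and 2. The result is a plain Python list of dicts, which is exactly the kind of value that cannot be handed back to the TensorFlow graph as a tf.float32 tensor.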

Execution

Then I create the dataset with the code above and try to retrieve a single sample from it for testing:

    val_ds = load_dataset(validation_records)
    samples = tf.data.experimental.get_single_element(val_ds)  # This should be a list of Dicts

Which raises:

    InvalidArgumentError: ValueError: Attempt to convert a value ({...}) with an unsupported type (<class 'dict'>) to a Tensor.
    # ... are the dict values, which is really big so I've shortened it to `...`
    Traceback (most recent call last):

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 242, in __call__
        return func(device, token, args)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 140, in __call__
        outputs = [

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 141, in <listcomp>
        _maybe_copy_to_context_device(self._convert(x, dtype=dtype),

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 120, in _convert
        return ops.convert_to_tensor(value, dtype=dtype)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1499, in convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 338, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 263, in constant
        return _constant_impl(value, dtype, shape, name, verify_shape=False,

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
        return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
        t = convert_to_eager_tensor(value, ctx, dtype)

      File "/home/victor/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
        return ops.EagerTensor(value, ctx.device_name, dtype)

    ValueError: Attempt to convert a value ({...}) with an unsupported type (<class 'dict'>) to a Tensor.
    # ... are the dict values, which is really big so I've shortened it to `...`

         [[{{node EagerPyFunc_1}}]] [Op:DatasetToSingleElement]
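The list comprehension in the traceback suggests that tf.py_function treats a returned Python list as a sequence of separate outputs and tries to convert each element to the declared dtype, which cannot work for a dict. One possible workaround, which is not from the question and is only a sketch, is to keep each dict as a JSON string so the function returns a single 1-D tf.string tensor, and to decode it again outside the graph. The helper names `encode_records` and `decode_records` and the sample payload are hypothetical.

```python
import json

import tensorflow as tf


def encode_records(payload):
    """Runs eagerly inside tf.py_function: JSONL bytes -> 1-D tf.string tensor."""
    lines = payload.numpy().decode("utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    # Return ONE tensor rather than a Python list, so py_function sees a
    # single output instead of one output per list element.
    return tf.constant([json.dumps(r) for r in records], dtype=tf.string)


def decode_records(encoded):
    """Outside the graph: turn the string tensor back into a list of dicts."""
    return [json.loads(s) for s in encoded.numpy()]


# Already-decompressed JSONL payload, standing in for one downloaded file.
payload = b'{"a": 1}\n{"a": 2}\n'

ds = tf.data.Dataset.from_tensor_slices([payload])
ds = ds.map(lambda p: tf.py_function(encode_records, [p], tf.string))

samples = [decode_records(encoded) for encoded in ds]
```

With this layout the dataset only ever carries tensors, and the conversion back to dicts happens in ordinary Python after iteration.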

1 answer

User

Answer 1 · posted 2024-05-21 01:14:55

OK, I think I have fixed it by running the dataset.map function eagerly:

    dataset.map(lambda file, samples: tf.py_function(_load_and_preprocess, [file, samples], tf.variant))

This is described here: How can you map values in a tf.data.Dataset using a dictionary
