I have a JSON file containing training data with the following structure:
[
{
"audio_path": "common_voice_de_22136164.wav",
"label": "Diese pyromanen ... Vertrauen."
},
{
"audio_path": "common_voice_de_19872706.wav",
"label": "Die einzelnen Unterar...
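My goal is to feed this JSON data into a TensorFlow dataset object after converting the audio paths into waveforms. I am trying to recreate something similar to this tutorial on tensorflow.org: https://www.tensorflow.org/tutorials/audio/simple_audio
My attempt is to convert the JSON data into Python lists, feed them into a tf.data dataset, and apply a function with the .map() method that converts the audio files into waveforms.
Here is the code I use to load the JSON files (train, test) into Python lists: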
import pandas as pd

def load_json_into_lists(train_ds: str, test_ds: str, validation_size=0.09):
    # read json containing training data
    train_data = pd.read_json(train_ds, lines=False)
    # read json containing test data
    test_data = pd.read_json(test_ds, lines=False)
    # store json training data into a python list of [audio_path, label] rows
    train_data_list = train_data.values.tolist()
    # store json test data into a python list of [audio_path, label] rows
    test_data_list = test_data.values.tolist()
    # split train into train and validation
    new_train_data_list, validation_data_list = split_validation_from_train(
        train_data_list, validation_size=validation_size)
    print(f"Ex.:{new_train_data_list[0]}, Len.:{len(new_train_data_list)}, Type:{type(new_train_data_list)}")
    print(f"Ex.:{validation_data_list[0]}, Len.:{len(validation_data_list)}, Type:{type(validation_data_list)}")
    print(f"Ex.:{test_data_list[0]}, Len.:{len(test_data_list)}, Type:{type(test_data_list)}")
    return new_train_data_list, validation_data_list, test_data_list
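As a side note, the same records can also be read with the standard library alone and kept as two parallel lists, which makes it unnecessary to look a label up by its path later. This is only a minimal sketch: `load_paths_and_labels` is a hypothetical helper name, and it assumes every record has the `audio_path` and `label` keys shown in the sample above.

```python
import json


def load_paths_and_labels(json_path):
    """Read a JSON array of {"audio_path": ..., "label": ...} records.

    Returns two parallel lists so that paths[i] belongs to labels[i].
    Hypothetical helper, not part of the original code.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    paths = [record["audio_path"] for record in records]
    labels = [record["label"] for record in records]
    return paths, labels
```

Because the two lists stay aligned, they can be handed to a dataset as a pair instead of being searched for every element.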
Here is the code that splits the training list into training and validation lists:
def split_validation_from_train(train_data_list: list, validation_size: float):
    calculate_validation_size = round(len(train_data_list) * validation_size)
    print("Calculated Validation Dataset size: ", calculate_validation_size)
    # training part: everything except the last calculate_validation_size
    # elements (the first 178728 elements in my data)
    train_data_list_new = train_data_list[:(len(train_data_list) - calculate_validation_size)]
    # validation part: the remaining tail of the list (17676 elements in my data)
    validation_data_list = train_data_list[len(train_data_list_new):]
    print("Validation Dataset size: ", len(validation_data_list))
    print("New Train Dataset size: ", len(train_data_list_new))
    return train_data_list_new, validation_data_list
Then I have some waveform conversion functions, inspired by the TensorFlow tutorial mentioned above:
import tensorflow as tf

# Audio Processing
def decode_audio(audio_binary):
    # decode_wav returns (audio, sample_rate); audio has shape [samples, channels]
    audio_, _ = tf.audio.decode_wav(audio_binary)
    # drop the channel axis (assumes mono audio)
    return tf.squeeze(audio_, axis=-1)

#@tf.function
def get_label(file_path):
    # get the loaded lists
    train_list, _, _ = load_json_into_lists(TRAIN_DS_PATH, TEST_DS_PATH)
    for sublist in train_list:
        if sublist[0] == str(file_path):
            return sublist[1]

#@tf.function
def get_waveform(file_path):
    # get the loaded lists
    train_list, _, _ = load_json_into_lists(TRAIN_DS_PATH, TEST_DS_PATH)
    for sublist in train_list:
        if sublist[0] == str(file_path):
            file_to_read = str("/de/cv_valid_data/" + sublist[0])
            audio_binary = tf.io.read_file(file_to_read)
            waveform = decode_audio(audio_binary)
            return waveform

def get_waveform_and_label(file_path):
    # get label
    label_ = get_label(file_path)
    # get waveform
    waveform_ = get_waveform(file_path)
    return waveform_, label_
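A note on `str(file_path)` in the functions above: when a Python function is passed to `Dataset.map`, TensorFlow traces it with a symbolic tensor as its argument, so `str()` returns a description of that tensor and not the file name. The small probe below is only an illustrative sketch (`probe` and `seen_during_trace` are hypothetical names) that records what `str()` produces during tracing.

```python
import tensorflow as tf

seen_during_trace = []


def probe(file_path):
    # Dataset.map traces this function, so file_path is a symbolic tf.Tensor
    # here; str() describes the tensor and does not contain the file name.
    seen_during_trace.append(str(file_path))
    return file_path


paths = ["common_voice_de_22136164.wav", "common_voice_de_19872706.wav"]
ds = tf.data.Dataset.from_tensor_slices(paths).map(probe)

# Show what str() returned and whether any of it equals a real path.
print(seen_during_trace)
print([s in paths for s in seen_during_trace])
```

If none of the recorded strings equals a path, a comparison such as `sublist[0] == str(file_path)` is never true inside `map`, and `get_label` and `get_waveform` fall off the end of their loops and return `None`.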
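Finally, there is the code that applies .map() with the get_waveform_and_label function to obtain the waveform dataset.
This raises an error (I really don't know what is causing it):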
Traceback (most recent call last):
File "training_test_3.py", line 123, in <module>
tf_waveform_ds = convert_lists_into_tf_ds()
File "/Users/pietmuller/miniforge3/envs/tensorM1_new_3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 620, in wrapper
return func(*args, **kwargs)
File "training_test_3.py", line 111, in convert_lists_into_tf_ds
tf_waveform_ds_ = tf_train_ds.map(get_waveform_and_label)
File "/Users/pietmuller/miniforge3/envs/tensorM1_new_3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1805, in map
return MapDataset(self, map_func, preserve_cardinality=True)
File "/Users/pietmuller/miniforge3/envs/tensorM1_new_3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4208, in __init__
variant_tensor = gen_dataset_ops.map_dataset(
File "/Users/pietmuller/miniforge3/envs/tensorM1_new_3/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 3028, in map_dataset
_ops.raise_from_not_ok_status(e, name)
File "/Users/pietmuller/miniforge3/envs/tensorM1_new_3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Length for attr 'output_shapes' of 0 must be at least minimum 1
; NodeDef: {{node MapDataset}}; Op<name=MapDataset; signature=input_dataset:variant, other_arguments: -> handle:variant; attr=f:func; attr=Targuments:list(type),min=0; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1; attr=use_inter_op_parallelism:bool,default=true; attr=preserve_cardinality:bool,default=false> [Op:MapDataset]
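One plausible reading of this error, assuming the mapped function ended up returning `None` for both values (which happens when no row matches in `get_label` and `get_waveform`), is that `MapDataset` was left with zero output components. The following is a minimal sketch of a pipeline that avoids the lookup by passing each path together with its label. `make_waveform_ds` and `audio_dir` are hypothetical names, the records are assumed to be `[audio_path, label]` rows like those returned by `load_json_into_lists`, and the files are assumed to be 16-bit PCM mono WAV.

```python
import os

import tensorflow as tf


def decode_audio(audio_binary):
    # decode_wav returns (audio, sample_rate); audio is [samples, channels].
    audio, _ = tf.audio.decode_wav(audio_binary, desired_channels=1)
    # Drop the single channel axis so the waveform has shape [samples].
    return tf.squeeze(audio, axis=-1)


def make_waveform_ds(records, audio_dir):
    """Build a dataset of (waveform, label) pairs.

    records: list of [audio_path, label] rows (assumed layout).
    audio_dir: directory containing the WAV files (hypothetical parameter).
    """
    # Resolve full paths in plain Python, before any tensors are involved.
    full_paths = [os.path.join(audio_dir, row[0]) for row in records]
    labels = [row[1] for row in records]
    ds = tf.data.Dataset.from_tensor_slices((full_paths, labels))

    def load(path, label):
        # Only TensorFlow ops here, so this works on symbolic tensors.
        audio_binary = tf.io.read_file(path)
        return decode_audio(audio_binary), label

    return ds.map(load)
```

It could be used as `make_waveform_ds(new_train_data_list, "/de/cv_valid_data/")`. The design choice is that the label travels with the path as part of each dataset element, so nothing inside `map` needs to convert a tensor to a Python string or re-read the JSON file.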
Thanks for your answers.