Loss returns to its starting value after the dataset is reinitialized

from with_hyperparams import stft
from model import lstm_network
import tensorflow as tf

def read_wavfile():
    for file in itertools.chain(DATA_PATH.glob("**/*.ogg"),
                                DATA_PATH.glob("**/*.wav")):
        waveform, samplerate = librosa.load(file, sr=hparams.sample_rate)
        if len(waveform.shape) > 1:
            waveform = waveform[:, 1]
        yield waveform

audio_dataset = Dataset.from_generator(
    read_wavfile,
    tf.float32,
    tf.TensorShape([None]))
dataset = audio_dataset.padded_batch(5, padded_shapes=[None])

iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)
dataset_init_op = iterator.make_initializer(dataset)

signals = iterator.get_next()
magnitude_spectrograms = tf.abs(stft(signals))
output, loss = lstm_network(magnitude_spectrograms)

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(20):
        print(i)
        sess.run(dataset_init_op)
        while True:
            try:
                l, _ = sess.run((loss, train_op))
                print(l)
            except tf.errors.OutOfRangeError:
                break

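2 answers

User

Answer 1 · edited 2024-05-15 04:08:24

This looks like a problem with the architecture. First, you are generating the data on the fly. That is a common technique, but it is not always the most sensible choice, because: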

One of the downsides of Dataset.from_generator() is shuffling the resulting dataset with a shuffle buffer of size n requires n examples to be loaded. This will either create periodic pauses in your pipeline (large n) or result in potentially poor shuffling (small n).
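
To make the trade-off in the quote concrete, here is a minimal pure-Python sketch of how a fixed-size shuffle buffer behaves. It is an illustration only, not TensorFlow's implementation; `buffered_shuffle` and its parameters are hypothetical names chosen for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order a fixed-size shuffle buffer could produce.

    Illustrative model only: fill the buffer, then for every new item
    emit a random buffered element and put the new item in its slot.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling: nothing is emitted yet
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random element from the buffer
        buffer[idx] = item       # replace it with the incoming item
    rng.shuffle(buffer)          # input exhausted: drain what is left
    yield from buffer

if __name__ == "__main__":
    for size in (1, 5, 50):
        order = list(buffered_shuffle(range(50), size))
        print(size, order[:10])
```

With a buffer of size 1 the order is unchanged, and with a small buffer each element can only move a short distance from its original position. A buffer as large as the dataset gives a full shuffle, but nothing is emitted until every example has been loaded.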

It is better to convert the data to numpy arrays and store those arrays on disk as TFRecords to use as the dataset, like this:

def array_to_tfrecords(X, y, output_file):
  feature = {
    'X': tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten())),
    'y': tf.train.Feature(float_list=tf.train.FloatList(value=y.flatten()))
  }
  example = tf.train.Example(features=tf.train.Features(feature=feature))
  serialized = example.SerializeToString()

  writer = tf.python_io.TFRecordWriter(output_file)
  writer.write(serialized)
  writer.close()

This takes the Dataset.from_generator component out of the equation. The data can then be read back with:

[code block missing from the source]

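This ensures the data is thoroughly shuffled and should give better results.

I also believe you could benefit from some data preprocessing. First, try converting every file in the dataset to a standardized WAVE format, and then save the data to TFRecord. At the moment you load the files and standardize the sample rate with librosa, but that does not standardize the channels. Instead, try a function like this: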

from pydub import AudioSegment
def convert(path):

    #open file (supports all ffmpeg supported filetypes) 
    audio = AudioSegment.from_file(path, path.split('.')[-1].lower())

    #set to mono
    audio = audio.set_channels(1)

    #set to 44.1 KHz
    audio = audio.set_frame_rate(44100)

    #save as wav
    audio.export(path, format="wav")
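
If the samples are already in memory as an array, the same channel standardization can be sketched in NumPy. This is an assumption-laden illustration: `to_mono` is a hypothetical helper, it assumes a (samples, channels) layout as returned by scipy.io.wavfile.read (librosa.load with mono=False uses (channels, samples) instead), and averaging the channels is just one common way to downmix, not necessarily what pydub produces.

```python
import numpy as np

def to_mono(samples):
    """Downmix a (samples, channels) array to mono by averaging channels.

    Hypothetical helper for illustration; 1-D input is returned as-is.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 1:
        return samples           # already mono
    return samples.mean(axis=1)  # average across the channel axis

if __name__ == "__main__":
    stereo = np.array([[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]])
    print(to_mono(stereo))
```

Averaging keeps information from every channel, whereas indexing a single channel (as the question's generator does) discards the others.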

Finally, you may find that reading the sound files as floats does not serve you well. Consider trying something like this:

import numpy as np
import scipy.io.wavfile as wave
import python_speech_features as psf
def getSpectrogram(path, winlen=0.025, winstep=0.01, NFFT=512):

    #open wav file
    (rate,sig) = wave.read(path)

    #get frames
    winfunc=lambda x:np.ones((x,))
    frames = psf.sigproc.framesig(sig, winlen*rate, winstep*rate, winfunc)

    #Magnitude Spectrogram
    magspec = np.rot90(psf.sigproc.magspec(frames, NFFT))

    #noise reduction (mean substract)
    magspec -= magspec.mean(axis=0)

    #normalize values between 0 and 1
    magspec -= magspec.min(axis=0)
    magspec /= magspec.max(axis=0)

    #show spec dimensions
    print(magspec.shape)

    return magspec

Then apply the functions like this:

#convert file if you need to
convert(filepath)

#get spectrogram
spec = getSpectrogram(filepath)

This turns the data in the WAVE file into an image, which you can then handle like any image classification problem.
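
One detail worth guarding against: the last step of getSpectrogram divides each column by its maximum. If a column is constant (digital silence, for example), that maximum is 0 after the minimum is subtracted, and the division produces NaN. Below is a minimal sketch of a guarded min-max step; `normalize_columns` and `eps` are hypothetical names, and it covers only the 0-to-1 scaling, not the rest of the function.

```python
import numpy as np

def normalize_columns(magspec, eps=1e-8):
    """Scale each column to [0, 1], leaving constant columns at 0.

    Hypothetical helper: eps keeps the divisor away from zero.
    """
    magspec = np.asarray(magspec, dtype=np.float64)
    shifted = magspec - magspec.min(axis=0)  # column minimum becomes 0
    peak = shifted.max(axis=0)               # 0 for a constant column
    return shifted / np.maximum(peak, eps)   # guarded division

if __name__ == "__main__":
    spec = np.array([[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]])
    print(normalize_columns(spec))
```

The result is a new array, so the input spectrogram is left unmodified.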

User

Answer 2 · edited 2024-05-15 04:08:24

Please try this:

Add dataset.shuffle(buffer_size=1000) to the input pipeline.
Isolate the call to loss so that it is computed after each training epoch.

As shown below:

Update to the input pipeline

dataset = audio_dataset.padded_batch(5, padded_shapes=[None])
dataset = dataset.shuffle(buffer_size=1000)
iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)
dataset_init_op = iterator.make_initializer(dataset)
signals = iterator.get_next()

Update to the session

[code block missing from the source]
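
The idea of computing the loss separately after each epoch can be sketched without TensorFlow. The toy example below is not the answerer's session code: it fits a single weight with plain SGD, and all names (`train`, `mean_squared_error`, `history`) are hypothetical. It shows weights initialized once, data reshuffled every epoch, and the loss measured over the whole dataset only after the epoch's updates are finished.

```python
import random

def mean_squared_error(w, data):
    """Loss over the full dataset, computed without updating w."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, epochs=10, lr=0.01, seed=0):
    rng = random.Random(seed)
    w = 0.0                    # initialized once, outside the epoch loop
    history = []
    for _ in range(epochs):
        order = data[:]        # copy so the caller's list is untouched
        rng.shuffle(order)     # new example order every epoch
        for x, y in order:
            grad = 2 * (w * x - y) * x
            w -= lr * grad     # training step only, no loss reporting here
        history.append(mean_squared_error(w, data))  # per-epoch loss
    return w, history

if __name__ == "__main__":
    samples = [(float(x), 2.0 * x) for x in range(1, 6)]
    weight, losses = train(samples)
    print(weight, losses)
```

A loss printed for every batch mixes the model's progress with how hard that particular batch is. Without shuffling, each epoch presumably starts with the same batch, so the first value printed after every reinitialization is not a good indicator of whether training has progressed.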

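If I had access to some sample data, I could probably help you more precisely. For now I am working blind, so either way, please let me know whether this works.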
