Audio resampling layer for TensorFlow
I need to resample audio signals inside a custom model. The resampling is not a pre- or post-processing step that can live outside the model; it is part of the model's internal design, so the layer also needs a gradient (backward) operation. For the resampling itself, I plan to use TensorFlow I/O.
The operation works perfectly and is easy to use as a pre/post-processing unit; however, implementing it as a custom layer embedded within the model is challenging, as I don't know how to implement the backward pass.
- How should the backward pass be implemented for such a 1D signal resampling layer?
- Is there any other open-source 1D signal resampling layer that can be used?
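One way to reason about the backward pass: FFT-based resampling is a linear map y = R x, so for an upstream gradient g the gradient with respect to the input is R^T g. The NumPy sketch below only illustrates that idea and is not the tfio implementation; `fft_resample`, `resampling_matrix`, and `resample_backward` are hypothetical names, the signal is assumed to be 1D and real, and the Nyquist bin is handled naively.

```python
import numpy as np


def fft_resample(x, num_out):
    """Resample a 1D real signal to num_out samples by cropping or
    zero-padding its spectrum. The operation is linear in x."""
    num_in = x.shape[-1]
    spectrum = np.fft.rfft(x)
    bins_out = num_out // 2 + 1
    resized = np.zeros(bins_out, dtype=complex)
    keep = min(spectrum.shape[-1], bins_out)
    resized[:keep] = spectrum[:keep]
    # Rescale so that amplitudes are preserved after the length change.
    return np.fft.irfft(resized, n=num_out) * (num_out / num_in)


def resampling_matrix(num_in, num_out):
    """Build R explicitly: column j is the response to a unit impulse at j."""
    return np.stack([fft_resample(e, num_out) for e in np.eye(num_in)], axis=1)


def resample_backward(grad_out, num_in):
    """Backward pass of the linear map: dL/dx = R^T (dL/dy)."""
    R = resampling_matrix(num_in, grad_out.shape[-1])
    return R.T @ grad_out
```

Building R explicitly is only practical for short signals; it is used here to make the adjoint visible, not as an efficient implementation.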
P.S. I tried conventional upsampling/pooling-style layers, but they are not accurate enough compared with tfio, which implements other resampling methods such as FFT-based ones.
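For the FFT-based method mentioned here, a possible route is to write the resampling with TensorFlow's own differentiable ops, so that automatic differentiation supplies the backward pass. The sketch below is an assumption-laden illustration, not tfio's implementation: `FFTResample` is a hypothetical layer name, inputs are assumed to be float32 with a static shape of (batch, num_in), and the Nyquist bin is handled naively.

```python
import numpy as np
import tensorflow as tf


class FFTResample(tf.keras.layers.Layer):
    """Hypothetical layer: resamples the last axis to num_out samples via the FFT."""

    def __init__(self, num_out, **kwargs):
        super().__init__(**kwargs)
        self.num_out = num_out

    def call(self, inputs):
        # Assumes inputs of shape (batch, num_in) with num_in known statically.
        num_in = inputs.shape[-1]
        bins_in = num_in // 2 + 1
        bins_out = self.num_out // 2 + 1
        spectrum = tf.signal.rfft(inputs)
        if bins_out <= bins_in:
            # Downsampling: drop the high-frequency bins.
            spectrum = spectrum[..., :bins_out]
        else:
            # Upsampling: zero-pad the high-frequency bins.
            spectrum = tf.pad(spectrum, [[0, 0], [0, bins_out - bins_in]])
        resampled = tf.signal.irfft(spectrum, fft_length=[self.num_out])
        # Rescale so that amplitudes are preserved after the length change.
        return resampled * (self.num_out / num_in)


# Usage: gradients flow through the layer without a hand-written backward pass.
x = tf.constant(np.sin(2 * np.pi * 3 * np.arange(16) / 16)[None, :], dtype=tf.float32)
layer = FFTResample(num_out=32)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = layer(x)
    loss = tf.reduce_sum(tf.square(y))
grad = tape.gradient(loss, x)
```

Because `tf.signal.rfft`, `tf.pad`, and `tf.signal.irfft` all have registered gradients, no custom gradient is needed in this sketch.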
For more context, please have a look at: another question
Solution 1:[1]
You should state the objective of the resampling, because it can be done in many ways, including summarizing the signal as sine values, which can then be represented at a smaller size.
By changing the sampling rate you can save data space. The sample below plots 0.05 * tf.math.sin(audio[:5 * 22050]).numpy(), then builds two 2750-sample sections, sec_1 (the sine values multiplied by zeros) and sec_2 (the sine values multiplied by ones), and plots their concatenation.
[ Sample ]:
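import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Read and decode the WAV file; audio has shape (samples, channels).
contents = tf.io.read_file("F:\\temp\\Python\\Speech\\temple_of_love-sisters_of_mercy.wav")
audio, sample_rate = tf.audio.decode_wav(
    contents, desired_channels=-1, desired_samples=-1, name=None
)
print(audio)
print(sample_rate)

# Plot the first 5 seconds (the file is sampled at 22050 Hz).
plt.plot(audio[:5 * 22050])
plt.show()
plt.close()

# Plot the scaled sine of the same 5 seconds.
plt.plot(0.05 * tf.math.sin(audio[:5 * 22050]).numpy())
plt.show()
plt.close()

# Zero out the first 2750 samples and keep the next 2750.
# The masks have shape (2750, 1) so they broadcast per channel
# instead of expanding to a (2750, 2750) array.
sec_1 = np.zeros((2750, 1)) * tf.math.sin(audio[0:2750]).numpy()
sec_2 = np.ones((2750, 1)) * tf.math.sin(audio[2750:5500]).numpy()
plt.plot(0.05 * tf.concat([sec_1, sec_2], 0).numpy())
plt.show()
plt.close()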
[ Output ]:
array([[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]], dtype=float32)>, sample_rate=<tf.Tensor: shape=(), dtype=int32, numpy=22050>)
tf.Tensor(22050, shape=(), dtype=int32)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Martijn Pieters |