Enhancing Neural Vocoders with Fourier Transform: A Frequency-Domain Approach to Improved Speech Synthesis

Authors

  • Boxun An

DOI:

https://doi.org/10.61173/h43m4449

Keywords:

Neural vocoders, Fourier Transform, Speech synthesis, Noise reduction

Abstract

This paper introduces a frequency-domain approach to enhancing neural vocoders, addressing their limited ability to capture the high-frequency detail needed for natural, clear speech synthesis. Integrating Short-Time Fourier Transform (STFT) preprocessing provides two key benefits. First, it gives the vocoder a richer, frequency-detailed input, enabling it to capture finer spectral elements and improve synthesis quality. Second, it facilitates targeted noise reduction, refining output clarity. In addition, frequency-domain enhancements during generation allow selective amplification of key frequencies (e.g., 3–6 kHz). Together, these techniques improve the clarity and naturalness of synthesized speech and reduce the artifacts that commonly affect high-frequency content. By combining traditional signal processing with deep learning, the framework enhances the vocoder's ability to reproduce challenging speech spectra accurately and offers a balanced approach that can generalize across different acoustic environments. The resulting gains in naturalness and intelligibility have broader applications in audio processing.
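
The frequency-domain steps named in the abstract (STFT analysis, targeted noise reduction, and selective amplification of the 3–6 kHz band) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The function names, the FFT size and hop length, the 3 dB gain, and the use of plain spectral subtraction against a noise-only clip are all assumptions chosen for illustration, and the neural vocoder itself is not modelled.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]           # periodic Hann window
    x = np.pad(x, (n_fft, n_fft))                 # zero-pad so edges are fully covered
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128, length=None):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out = (out / np.maximum(norm, 1e-8))[n_fft:]  # normalise, drop leading padding
    return out if length is None else out[:length]

def enhance(x, sr, noise_clip=None, band=(3000.0, 6000.0), gain_db=3.0,
            over_subtract=1.5, n_fft=512, hop=128):
    """Hypothetical enhancement stage: optional denoising, then a band boost."""
    spec = stft(x, n_fft, hop)
    mag, phase = np.abs(spec), np.angle(spec)
    if noise_clip is not None:
        # Assumed technique: spectral subtraction using a per-bin noise
        # profile (median magnitude over the frames of a noise-only clip).
        profile = np.median(np.abs(stft(noise_clip, n_fft, hop)), axis=0)
        mag = np.maximum(mag - over_subtract * profile, 0.0)
    # Selective amplification of the chosen band (3-6 kHz by default).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    mag[:, in_band] *= 10.0 ** (gain_db / 20.0)
    return istft(mag * np.exp(1j * phase), n_fft, hop, length=len(x))
```

As a usage example, `enhance(waveform, 16000, noise_clip=background)` would return a signal of the same length as `waveform`, where both arrays are assumed to be mono float recordings at a 16 kHz sampling rate. In a real system the gain and the subtraction factor would need tuning, since aggressive spectral subtraction can itself introduce audible artifacts.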

Published

2024-12-31

Section

Articles