CAI Liang, XIA Xiuyu, LU Xiong, et al. Research on Speech Enhancement based on Pitch Tracking[J]. Journal of Chengdu University of Information Technology, 2019, (01): 1-6. [doi:10.16836/j.cnki.jcuit.2019.01.001]
- Title:
- Research on Speech Enhancement based on Pitch Tracking
- Article ID:
- 2096-1618(2019)01-0001-06
- CLC number:
- TN912.35
- Document code:
- A
- Abstract:
- In mobile communication, speech recognition, voice interaction and similar applications, the captured speech is often mixed with noise that itself has a harmonic structure, which makes speech enhancement highly valuable in practice. Most speech energy is concentrated in voiced segments, and voiced speech has a harmonic structure. Exploiting the fact that real sound mixtures are approximately sparse in the time-frequency domain, this paper proposes a speech enhancement algorithm based on pitch tracking. It uses pitch features to restore the harmonic structure of the speech as fully as possible while suppressing the energy of the noise, thereby raising the speech signal-to-noise ratio (SNR). First, the mixed sound stream is segmented and the voiced segments are extracted. Next, multi-pitch extraction is performed on each voiced segment, and Viterbi decoding is used to find the dominant pitch. A BP (back-propagation) neural network then decides whether the dominant pitch belongs to the human voice. Finally, a comb filter is used either to reconstruct the speech in the voiced segment or to suppress the interfering sound. Simulation experiments show that the algorithm can extract speech from audio mixed with music and background noise, with an average speech SNR gain of 8 dB.
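The abstract does not say how voiced segments are extracted. A common heuristic, assumed here and not taken from the paper, marks a frame as voiced when its short-time energy is high and its zero-crossing rate is low: silence is weak, and unvoiced or noise-like sounds cross zero far more often than periodic voiced speech. In the sketch below, the function names, frame size, hop and thresholds are all hypothetical. It returns one boolean per frame.

```python
import math
from typing import List, Sequence

# Hypothetical voiced-frame detector (energy + zero-crossing-rate heuristic).
# Frame sizes and thresholds are illustrative assumptions, not the paper's settings.


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [list(x[i:i + frame_len]) for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def voiced_mask(x: Sequence[float], frame_len: int = 400, hop: int = 160,
                energy_ratio: float = 0.1, zcr_max: float = 0.25) -> List[bool]:
    """Mark a frame voiced if its energy exceeds energy_ratio * (max frame energy)
    and its zero-crossing rate is below zcr_max."""
    frames = frame_signal(x, frame_len, hop)
    energies = [short_time_energy(f) for f in frames]
    threshold = energy_ratio * max(energies) if energies else 0.0
    return [e > threshold and zero_crossing_rate(f) < zcr_max
            for f, e in zip(frames, energies)]


if __name__ == "__main__":
    fs = 16000                                   # 400/160 samples = 25 ms / 10 ms frames
    silence = [0.0] * 1600
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]  # periodic, low ZCR
    hiss = [0.5 if n % 2 == 0 else -0.5 for n in range(1600)]           # crosses zero every sample
    print(voiced_mask(silence + tone + hiss))
```

Frames lying entirely inside the tone are flagged as voiced. Frames of silence fail the energy test, and frames of the alternating "hiss" fail the zero-crossing test.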
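For the dominant-pitch step, the abstract states only that Viterbi decoding selects the dominant pitch from the multi-pitch candidates of each frame. One plausible formulation, assumed here rather than taken from the paper, scores each candidate by its log salience and penalizes pitch jumps between consecutive frames in proportion to their size in octaves. With this scoring, the decoder prefers a smooth track over a frame-by-frame greedy choice. The candidate format `(f0_hz, salience)` and the `jump_penalty` weight are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A pitch candidate: (f0 in Hz, salience in (0, 1]). Hypothetical format.
Candidate = Tuple[float, float]


def viterbi_dominant_pitch(frames: Sequence[Sequence[Candidate]],
                           jump_penalty: float = 5.0) -> List[float]:
    """Pick one candidate per frame, maximizing
    sum(log salience) - jump_penalty * sum(|log2(f_t / f_{t-1})|).
    Every frame must contain at least one candidate."""
    if not frames:
        return []
    score = [math.log(sal) for _, sal in frames[0]]   # best path score ending at each candidate
    backptrs: List[List[int]] = []                     # backptrs[t-1][j]: best predecessor of frames[t][j]
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_score, ptr = [], []
        for f0, sal in cur:
            # Score of arriving from each previous candidate, minus an octave-jump penalty.
            arrive = [score[i] - jump_penalty * abs(math.log2(f0 / prev[i][0]))
                      for i in range(len(prev))]
            best = max(range(len(prev)), key=arrive.__getitem__)
            new_score.append(arrive[best] + math.log(sal))
            ptr.append(best)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best-scoring candidate in the last frame.
    j = max(range(len(score)), key=score.__getitem__)
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [frames[t][j][0] for t, j in enumerate(path)]


if __name__ == "__main__":
    frames = [
        [(200.0, 0.9), (400.0, 0.5)],
        [(202.0, 0.6), (404.0, 0.7)],   # octave-up candidate is slightly more salient here
        [(204.0, 0.9), (408.0, 0.4)],
    ]
    print(viterbi_dominant_pitch(frames))                    # [200.0, 202.0, 204.0]
    print(viterbi_dominant_pitch(frames, jump_penalty=0.0))  # [200.0, 404.0, 204.0] (greedy)
```

With the penalty switched off, the decoder reduces to a per-frame maximum and jumps an octave in the middle frame. With the penalty on, it keeps the track near 200 Hz.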
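The final step applies a comb filter tuned to the dominant pitch. The abstract does not give the filter design, so the sketch below assumes the simplest two-tap FIR comb with a pitch period of T = round(fs / f0) samples. Adding the delayed copy, y[n] = (x[n] + x[n-T]) / 2, gives the magnitude response |cos(ωT/2)|. This has unity gain at multiples of fs/T (the harmonics of f0) and nulls halfway between them, so it keeps voiced speech. Subtracting the delayed copy, y[n] = (x[n] - x[n-T]) / 2, swaps the peaks and nulls and removes a harmonic interferer whose pitch is f0. The function name and the `suppress` flag are hypothetical.

```python
import math
from typing import List, Sequence


def comb_filter(x: Sequence[float], f0: float, fs: float, suppress: bool = False) -> List[float]:
    """Two-tap FIR comb filter tuned to pitch f0 (Hz) at sample rate fs (Hz).

    suppress=False: y[n] = (x[n] + x[n-T]) / 2, unity gain at the harmonics k*fs/T.
    suppress=True:  y[n] = (x[n] - x[n-T]) / 2, zero gain at those harmonics.
    Samples before the start of x are treated as zero.
    """
    T = round(fs / f0)  # pitch period in samples
    sign = -1.0 if suppress else 1.0
    return [0.5 * (x[n] + sign * (x[n - T] if n >= T else 0.0)) for n in range(len(x))]


if __name__ == "__main__":
    fs, f0 = 8000.0, 200.0  # T = 40 samples
    n_samples = 800
    # Toy "voiced speech": the first three harmonics of 200 Hz.
    speech = [sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in (1, 2, 3))
              for n in range(n_samples)]
    # Toy interferer at 300 Hz, exactly halfway between the 1st and 2nd harmonics.
    interferer = [0.8 * math.sin(2 * math.pi * 300.0 * n / fs) for n in range(n_samples)]
    mix = [s + i for s, i in zip(speech, interferer)]
    kept = comb_filter(mix, f0, fs)                    # for n >= T this equals speech
    removed = comb_filter(mix, f0, fs, suppress=True)  # for n >= T this equals interferer
    err = max(abs(a - b) for a, b in zip(kept[40:], speech[40:]))
    print(f"max reconstruction error after the first period: {err:.2e}")
```

The toy case is idealized because the interferer sits exactly on a null. Real interference does not, and a two-tap comb has broad teeth, so practical designs typically use more taps for sharper selectivity.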
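The reported result is an average speech SNR gain of 8 dB, but the abstract does not define the measure. A common definition, assumed here, is SNR = 10·log10(Σ s[n]² / Σ (ŝ[n] - s[n])²), where s is the clean speech and ŝ is the signal being evaluated. The gain is then the SNR of the enhanced output minus the SNR of the unprocessed mixture. This requires the clean reference, which is available when mixtures are built for simulation experiments. The function names are hypothetical.

```python
import math
from typing import Sequence


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """SNR of `estimate` w.r.t. `clean`: 10*log10(signal energy / residual energy).
    Raises ZeroDivisionError if the residual is exactly zero."""
    signal_energy = sum(c * c for c in clean)
    residual_energy = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / residual_energy)


def snr_gain_db(clean: Sequence[float], noisy: Sequence[float],
                enhanced: Sequence[float]) -> float:
    """SNR improvement (dB) of the enhanced output over the unprocessed mixture."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


if __name__ == "__main__":
    clean = [1.0, -1.0] * 50             # signal energy 100
    noisy = [c + 0.5 for c in clean]     # residual energy 25 -> SNR = 10*log10(4), about 6.02 dB
    enhanced = [c + 0.1 for c in clean]  # residual energy 1  -> SNR = 10*log10(100) = 20 dB
    print(f"SNR gain: {snr_gain_db(clean, noisy, enhanced):.2f} dB")
```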
Memo
Received: 2018-05-08