WANG Wenhua,XIA Xiuyu.Research and Application of Robust Characteristics of Auditory Models[J].Journal of Chengdu University of Information Technology,2024,39(03):275-282.[doi:10.16836/j.cnki.jcuit.2024.03.003]
- Title:
- Research and Application of Robust Characteristics of Auditory Models
- Article No.:
- 2096-1618(2024)03-0275-08
- CLC Classification No.:
- TP391.4
- Document Code:
- A
- Abstract:
- The human auditory system has a remarkably fine and intricate structure, allowing speech to be understood accurately even in noisy environments. Using a detailed cochlear model as the front end enables better speech processing. In this paper, the cascade of asymmetric resonators with fast-acting compression (CARFAC) is used as a model of the human auditory periphery and combined with the stabilized auditory image (SAI) to obtain an accurate pre-cortical auditory model. On the basis of this auditory model, an accurate pitch contour is extracted, the pitch information is used for auditory scene analysis, and robust speech features are synthesized and fed into a neural network for supervised training to achieve speech enhancement. Experimental results show that under noisy conditions the features extracted by the auditory model score well on all speech evaluation metrics, characterize the speech signal better, and exhibit a degree of robustness.
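The front end described in the abstract is a cascaded filterbank (CARFAC). As a loose illustration of the cascade idea only, below is a hypothetical pure-Python sketch, not Lyon's CARFAC: each stage here is a plain two-pole resonator (no asymmetric resonators and no fast-acting compression), stages are chained from high to low pole frequency as along the basilar membrane from base to apex, and each channel is tapped after its stage. The function name and all parameter values (`num_channels`, `q`, the gain normalization) are illustrative assumptions, not taken from the paper.

```python
import math

def resonator_cascade(signal, num_channels=8, fs=16000.0,
                      f_lo=100.0, f_hi=4000.0, q=4.0):
    """Toy cascade filterbank: channel k is the output of a chain of
    two-pole resonators whose pole frequencies fall log-spaced from
    f_hi (cochlear base) to f_lo (apex). Illustrative only, not CARFAC."""
    freqs = [f_hi * (f_lo / f_hi) ** (k / (num_channels - 1))
             for k in range(num_channels)]
    x = list(signal)
    channels = []
    for fc in freqs:
        theta = 2.0 * math.pi * fc / fs      # pole angle for this stage
        r = math.exp(-theta / (2.0 * q))     # pole radius sets the bandwidth
        a1 = 2.0 * r * math.cos(theta)
        a2 = -r * r
        g = 1.0 - a1 - a2                    # normalize to unity gain at DC
        y1 = y2 = 0.0
        y = []
        for s in x:
            yn = g * s + a1 * y1 + a2 * y2   # two-pole recursion
            y.append(yn)
            y1, y2 = yn, y1
        channels.append(y)                   # tap this channel's output
        x = y                                # feed the next (lower-fc) stage
    return freqs, channels
```

Driving such a cascade with a pure tone excites most strongly the channel whose pole frequency is closest to the tone's frequency, which is the kind of place coding that later pitch-contour extraction can build on.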
References:
[1] Das N,Chakraborty S,Chaki J,et al.Fundamentals,present and future perspectives of speech enhancement[J].International Journal of Speech Technology,2021,24:883-901.
[2] Computational Auditory Scene Analysis:Proceedings of the IJCAI-95 Workshop[M].CRC Press,2021.
[3] Wang D L,Chen J.Supervised speech separation based on deep learning:An overview[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,26(10):1702-1726.
[4] Sun L H,Wang C,Liang W Q,et al.Single-channel speech separation method based on deep learning feature fusion and joint constraints[J].Journal of Electronics & Information Technology,2022,44(9):1-11.
[5] Chen J,Wang D L.Long short-term memory for speaker generalization in supervised speech separation[J].The Journal of the Acoustical Society of America,2017,141(6):4705-4714.
[6] Xu Y,Afshar S,Singh R K,et al.A binaural sound localization system using deep convolutional neural networks[C].2019 IEEE International Symposium on Circuits and Systems(ISCAS).IEEE,2019:1-5.
[7] Lyon R F,Ponte J,Chechik G.Sparse coding of auditory features for machine hearing in interference[C].2011 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).IEEE,2011:5876-5879.
[8] Lyon R F.Human and machine hearing:extracting meaning from sound[M].Cambridge University Press,2017.
[9] Virtanen T,Plumbley M D,Ellis D.Computational analysis of sound scenes and events[M].Cham:Springer International Publishing,2018.
[10] Saremi A,Beutelmann R,Dietz M,et al.A comparative study of seven human cochlear filter models[J].The Journal of the Acoustical Society of America,2016,140(3):1618-1634.
[11] Islam M A,Xu Y,Monk T,et al.Noise-robust text-dependent speaker identification using cochlear models[J].The Journal of the Acoustical Society of America,2022,151(1):500-516.
[12] Peng Z,Dang J,Unoki M,et al.Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech[J].Neural Networks,2021,140:261-273.
[13] Yu Y,Si X,Hu C,et al.A review of recurrent neural networks:LSTM cells and network architectures[J].Neural Computation,2019,31(7):1235-1270.
[14] Kolbæk M,Tan Z H,Jensen S H,et al.On loss functions for supervised monaural time-domain speech enhancement[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:825-838.
[15] Wang Y,Narayanan A,Wang D L.On training targets for supervised speech separation[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2014,22(12):1849-1858.
[16] Zhang T,Ren X Y,Liu Y,et al.Acoustic feature extraction for speech enhancement based on auto-encoder features[J].Journal of Frontiers of Computer Science and Technology,2019,13(8):1341.
Memo
Received: 2023-06-11
Corresponding author: XIA Xiuyu. E-mail: xiaxxy@163.com