SUN Wenhui,XIA Xiuyu,LU Xiong.Vocal Main Melody Extraction based on Neural Network of Sparse Autoencoder[J].Journal of Chengdu University of Information Technology,2020,35(04):373-377.[doi:10.16836/j.cnki.jcuit.2020.04.002]
Vocal Main Melody Extraction based on Neural Network of Sparse Autoencoder
- Title:
- Vocal Main Melody Extraction based on Neural Network of Sparse Autoencoder
- Article ID:
- 2096-1618(2020)04-0373-05
- Keywords:
- main melody extraction; pitch saliency; sparse autoencoder; fundamental frequency discrimination
- CLC number:
- TP391
- Document code:
- A
- Abstract:
- Melody is the most important element of music, and main melody extraction is one of the core technologies of music retrieval. The pitch sequence of the singing voice in polyphonic music constitutes the vocal main melody. This paper presents an improved algorithm for automatic vocal main melody extraction. First, based on the spectral characteristics of the vocal signal, the computation of the pitch saliency function is improved, reducing both the computational complexity and the extraction time. Second, a sparse autoencoder neural network with better performance replaces the original algorithm's shallow BP neural network as the fundamental frequency discrimination model, improving the recognition accuracy of the main melody model and reducing the false alarm rate of melody localization, and thus raising the overall accuracy of vocal main melody extraction. Experiments on the MIR-1K dataset show that the overall accuracy of the vocal main melody extracted by the improved algorithm is at least 1.51% higher than that of the original algorithm, and the average extraction time is about 0.12 s shorter.
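The first improvement concerns the pitch saliency function. The abstract does not give the paper's formula, so the sketch below only illustrates the quantity being computed, using a common harmonic-summation form of pitch saliency: the salience of an F0 candidate is a weighted sum of spectral magnitude at its harmonics. The candidate grid, harmonic count, and decay weight `alpha` are all illustrative assumptions, not the paper's improved method.

```python
import numpy as np

def pitch_saliency(spectrum, freqs, f0_candidates, n_harmonics=5, alpha=0.8):
    """Harmonic-summation pitch saliency: salience of each F0 candidate
    is the weighted sum of spectral magnitude at its first n_harmonics
    harmonics, with geometrically decaying weights alpha**(h-1)."""
    saliency = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            # magnitude at the h-th harmonic, linearly interpolated
            # from the spectrum; zero beyond the analyzed band
            mag = np.interp(h * f0, freqs, spectrum, right=0.0)
            saliency[i] += (alpha ** (h - 1)) * mag
    return saliency
```

For a spectrum with peaks at 200, 400, and 600 Hz, the 200 Hz candidate accumulates all three harmonics and therefore dominates candidates at 100 or 300 Hz, which is the behavior a melody tracker relies on when picking the salience maximum per frame.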
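The second improvement replaces the shallow BP network with a sparse autoencoder as the fundamental frequency discrimination model. Since the abstract does not specify the architecture, the following is a minimal sketch of a generic one-hidden-layer sparse autoencoder with a KL-divergence sparsity penalty; layer sizes, sparsity target `rho`, penalty weight `beta`, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    """One-hidden-layer autoencoder whose loss adds a KL-divergence
    penalty pushing the mean hidden activation toward a small rho."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.rho, self.beta = rho, beta

    def forward(self, X):
        H = sigmoid(X @ self.W1 + self.b1)   # hidden code (kept sparse)
        Y = sigmoid(H @ self.W2 + self.b2)   # reconstruction of the input
        return H, Y

    def loss(self, X):
        """Mean reconstruction error plus the sparsity penalty."""
        H, Y = self.forward(X)
        mse = 0.5 * np.mean(np.sum((Y - X) ** 2, axis=1))
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        kl = np.sum(self.rho * np.log(self.rho / rho_hat)
                    + (1 - self.rho) * np.log((1 - self.rho) / (1 - rho_hat)))
        return mse + self.beta * kl

    def step(self, X, lr=0.3):
        """One batch gradient-descent step on the sparse loss."""
        m = X.shape[0]
        H, Y = self.forward(X)
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        # output-layer error (squared-error loss through sigmoid output)
        d2 = (Y - X) * Y * (1.0 - Y) / m
        # hidden-layer error includes the sparsity-penalty gradient
        sparse_grad = self.beta * (-self.rho / rho_hat
                                   + (1 - self.rho) / (1 - rho_hat)) / m
        d1 = (d2 @ self.W2.T + sparse_grad) * H * (1.0 - H)
        self.W2 -= lr * (H.T @ d2)
        self.b2 -= lr * d2.sum(axis=0)
        self.W1 -= lr * (X.T @ d1)
        self.b1 -= lr * d1.sum(axis=0)
```

In a pipeline like the one the abstract describes, such a network would be trained on spectral frames so that its sparse hidden code feeds a discriminator deciding which F0 candidates belong to the singing voice; the sketch above shows only the unsupervised sparse-coding part.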
Memo:
Received: 2019-11-22