[1] CHEN Shihan, MA Hongjiang, WANG Ting, et al. Video Sentiment Analysis Technology based on Multimodal Fusion [J]. Journal of Chengdu University of Information Technology, 2022, 37(06): 656-661. [doi:10.16836/j.cnki.jcuit.2022.06.007]

Video Sentiment Analysis Technology based on Multimodal Fusion

References:

[1] XI Chen. Multimodal Sentiment Analysis based on Facial Expressions, Speech and Text [D]. Nanjing: Nanjing University of Posts and Telecommunications, 2021.
[2] WANG Die. Research on Multimodal Fusion Technology based on Attention Mechanism [D]. Nanjing: Nanjing Normal University, 2021.
[3] FENG Yaqin, SHEN Lingjie, HU Tingting, et al. Improving Speech Emotion Recognition by Fusing Speech and Text Features [J]. Journal of Data Acquisition and Processing, 2019, 34(4): 625-631.
[4] QIN Fang, ZENG Weijia, LUO Jiawei, et al. Research on Multimodal Fusion Image Recognition based on Deep Learning [J]. Information Technology, 2022(4): 29-34.
[5] MOU Zhijia, FU Yaru. A Review of Research on Multimodal Learning Analytics [J]. Modern Educational Technology, 2021, 31(6): 23-31.
[6] Zadeh A, Chen M, Poria S, et al. Tensor Fusion Network for Multimodal Sentiment Analysis [C]. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017: 1103-1114.
[7] XUE Qiwei, WU Xiru. Vehicle Detection for Unmanned Driving Systems based on Multimodal Feature Fusion [J]. Journal of Guangxi Normal University (Natural Science Edition), 2022, 40(2): 37-48.
[8] Sun Z, Sarma P, Sethares W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis [C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8992-8999.
[9] YAN Zengxian, KONG Chao, OU Weihua. Research on Face Anti-spoofing Algorithms based on Multimodal Fusion [J]. Computer Technology and Development, 2022, 32(4): 63-68.
[10] WANG Xuyang, DONG Shuai, SHI Jie. Multimodal Sentiment Analysis with Composite Hierarchical Fusion [J/OL]. http://kns.cnki.net/kcms/detail/11.5602.TP.20220331.1739.003.html, 2022(8): 31.
[11] Tsai Y H, Bai S, Kolter J Z, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences [C]. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 6558-6569.
[12] Hazarika D, Zimmermann R, Poria S. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [C]. Proceedings of the 28th ACM International Conference on Multimedia, 2020: 1122-1131.
[13] Makiuchi M R, Uto K, Shinoda K. Multimodal emotion recognition with high-level speech and text features [C]. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2021: 350-357.
[14] Byun S W, Kim J H, Lee S P. Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding [J]. Applied Sciences, 2021, 11(17): 7967.
[15] HUANG Huan, SUN Lijuan, CAO Ying, et al. Attention-based Multimodal Sentiment Analysis of Short Videos [J]. Journal of Graphics, 2021, 42(1): 8-14.
[16] Poole B, Ozair S, Van Den Oord A, et al. On variational bounds of mutual information [C]. International Conference on Machine Learning. PMLR, 2019: 5171-5180.
[17] Belghazi M I, Baratin A, Rajeshwar S, et al. Mutual information neural estimation [C]. International Conference on Machine Learning. PMLR, 2018: 531-540.
[18] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [C]. Proceedings of NAACL-HLT, 2019: 4171-4186.
[19] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.
[20] Tishby N, Zaslavsky N. Deep learning and the information bottleneck principle [C]. 2015 IEEE Information Theory Workshop. IEEE, 2015: 1-5.
[21] Alemi A A, Fischer I, Dillon J V, et al. Deep Variational Information Bottleneck [J]. arXiv preprint arXiv:1612.00410, 2016.
[22] Bachman P, Hjelm R D, Buchwalter W. Learning representations by maximizing mutual information across views [C]. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019: 15535-15545.
[23] Barber D, Agakov F. The IM algorithm: a variational approach to Information Maximization [C]. Proceedings of the 16th International Conference on Neural Information Processing Systems, 2003: 201-208.
[24] Huber M F, Bailey T, Durrant-Whyte H, et al. On entropy approximation for Gaussian mixture random vectors [C]. 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems. IEEE, 2008: 181-188.
[25] Gutmann M, Hyvärinen A. Noise-contrastive estimation: a new estimation principle for unnormalized statistical models [C]. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010: 297-304.
[26] Zadeh A A B, Liang P P, Poria S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph [C]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018: 2236-2246.
[27] Yuan J, Liberman M. Speaker identification on the SCOTUS corpus [J]. The Journal of the Acoustical Society of America, 2008, 123(5): 3878.
[28] Degottex G, Kane J, Drugman T, et al. COVAREP: A collaborative voice analysis repository for speech technologies [C]. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 960-964.
[29] Yu W, Xu H, Yuan Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis [C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(12): 10790-10797.
[30] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient Low-rank Multimodal Fusion with Modality-Specific Factors [C]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018: 2247-2256.

Similar References:

[1] TAO Quanhui, AN Junxiu, CHEN Hongsong. Multi-modal Sentiment Analysis based on Cross-modal Fusion ERNIE [J]. Journal of Chengdu University of Information Technology, 2022, 37(05): 501. [doi:10.16836/j.cnki.jcuit.2022.05.003]

Memo

Received: 2022-07-19
Funding: Sichuan Provincial Department of Science and Technology Key R&D Program (2021YFG0031, 2022YFG0375); Sichuan Science and Technology Service Industry Demonstration Project (2021GFW130)

Last Update: 2022-12-30