CHEN Hongsong,AN Junxiu,TAO Quanhui,et al.Multi-modal Sentiment Analysis Model based on BERT-VGG16[J].Journal of Chengdu University of Information Technology,2022,37(04):379-385.[doi:10.16836/j.cnki.jcuit.2022.04.003]
- Title:
- Multi-modal Sentiment Analysis Model based on BERT-VGG16
- Article ID:
- 2096-1618(2022)04-0379-07
- Keywords:
- sentiment analysis; multi-modal; BERT-VGG16 model; attention mechanism
- CLC number:
- TP391.1
- Document code:
- A
- Abstract:
- Traditional sentiment analysis methods that rely only on text data cannot fully mine sentiment information, and the limited information contained in single-modal data does not reflect the true emotional state well. To address these problems, a multi-modal sentiment analysis model that introduces an attention mechanism is proposed. First, the model uses the pre-trained models BERT and VGG16 to extract features from text data and image data, respectively. Second, to increase the weight of each modality's important features, an attention mechanism is introduced during feature fusion; the fused model greatly increases the amount of information available. Experimental results show that the attention-based multi-modal feature fusion model built on BERT-VGG16 significantly improves sentiment analysis performance compared with single-modal and other multi-modal feature fusion models.
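The abstract outlines the architecture: BERT encodes the text, VGG16 encodes the image, and an attention mechanism weights the two feature vectors before classification. Below is a minimal PyTorch sketch of that pipeline, assuming the HuggingFace Transformers and torchvision libraries; the bert-base-chinese checkpoint, projection size, single-layer attention scoring, and classifier head are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the fusion model described in the abstract (assumptions
# noted above); requires torch, transformers, and torchvision.
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import vgg16, VGG16_Weights

class BertVgg16Fusion(nn.Module):
    def __init__(self, hidden_dim=768, num_classes=2):
        super().__init__()
        # Text encoder: pre-trained BERT (Chinese checkpoint assumed)
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        # Image encoder: pre-trained VGG16 convolutional backbone
        vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
        self.vgg_features, self.vgg_pool = vgg.features, vgg.avgpool
        # Project the flattened VGG16 feature map into the BERT hidden size
        self.img_proj = nn.Linear(512 * 7 * 7, hidden_dim)
        # Single-layer attention that scores each modality's feature vector
        self.attn = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask, images):
        # [CLS] pooled output serves as the text feature vector: (B, H)
        text_feat = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        # images: (B, 3, 224, 224) -> flattened conv features -> (B, H)
        img_feat = self.img_proj(
            self.vgg_pool(self.vgg_features(images)).flatten(1))
        # Stack the two modalities and weight them with softmax attention
        feats = torch.stack([text_feat, img_feat], dim=1)   # (B, 2, H)
        weights = torch.softmax(self.attn(feats), dim=1)    # (B, 2, 1)
        fused = (weights * feats).sum(dim=1)                # (B, H)
        return self.classifier(fused)
```

In this reading, the learned attention weights let the model emphasize whichever modality carries the stronger sentiment signal for a given sample, consistent with the abstract's claim that attention-weighted fusion raises the effective information content over either modality alone.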
Memo:
Received: 2021-12-20
Funding: National Natural Science Foundation of China (71673032)