CHENG Rui, CHEN Chong, HUANG Ruifeng, et al. Infrared Human Action Recognition based on Two-Stream Network and Transfer Learning[J]. Journal of Chengdu University of Information Technology, 2023, 38(04): 387-391. [doi:10.16836/j.cnki.jcuit.2023.04.002]
- Title:
- Infrared Human Action Recognition based on Two-Stream Network and Transfer Learning
- Article ID:
- 2096-1618(2023)04-0387-05
- CLC number:
- TP391.4
- Document code:
- A
- Abstract:
- Aiming at the problem that traditional pre-trained models cannot make full use of the temporal information in infrared human action data, this paper proposes an infrared human action recognition method based on a two-stream network and transfer learning. First, motion history images and optical flow images are extracted from the original video and stacked using a sliding window. Second, exploiting the similarity between visible-light action data and infrared action data, a two-stream pre-training network is designed, and the parameters of the model pre-trained on visible-light action data are shared with the infrared human action recognition model through transfer learning, so as to extract features from the infrared action data. The extracted features are then fed into the two-stream network to further extract infrared human action information, with parallel feature fusion replacing Softmax fusion at the feature fusion stage. Finally, a support vector machine classifies the fused features into human action categories. Experimental results show that the proposed method achieves an accuracy of 78.52% on the NTU RGB+D dataset, demonstrating good classification performance.
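To make the pipeline concrete, the sketch below illustrates two of the steps named in the abstract: sliding-window stacking of motion history / optical flow frames, and parallel (concatenation) feature fusion followed by an SVM classifier. It is a minimal illustration only; the window length, feature dimensions, synthetic inputs, and the RBF-kernel SVC are assumptions standing in for the paper's actual networks and settings.

```python
import numpy as np
from sklearn.svm import SVC

WINDOW = 10  # assumed sliding-window length (frames per stack), not the paper's value


def stack_with_sliding_window(frames, window=WINDOW, stride=1):
    """Stack consecutive motion-history / optical-flow frames along the
    channel axis, producing one stacked volume per window position."""
    return np.stack(
        [np.concatenate(frames[i:i + window], axis=-1)
         for i in range(0, len(frames) - window + 1, stride)]
    )


def parallel_fuse(stream_a, stream_b):
    """Parallel feature fusion: concatenate the two streams' feature
    vectors rather than averaging their Softmax scores."""
    return np.concatenate([stream_a, stream_b], axis=-1)


rng = np.random.default_rng(0)

# Sliding-window stacking of 30 single-channel frames -> shape (21, 8, 8, 10)
frames = [rng.normal(size=(8, 8, 1)) for _ in range(30)]
print(stack_with_sliding_window(frames).shape)

# Hypothetical per-clip features from the two transferred streams
# (e.g. penultimate-layer activations); dimensions are placeholders.
spatial = rng.normal(size=(200, 512))    # motion-history stream
temporal = rng.normal(size=(200, 512))   # optical-flow stream
labels = rng.integers(0, 10, size=200)   # 10 hypothetical action classes

fused = parallel_fuse(spatial, temporal)      # (200, 1024)
clf = SVC(kernel="rbf").fit(fused, labels)    # SVM on fused features
print("train accuracy:", clf.score(fused, labels))
```

The design choice being sketched: parallel fusion doubles the feature dimensionality but preserves stream-specific information that score-level (Softmax) fusion discards, which is why a single fused feature vector, rather than two probability vectors, is handed to the SVM.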
Memo:
Received: 2022-10-19
Funding: National Natural Science Foundation of China (62001004); Provincial Natural Science Research Project of Anhui Universities (KJ2019A0768)
Corresponding author: CHEN Chong. E-mail: shchshch@ustc.edu.cn