REN Rui,WANG Xiaoya,WEN Chengyu.Improved Chinese Street View Text Recognition Technology based on CRNN[J].Journal of Chengdu University of Information Technology,2025,40(01):1-6.[doi:10.16836/j.cnki.jcuit.2025.01.001]
Improved Chinese Street View Text Recognition Technology based on CRNN
- Title:
- Improved Chinese Street View Text Recognition Technology based on CRNN
- Article ID:
- 2096-1618(2025)01-0001-06
- Keywords:
- text recognition; convolutional neural network; attention mechanism; bidirectional long short-term memory
- CLC number:
- TP391
- Document code:
- A
- Abstract:
- Text in real-world scenes is often irregular in shape because of image distortion, cluttered backgrounds, bending, and tilting. Extracting this text enriches the semantic content of an image and helps analyze its context, which in turn supports better understanding of the scene. To address these challenges, an end-to-end scene text recognition technique based on an improved CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional layers, an improved Inception structure based on GoogLeNet extracts features, with multi-branch convolutional layers that fuse multi-scale features. An attention mechanism is then added to strengthen feature correlations along both the channel and spatial dimensions, giving local features a global context. In the recurrent layers, a Bi-LSTM (Bidirectional Long Short-Term Memory) network strengthens the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is passed to a CTC (Connectionist Temporal Classification) layer, which transcribes it into the output sequence. Experiments on the IIIT5K dataset and the Baidu Chinese Street View dataset yield accuracies of 95.3% and 91.1%, respectively, demonstrating the reliability of the method.
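The abstract's final step, transcribing the per-frame predictions of the recurrent layers through CTC, can be illustrated with greedy (best-path) decoding: take the highest-scoring class at each time step, merge consecutive repeats, and drop blanks. The sketch below is not the authors' implementation. The function name `ctc_greedy_decode`, the choice of index 0 as the blank symbol, and the two-letter alphabet are assumptions made purely for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of class k at time step t.
    Class 0 is the blank; class k > 0 maps to alphabet[k - 1].
    """
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        # Index of the highest-scoring class in this frame.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen in the immediately preceding frame.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


# Hypothetical scores for five frames over {blank, "a", "b"}.
alphabet = ["a", "b"]
scores = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.70, 0.20],  # "a" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two "a" characters
    [0.20, 0.70, 0.10],  # "a"
    [0.10, 0.20, 0.70],  # "b"
]
print(ctc_greedy_decode(scores, alphabet))  # prints "aab"
```

The blank frame is what allows a doubled character to survive decoding: without it, the three "a" frames would collapse into a single "a".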
Memo
Received: 2023-09-04
Funding: Sichuan Science and Technology Program (2023YFS0422)