
[1] HU Gaoli, WEN Chengyu. Research on Algorithms for Text Detection and Recognition of Traffic Signs in Natural Scenes[J]. Journal of Chengdu University of Information Technology, 2022, 37(02): 171-176. [doi:10.16836/j.cnki.jcuit.2022.02.010]

Research on Algorithms for Text Detection and Recognition of Traffic Signs in Natural Scenes

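Journal of Chengdu University of Information Technology [ISSN:1006-6977/CN:61-1281/TN] Volume: 37; Issue: 2022, No. 02; Pages: 171-176; Section: Electronic Information Science and Technology; Publication date: 2022-04-30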

Title:: Research on Algorithms for Text Detection and Recognition of Traffic Signs in Natural Scenes

Article ID:: 2096-1618(2022)02-0171-06

Author(s):: HU Gaoli; WEN Chengyu (College of Communication Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China)

Keywords:: PSENet; CRNN; traffic text; text detection; character recognition

CLC number:: TP391.4

DOI:: 10.16836/j.cnki.jcuit.2022.02.010

Document code:: A

Abstract:: Traffic sign text in natural scenes often has touching characters, complex fonts, and varied sizes and shapes, and is hard to separate into lines, which leads to a low recognition rate for traffic text. To address this, an improved traffic text detection and recognition algorithm based on PSENet+CRNN is proposed. The detection algorithm uses PSENet as the base network, adopts a feature enhancement module (FEM) to enlarge the model's receptive field, and improves the dilated-convolution feature pyramid model to strengthen the fusion of multi-branch deep semantic information. The text recognition part uses CTC+CenterLoss in the CRNN model to align features with labels and to resolve alignment problems when predictions contain repeated or missing characters. The method was validated on the CTST-1600 dataset, where the detection accuracy reached 92.5% and the character recognition rate reached 88.9%, improvements of 4.3% and 2.3%, respectively, over the original algorithm. The experimental results show that the method effectively improves the model's detection and recognition accuracy.
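The alignment issue mentioned in the abstract (repeated and missing characters in per-frame predictions) can be illustrated with a minimal sketch of greedy CTC decoding. This is not the authors' implementation: the function name `ctc_greedy_collapse` and the `-` blank symbol are assumptions chosen for illustration, and the CenterLoss term is not shown.

```python
from itertools import groupby

# Hypothetical blank symbol for illustration; real CTC models usually
# reserve a class index (often 0) for the blank instead of a character.
BLANK = "-"


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence using the standard CTC rule.

    Step 1: merge runs of consecutive identical labels into one label.
    Step 2: remove the blank symbol.
    A blank between two identical characters keeps them distinct,
    which is how CTC represents genuinely doubled letters.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Per-frame argmax output for a sign reading "STOP": repeats are merged.
print(ctc_greedy_collapse("--SS-TT-OO--PP-"))  # STOP
# The blank separates the two l's, so the doubled letter survives.
print(ctc_greedy_collapse("ll-l"))  # ll
```

Without the blank symbol, a repeated frame prediction and a true doubled character would be indistinguishable, which is one reason a recognizer trained with CTC emits blanks between characters.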

References:

[1] LI Yihong, CHEN Yuanyu. Survey of deep learning methods for scene text detection[J]. Computer Engineering and Applications, 2021, 57(6): 42-48.
[2] SHI Guangchen, WU Yirui. Arbitrary-shape scene text detection with pixel aggregation and feature enhancement[J]. Journal of Image and Graphics, 2021, 26(7): 1614-1624.
[3] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection[J]. Computer Vision & Pattern Recognition, 2016: 779-788.
[4] Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network[C]. European Conference on Computer Vision. Springer, Cham, 2016: 56-72.
[5] Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5551-5560.
[6] Li X, Wang W, Hou W, et al. Shape Robust Text Detection with Progressive Scale Expansion Network[J]. Computer Science Computer Vision and Pattern Recognition, 2018: 1-12.
[7] Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298-2304.
[8] Graves A, Fernández S, Gomez F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]. Proceedings of the 23rd International Conference on Machine Learning, 2006: 369-376.
[9] Wen Y, Zhang K, Li Z, et al. A Discriminative Feature Learning Approach for Deep Face Recognition[C]. European Conference on Computer Vision. Springer, Cham, 2016: 499-515.
[10] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[11] Yu F, Koltun V. Multi-Scale Context Aggregation by Dilated Convolutions[EB/OL]. arXiv preprint arXiv:1511.07122, 2015.
[12] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2117-2125.
[13] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
[14] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks[C]. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 6645-6649.
[15] He X, Wang R, Li X, et al. HTSTL: Head-and-Tail search network with scale-transfer layer for traffic sign text detection[J]. IEEE Access, 2019, 7: 118333-118342.

Memo

Received: 2021-09-02

Last Update: 2022-04-30