DENG Jun,WANG Min.Statement-level Software Vulnerability Detection Solution based on Transformer[J].Journal of Chengdu University of Information Technology,2025,40(04):428-433.[doi:10.16836/j.cnki.jcuit.2025.04.003]
- Title:
- Statement-level Software Vulnerability Detection Solution based on Transformer
- Article ID:
- 2096-1618(2025)04-0428-06
- Keywords:
- software vulnerability detection; deep learning; Transformer; program analysis; code representation
- CLC number:
- TP309
- Document code:
- A
- Abstract:
- Software vulnerability detection is crucial for protecting systems from cyberattacks. Most current approaches use machine learning and deep learning to detect vulnerabilities at the function or file level, and cannot pinpoint the specific statements in which a vulnerability exists. This article proposes a statement-level software vulnerability detection scheme based on Transformer, which uses a pre-trained code model and data flow graphs for code representation, and compares it with the existing models IVDetect, LineVD, SySeVR, Devign, and VulDeePecker. In addition, the effects of different code embedding methods on vulnerability detection are compared. Experiments show that the proposed method improves the F1 score of function-level vulnerability detection by 56%-65% and the F1 score of statement-level vulnerability detection by 16%-35%, offering research ideas for white-box software vulnerability detection.
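As a minimal sketch of how a statement-level F1 score such as the one reported in the abstract can be computed, the following Python snippet treats each statement as a binary label (1 = vulnerable, 0 = not vulnerable). The function name `statement_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation code may differ.

```python
def statement_f1(y_true, y_pred):
    """Return the F1 score for binary per-statement labels (1 = vulnerable)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correctly flagged statement: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for the six statements of one function.
truth = [0, 1, 0, 0, 1, 0]
preds = [0, 1, 1, 0, 0, 0]
print(f"statement-level F1 = {statement_f1(truth, preds):.2f}")
```

In this toy case one vulnerable statement is found, one is missed, and one clean statement is wrongly flagged, so precision and recall are both 0.5.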
References:
[1] Deng X,Ye W,Xie R,et al.Survey of source code bug detection based on deep learning[J]. Journal of Software,2023,34(2):625-654.
[2] Yamaguchi F,Golde N,Arp D,et al.Modeling and discovering vulnerabilities with code property graphs[C]. 2014 IEEE Symposium on Security and Privacy.IEEE,2014:590-604.
[3] Li Z,Zou D,Xu S,et al.Vuldeepecker:A deep learning-based system for vulnerability detection[J]. arXiv preprint arXiv:1801.01681,2018.
[4] Fu M,Tantithamthavorn C.Linevul:A transformer-based line-level vulnerability prediction[C]. Proceedings of the 19th International Conference on Mining Software Repositories.2022:608-620.
[5] Feng Z,Guo D,Tang D,et al.Codebert:A pre-trained model for programming and natural languages[J]. Findings of the Association for Computational Linguistics.EMNLP 2020:1536-1547.
[6] Guo D,Ren S,Lu S,et al.Graphcodebert:Pre-training code representations with data flow[C]. International Conference on Learning Representations,2021:3-7.
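[7] Wu F.Research on binary program vulnerability analysis and detection methods based on deep learning[D]. Beijing:Beijing Jiaotong University,2018.
[8] Gu M,Sun H,Han D,et al.Software security vulnerability mining based on deep learning[J]. Journal of Computer Research and Development,2021,58(10):2140-2162.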
[9] Li X,Wang L,Xin Y,et al.Automated software vulnerability detection based on hybrid neural network[J]. Applied Sciences,2021,11(7):3201.
[10] Zou D,Wang S,Xu S,et al.μVulDeePecker:A Deep Learning-Based System for Multiclass Vulnerability Detection[J]. IEEE Transactions on Dependable and Secure Computing,2019,18(5):2224-2236.
[11] Mikolov T,Sutskever I,Chen K,et al.Distributed representations of words and phrases and their compositionality[J]. Advances in neural information processing systems,2013,26.
[12] Liu Z,Lin W,Shi Y,et al.A robustly optimized BERT pre-training approach with post-training[C]. China National Conference on Chinese Computational Linguistics.Cham:Springer International Publishing,2021:471-484.
[13] Kool W,Van Hoof H,Welling M.Attention,learn to solve routing problems[J]. arXiv preprint arXiv:1803.08475,2018.
[14] Fan J,Li Y,Wang S,et al.A C/C++ code vulnerability dataset with code changes and CVE summaries[C]. Proceedings of the 17th International Conference on Mining Software Repositories.2020:508-512.
[15] Li Y,Wang S,Nguyen T N.Vulnerability detection with fine-grained interpretations[C]. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.2021:292-303.
[16] Hin D,Kan A,Chen H,et al.LineVD:Statement-level vulnerability detection using graph neural networks[C]. Proceedings of the 19th International Conference on Mining Software Repositories.2022:596-607.
[17] Li Z,Zou D,Xu S,et al.Sysevr:A framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing,2021,19(4):2244-2258.
[18] Zhou Y,Liu S,Siow J,et al.Devign:Effective vulnerability identification by learning comprehensive program semantics via graph neural networks[J]. Advances in neural information processing systems,2019,32.
Memo
Received: 2024-01-08
Funding: National Social Science Fund of China (23BSH061); Sichuan Science and Technology Program (2023YFG0292, 2021ZYD0011)
Corresponding author: WANG Min. E-mail: wmcuit@cuit.edu.cn
