BAI Kaiyi,SHENG Zhiwei,HUANG Yuanyuan.High Noise Traffic Classification based on Penalty Regression[J].Journal of Chengdu University of Information Technology,2025,40(02):125-131.[doi:10.16836/j.cnki.jcuit.2025.02.001]
High Noise Traffic Classification based on Penalty Regression
- Article ID:
- 2096-1618(2025)02-0125-07
- CLC number:
- TP183
- Document code:
- A
- Abstract:
- This paper addresses the data quality problems that arise in high-noise traffic classification. Because network traffic data is easily corrupted by interference, the idea of learning with noisy labels (LNL) is introduced, and noise is added artificially to blur the features. A linear relationship between features and labels is established first, and non-zero mean-shift parameters are then used to identify noisy data. Symmetric and asymmetric noise are added manually to simulate the kinds of interference found in real settings. On this basis, the paper proposes PR-2, a high-noise traffic classification model based on L2 regularization, which converts traffic into images and applies L2 regularization to handle noisy labels, improving classification performance under high-noise traffic. The method is validated on the USTC-TF2016 dataset and compared with the LSTM, BiTCN, BoAu, CL, INCV, and FINE methods. Experimental results show that PR-2 still achieves 95.16% accuracy under symmetric noise and 86.15% under asymmetric noise at a noise ratio of 0.8, demonstrating its effectiveness and usability on high-noise data.
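The noise-injection and mean-shift steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's PR-2 implementation: the function names, the definition of symmetric noise as a uniform flip to a different class, the definition of asymmetric noise as a flip to the next class index, and the ridge-penalized objective `||Y - Xβ - γ||² + λ||γ||²`. Under that objective the solution is closed-form: β is the ordinary least-squares fit and γ is the residual scaled by `1 / (1 + λ)`, so a large row norm of γ marks a sample whose label disagrees with the linear fit.

```python
import numpy as np


def add_symmetric_noise(labels, noise_ratio, num_classes, rng):
    """Flip a fraction of labels to a different class chosen uniformly (assumed definition)."""
    noisy = labels.copy()
    n_flip = int(round(noise_ratio * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in 1..num_classes-1 guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy, idx


def add_asymmetric_noise(labels, noise_ratio, num_classes, rng):
    """Flip a fraction of labels to the next class index (assumed pair-flip definition)."""
    noisy = labels.copy()
    n_flip = int(round(noise_ratio * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    noisy[idx] = (labels[idx] + 1) % num_classes
    return noisy, idx


def mean_shift_scores(X, Y, lam=1.0):
    """Per-sample norm of the mean-shift term gamma under an L2 penalty (illustrative).

    Minimizes ||Y - Xb @ beta - gamma||_F^2 + lam * ||gamma||_F^2, where Xb is X
    with a bias column. For fixed beta the optimum is gamma = residual / (1 + lam),
    and substituting it back leaves an ordinary least-squares problem in beta.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    beta, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    gamma = (Y - Xb @ beta) / (1.0 + lam)
    return np.linalg.norm(gamma, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, n = 4, 600
    clean = rng.integers(0, num_classes, size=n)
    # Synthetic stand-in for traffic features: one well-separated cluster per class.
    X = np.eye(num_classes)[clean] * 5.0 + rng.normal(size=(n, num_classes))
    noisy, flipped = add_symmetric_noise(clean, 0.2, num_classes, rng)
    scores = mean_shift_scores(X, np.eye(num_classes)[noisy])
    is_noisy = np.zeros(n, dtype=bool)
    is_noisy[flipped] = True
    print("mean score, flipped labels:", scores[is_noisy].mean())
    print("mean score, clean labels:  ", scores[~is_noisy].mean())
```

The demo uses synthetic Gaussian clusters and a 0.2 noise ratio only to keep the example small; it does not reproduce the USTC-TF2016 experiments or the 0.8 noise ratio reported in the abstract.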
References:
[1] Azab A,Khasawneh M,Alrabaee S,et al.Network traffic classification:Techniques,datasets,and challenges[J].Digital Communications and Networks,2024,10(3):676-692.
[2] Guerra J L,Catania C,Veas E.Datasets are not enough:Challenges in labeling network traffic[J].Computers & Security,2022,120:102810.
[3] Song H,Kim M,Park D,et al.Learning from noisy labels with deep neural networks:A survey[J].IEEE Transactions on Neural Networks and Learning Systems,2023,34(11):8135-8153.
[4] Nigam N,Dutta T,Gupta H P.Impact of noisy labels in learning techniques:a survey[C].Advances in Data and Information Sciences:Proceedings of ICDIS 2019.Springer Singapore,2020:403-411.
[5] Hwang R H,Peng M C,Nguyen V L,et al.An LSTM-based deep learning approach for classifying malicious traffic at the packet level[J].Applied Sciences,2019,9(16):3414.
[6] Chen J,Lv T,Cai S,et al.A novel detection model for abnormal network traffic based on bidirectional temporal convolutional network[J].Information and Software Technology,2023,157:107166.
[7] Yuan Q,Liu C,Yu W,et al.BoAu:Malicious traffic detection with noise labels based on boundary augmentation[J].Computers & Security,2023,131:103300.
[8] Northcutt C,Jiang L,Chuang I.Confident learning:Estimating uncertainty in dataset labels[J].Journal of Artificial Intelligence Research,2021,70:1373-1411.
[9] Chen P,Liao B B,Chen G,et al.Understanding and utilizing deep neural networks trained with noisy labels[C].International Conference on Machine Learning.PMLR,2019:1062-1070.
[10] Kim T,Ko J,Choi J H,et al.Fine samples for learning with noisy labels[J].Advances in Neural Information Processing Systems,2021,34:24137-24149.
[11] Anderson B,McGrew D.Machine learning for encrypted malware traffic classification:accounting for noisy labels and non-stationarity[C].Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:1723-1732.
[12] Al-Gethami K M,Al-Akhras M T,Alawairdhi M.Empirical evaluation of noise influence on supervised machine learning algorithms using intrusion detection datasets[J].Security and Communication Networks,2021,2021(1):8836057.
[13] Yuan Q,Zhu Y,Xiong G,et al.ULDC:Unsupervised Learning-Based Data Cleaning for Malicious Traffic With High Noise[J].The Computer Journal,2024,67(3):976-987.
[14] Zhang C,Bengio S,Hardt M,et al.Understanding deep learning(still)requires rethinking generalization[J].Communications of the ACM,2021,64(3):107-115.
[15] Fallah S,Bidgoly A J.Android malware detection using network traffic based on sequential deep learning models[J].Software:Practice and Experience,2022,52(9):1987-2004.
[16] Han B,Yao J,Niu G,et al.Masking:A new perspective of noisy supervision[EB/OL].http://arxiv.org/abs/1805.08193,2018-05-21/2023-06-01.
[17] Lyu Y,Tsang I W.Curriculum loss:Robust learning and generalization against label corruption[J].arXiv preprint arXiv:1905.10045,2019:1-2.
[18] Wang Y,Sun X,Fu Y.Scalable penalized regression for noise detection in learning with noisy labels[C].Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:346-355.
[19] Weisberg S.Applied linear regression[M].John Wiley & Sons,2005:1-3.
[20] Huber P J.Robust statistics[M].John Wiley & Sons,2004:2-4.
[21] Wang W,Zhu M,Zeng X,et al.Malware traffic classification using convolutional neural network for representation learning[C].2017 International Conference on Information Networking(ICOIN).IEEE,2017:712-717.
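Memo
Received: 2023-09-20
Funding: National Key R&D Program of China (2022YFB3103103); Sichuan Provincial Key R&D Program (2022YFS0571); Sichuan Network Culture Research Center (WLWH22-18); Natural Science Foundation of Sichuan Province (2022NSFSC0557)
Corresponding author: SHENG Zhiwei. E-mail: 7782988@qq.com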