GAO Zuoyuan, TAO Hongcai. Research on Multi-task Jointing Model for Task Chat Robot [J]. Journal of Chengdu University of Information Technology, 2023, 38(03): 251-257. [doi:10.16836/j.cnki.jcuit.2023.03.001]
- Title:
- Research on Multi-task Jointing Model for Task Chat Robot
- Article ID:
- 2096-1618(2023)03-0251-07
- Keywords:
- RoBERTa-WWM model; multi-task joint learning; Theseus compression; Focal loss
- CLC number:
- TP391.12
- Document code:
- A
- Abstract:
- Building a task-oriented chatbot generally requires executing several natural language processing subtasks. The traditional approach trains each subtask independently and then integrates the results, which ignores the correlations between subtasks and limits the model's predictive power. This paper proposes a compressed joint model, Joint-RoBERTa-WWM-of-Theseus. On the one hand, three subtasks, namely intent classification, domain classification and semantic slot filling, are trained together through multi-task joint learning, and a Focal loss mechanism is introduced into the multi-class classification subtasks to address imbalanced data distributions. On the other hand, the model is compressed with the Theseus method, which greatly increases prediction speed at a slight cost in accuracy and thus improves the model's real-time performance and practicality in production environments.
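As an illustration of the joint architecture summarized above, the following is a minimal PyTorch sketch, not the authors' implementation: one shared encoder feeds three task heads, and the sentence-level heads are trained with the Focal loss FL(p_t) = -(1 - p_t)^γ · log(p_t) of [23] rather than plain cross entropy. The ToyEncoder stand-in, the head sizes, and gamma=2.0 are illustrative assumptions; in the paper the shared encoder is RoBERTa-WWM.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t) from Lin et al. [23].
    Down-weights easy examples so under-represented classes get more gradient.
    The class-balancing alpha_t term is omitted for brevity."""
    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        # log p_t: log-probability assigned to the gold class of each example
        log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        loss = -((1.0 - pt) ** self.gamma) * log_pt  # modulating factor (1 - p_t)^gamma
        return loss.mean()

class ToyEncoder(nn.Module):
    """Illustrative stand-in for the shared RoBERTa-WWM encoder."""
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)

    def forward(self, token_ids):
        return self.emb(token_ids)  # (batch, seq_len, hidden)

class JointModel(nn.Module):
    """One shared encoder, three task heads: intent and domain classification
    are sentence-level, slot filling is token-level."""
    def __init__(self, encoder, hidden, n_intents, n_domains, n_slots):
        super().__init__()
        self.encoder = encoder
        self.intent_head = nn.Linear(hidden, n_intents)
        self.domain_head = nn.Linear(hidden, n_domains)
        self.slot_head = nn.Linear(hidden, n_slots)

    def forward(self, token_ids):
        h = self.encoder(token_ids)          # (batch, seq_len, hidden)
        pooled = h[:, 0]                     # first-token pooling, [CLS]-style
        return self.intent_head(pooled), self.domain_head(pooled), self.slot_head(h)

if __name__ == "__main__":
    model = JointModel(ToyEncoder(), hidden=64, n_intents=10, n_domains=5, n_slots=20)
    x = torch.randint(0, 1000, (8, 16))      # 8 dummy sequences of 16 token ids
    intent_logits, domain_logits, slot_logits = model(x)
    focal, ce = FocalLoss(gamma=2.0), nn.CrossEntropyLoss()
    # joint objective: sum of the three per-task losses
    loss = (focal(intent_logits, torch.randint(0, 10, (8,)))
            + focal(domain_logits, torch.randint(0, 5, (8,)))
            + ce(slot_logits.reshape(-1, 20), torch.randint(0, 20, (8 * 16,))))
    loss.backward()
    print(float(loss))

The Theseus compression stage is omitted here; per [11], it progressively replaces groups of the fine-tuned encoder's layers with a smaller successor module during training, which is the source of the speed-up described in the abstract.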
References:
[1] YU Dan, YAN Xiaoyu, WANG Yanqiu, et al. Design and application of task-oriented dialogue robots [J]. Software Engineering, 2021, 24(2): 55-59. (in Chinese)
[2] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [J]. arXiv preprint arXiv:1810.04805, 2018.
[3] LI Falai, JIN Zhen, XIONG Ting, et al. Implementation method and system of an intelligent robot based on a Chinese BERT model [P]. China: CN113553405A, 2021-10-26. (in Chinese)
[4] Weiss K, Khoshgoftaar T M, Wang D. A survey of transfer learning [J]. Journal of Big Data, 2016, 3(1): 1-40.
[5] Liu Y, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach [J]. CoRR, 2019.
[6] Ma X, Hovy E H. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [J]. CoRR, 2016.
[7] BAI Bing, HOU Xia, SHI Song. Named entity recognition method based on CRF and BI-LSTM [J]. Journal of Beijing Information Science and Technology University (Natural Science Edition), 2018, 33(6): 27-33. (in Chinese)
[8] ZHAO Jingsheng, SONG Mengxue, GAO Xiang. A survey of the development and applications of natural language processing [J]. Information Technology and Informatization, 2019(7): 142-145. (in Chinese)
[9] Chen Q, Zhuo Z, Wang W. BERT for Joint Intent Classification and Slot Filling [J]. CoRR, 2019.
[10] Lan Z, Chen M, Goodman S, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations [J]. CoRR, 2019.
[11] Xu C, Zhou W, Ge T, et al. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing [J]. arXiv preprint arXiv:2002.02925, 2020.
[12] Turing A M.Computing machinery and intelligence[J].Mind,1950,59(236):433-460.
[13] YU Kai, CHEN Lu, CHEN Bo, et al. Cognitive technology in task-oriented human-machine dialogue systems: concepts, advances and future [J]. Chinese Journal of Computers, 2015, 38(12): 2333-2348. (in Chinese)
[14] CHEN Long, SUN Zejian. Research on the current status of task-oriented dialogue systems [J]. Electronic Technology & Software Engineering, 2017(23): 172-173. (in Chinese)
[15] Tmall Genie's Bao Juan: Tmall Genie uses AI to connect smart marketing across all home scenarios [J]. International Brand Observation, 2021(20): 47-48. (in Chinese)
[16] Aron J.How innovative is Apple’s new voice assistant,Siri?[J].New Scientist,2011,212(2836):24.
[17] Hoy M B. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants [J]. Medical Reference Services Quarterly, 2018, 37(1): 81-88.
[18] Cui Y, Che W, Liu T, et al. Pre-Training with Whole Word Masking for Chinese BERT [J]. CoRR, 2019.
[19] Schmidhuber J. Deep Learning in Neural Networks: An Overview [J]. Neural Networks, 2015, 61: 85-117.
[20] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.
[21] Fukada T, Schuster M, Sagisaka Y. Phoneme boundary estimation using bidirectional recurrent neural networks and its applications [J]. Systems and Computers in Japan, 1999, 30(4): 20-30.
[22] Lafferty J, McCallum A, Pereira F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data [C]//Proceedings of the International Conference on Machine Learning. San Francisco, USA, 2001: 282-289.
[23] Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2980-2988.
[24] Hinton G E, Vinyals O, Dean J. Distilling the Knowledge in a Neural Network [J]. CoRR, 2015.
[25] Nakkiran P, Kaplun G, Bansal Y, et al. Deep double descent: where bigger models and more data hurt [J]. Journal of Statistical Mechanics: Theory and Experiment, 2021(12): 124003.
Memo
Received: 2023-02-20
Funding: National Natural Science Foundation of China (61806170)