GUO Jianyang.Research on Semantic Analysis of Multimodal Documents in Air Traffic Control Field based on Large Language Models[J].Journal of Chengdu University of Information Technology,2025,40(06):806-811.[doi:10.16836/j.cnki.jcuit.2025.06.010]
Research on Semantic Analysis of Multimodal Documents in Air Traffic Control Field based on Large Language Models
- Title:
- Research on Semantic Analysis of Multimodal Documents in Air Traffic Control Field based on Large Language Models
- Article ID:
- 2096-1618(2025)06-0806-06
- Keywords:
- OCR; LangChain; RAG; multimodal documents; transformer
- CLC number:
- TP182
- Document code:
- A
- Abstract:
- Civil aviation air traffic control is a technology-intensive industry whose operations generate a large volume of professional technical documents containing text, images, and tables. Analyzing such multimodal documents is an urgent need for the development of air traffic control services. Traditional approaches, however, rely mainly on manual work or on tools such as OCR: manual analysis is inefficient, while OCR-based approaches lose structural information, handle special content poorly, and hit bottlenecks in extracting information from images. This paper therefore proposes three design methods based on RAG technology that use multimodal large language models to semantically parse professional air traffic control technical documents. Comparing the experimental results of the three methods clarifies how recognition can be optimized with RAG, and provides a reference and practical guidance for deep multimodal semantic analysis.
References:
[1] Ashish Vaswani,Noam Shazeer,Niki Parmar,et al.Attention Is All You Need[C].31st Conference on Neural Information Processing Systems(NIPS 2017),Long Beach,CA,USA.2017:2-3.
[2] Han L Z,Awes M,Almas B,et al.A Survey of Generative Categories and Techniques in Multimodal Large Language Models[J].arXiv:2506.10016,2025.
[3] Shahul Es,Jithin James,Luis Espinosa-Anke,et al.Ragas:Automated Evaluation of Retrieval Augmented Generation[J].arXiv:2309.15217,2023.
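[4] Wang Wenguang.Knowledge-Enhanced Large Models[M].Beijing:Publishing House of Electronics Industry,2025:25.(in Chinese)
[5] Li Teli.Introduction to LangChain:Building Highly Reusable and Extensible LLM Applications[M].Beijing:Publishing House of Electronics Industry,2024:16.(in Chinese)
[6] Cheng Xiji.Learning to Ask Questions and Harness AI:Prompts from Beginner to Mastery[M].Beijing:Publishing House of Electronics Industry,2024:26-28.(in Chinese)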
Memo
Received: 2025-06-30
Corresponding author: GUO Jianyang. E-mail: guo_jianyang@qq.com
