LI Nan,TAO Hong-cai.A Novel News Summary Algorithm Combining BM25 and Text Features[J].Journal of Chengdu University of Information Technology,2018,(02):113-118.[doi:10.16836/j.cnki.jcuit.2018.02.002]
A Novel News Summary Algorithm Combining BM25 and Text Features
- Original title (Chinese):
- 一种新的融合BM25与文本特征的新闻摘要算法
- Article ID:
- 2096-1618(2018)02-0113-06
- Keywords:
- BM25; TextRank; word frequency; graph sort; ROUGE
- CLC number:
- TP391
- Document code:
- A
- Abstract:
- This paper proposes a news summarization algorithm that combines BM25 with text features. First, the BM25 algorithm is used to compute the sentence similarity in the TextRank algorithm. Next, word frequency and sentence position are selected as text features. Finally, the text-feature score is added to the TextRank score to give the final score of each sentence; all sentences are sorted in descending order of this score, and the few highest-scoring sentences are selected as the summary. Tests on the NLPCC2015 dataset using the ROUGE toolkit show that the method performs well.
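The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the BM25 parameters `k1` and `b`, the smoothed IDF, the damping factor, the max-normalization of each score, and the `1 / (i + 1)` position feature are all assumptions, as are the function names `bm25_similarity` and `summarize`. Sentences are assumed to be non-empty and already tokenized (Chinese news text would first need word segmentation).

```python
import math
from collections import Counter


def bm25_similarity(query, doc, idf, avg_len, k1=1.5, b=0.75):
    """BM25 score of sentence `doc` with respect to the words of `query`."""
    tf = Counter(doc)
    score = 0.0
    for word in set(query):
        f = tf[word]
        if f == 0:
            continue
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf[word] * f * (k1 + 1) / (f + length_norm)
    return score


def normalize(values):
    """Scale scores so the largest is 1 (assumed here so the scores are comparable)."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, top_n=2, damping=0.85, iterations=50):
    """Return the indices of the top_n sentences, in document order."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    # Number of sentences in which each word occurs.
    df = Counter(w for s in sentences for w in set(s))
    # Smoothed IDF that stays positive (assumption; the classic form can go negative).
    idf = {w: math.log(1 + (n - c + 0.5) / (c + 0.5)) for w, c in df.items()}

    # Sentence graph: edge weights are BM25 similarities, with no self-loops.
    weights = [
        [0.0 if i == j else bm25_similarity(sentences[i], sentences[j], idf, avg_len)
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # TextRank: weighted PageRank computed by power iteration.
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * rank[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Text features (assumed forms): average word frequency and sentence position.
    word_freq = Counter(w for s in sentences for w in s)
    freq_score = [sum(word_freq[w] for w in s) / len(s) for s in sentences]
    pos_score = [1.0 / (i + 1) for i in range(n)]

    # Final score: TextRank score plus text-feature scores, as in the abstract.
    final = [
        r + f + p
        for r, f, p in zip(normalize(rank), normalize(freq_score), normalize(pos_score))
    ]
    best = sorted(range(n), key=lambda i: final[i], reverse=True)[:top_n]
    return sorted(best)


if __name__ == "__main__":
    docs = [
        ["bm25", "ranks", "news", "sentences"],
        ["textrank", "ranks", "sentences", "graph"],
        ["weather", "is", "nice"],
        ["news", "summary", "uses", "bm25", "textrank"],
    ]
    print(summarize(docs, top_n=2))
```

Normalizing each score before adding is only one way to keep a single feature from dominating the sum; how the paper weights or scales the terms is not stated in the abstract.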
Memo
Received: 2018-01-13. Funding: National Natural Science Foundation of China (61505168).