HUANG Guan-ying, ZHENG Jiao-ling. Wikipedia Entries Editing Micro-process Mining based on Variable Length Hidden Markov Model[J]. Journal of Chengdu University of Information Technology, 2018, (01): 34-38. [doi:10.16836/j.cnki.jcuit.2018.01.007]
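Wikipedia Entries Editing Micro-process Mining based on Variable Length Hidden Markov Model (original title: 基于变长隐马尔科夫模型的维基词条编辑微过程挖掘)
- Title:
- Wikipedia Entries Editing Micro-process Mining based on Variable Length Hidden Markov Model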
- Article ID:
- 2096-1618(2018)01-0034-05
- Keywords:
- intelligent information processing; data mining; Wikipedia entries; tensor factorization; hidden Markov model
- Classification number (CLC):
- TP311.13
- Document code:
- A
- Abstract:
- This paper proposes a method for mining the editing micro-process of Wikipedia entries based on a variable-length Hidden Markov Model (HMM). The traditional Expectation-Maximization (EM) algorithm requires the number of hidden states to be specified in advance, and that number is usually obtained through extensive manual observation of real data, which makes the choice largely subjective. The new method first uses tensor factorization to mine the number of hidden states of the editing micro-process; analysis of real data shows that the editing micro-process can be divided into two hidden states, conservative and radical. The extracted features and a Baum-Welch algorithm with variable-length hidden states are then used to train the HMM. Tests on a real data set of Wikipedia entry operation histories show that the method fits the editing micro-process well and achieves good HMM inference accuracy.
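The training step named in the abstract can be illustrated with a minimal sketch of the standard Baum-Welch (EM) update for a discrete-observation HMM. This is not the authors' variable-length variant, whose details the abstract does not give, and it does not cover the tensor factorization step. The number of hidden states is fixed at two to match the conservative/radical finding; the integer coding of edit actions in `OBS` and all function names are hypothetical.

```python
import math
import random

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    scales = [0.0] * T
    for t in range(T):
        for j in range(n):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scales[t] = sum(alpha[t])              # P(o_t | o_1..o_{t-1})
        alpha[t] = [a / scales[t] for a in alpha[t]]
    beta[T - 1] = [1.0] * n
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scales[t + 1]
    return alpha, beta, scales

def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the log-likelihood under the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, pi, A, B)
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    # Accumulate expected transition counts, then normalize each row
    new_A = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                new_A[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scales[t + 1])
    new_A = [[x / sum(row) for x in row] for row in new_A]
    # Accumulate expected emission counts, then normalize each row
    new_B = [[0.0] * m for _ in range(n)]
    for t in range(T):
        for i in range(n):
            new_B[i][obs[t]] += gamma[t][i]
    new_B = [[x / sum(row) for x in row] for row in new_B]
    loglik = sum(math.log(c) for c in scales)
    return new_pi, new_A, new_B, loglik

def fit(obs, n_states, n_symbols, n_iter=30, seed=0):
    """Run Baum-Welch from a random strictly positive initialization."""
    rng = random.Random(seed)

    def rand_row(k):
        row = [rng.random() + 0.5 for _ in range(k)]
        total = sum(row)
        return [x / total for x in row]

    pi = rand_row(n_states)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    history = []
    for _ in range(n_iter):
        pi, A, B, loglik = baum_welch_step(obs, pi, A, B)
        history.append(loglik)
    return pi, A, B, history

# Hypothetical coding of edit actions: 0 = minor edit, 1 = major edit, 2 = revert
OBS = [0, 0, 1, 0, 0, 0, 2, 2, 1, 2, 2, 2, 0, 0,
       0, 1, 0, 2, 2, 2, 1, 2, 0, 0, 0, 0, 1, 0]

pi, A, B, history = fit(OBS, n_states=2, n_symbols=3)
print("log-likelihood: first %.3f, last %.3f" % (history[0], history[-1]))
```

The sketch scales the forward and backward variables at each time step to avoid underflow on long edit histories. Changing `n_states` is the only adjustment needed to try a different number of hidden states, which is the quantity the paper reports estimating by tensor factorization rather than by hand.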
Memo
Received: 2017-10-15. Funding: Young Scientists Fund of the National Natural Science Foundation of China (61202250)