XIANG Chunmei,CHEN Chao.Improved Eclat Algorithm based on MapReduce[J].Journal of Chengdu University of Information Technology,2019,(04):369-374.[doi:10.16836/j.cnki.jcuit.2019.04.008]
基于MapReduce的改进Eclat算法
- Title:
- Improved Eclat Algorithm based on MapReduce
- 文章编号:
- 2096-1618(2019)04-0369-06
- 关键词:
- 数据挖掘; 关联规则; 频繁项集; MapReduce模型; Eclat算法
- 分类号:
- TP301.6
- 文献标志码:
- A
- 摘要:
- 关联规则挖掘一直都是数据挖掘的重要任务,然而随着大数据时代的到来,数据规模呈 指数形式增长,传统的串行挖掘算法已经面临着内存和计算资源不足等问题。针对上述问题, 提出了一种基于MapRedcue并行编程模型的改进Eclat算法——IMREclat算法。IMREclat算法 使用2个MapReduce任务,主要分为3个阶段:首先,平均划分事务数据库,并行挖掘频繁2项集。 然后,将频繁2项集转化为垂直数据格式并利用二进制存储事务列表,按照等价类和其权重值 分组。最后,将分组后的数据作为输入,通过利用预剪枝性质改进后的Eclat算法并行挖掘所 有的频繁项集。实验表明,IMREclat算法在运行时间上优于现有的MREclat算法,并有良好的 扩展性能。
- Abstract:
- The mining of association rules has always been an important task of data mining. However, with the advent of the era of big data, the data scale has grown exponentially. The traditional serial mining algorithms have faced problems such as the insufficient of memory and computing resources. Regarding the issue above, the IMREclat algorithm is proposed, which is an improved Eclat algorithm based on the MapReduce parallel programming model. The IMREclat algorithm uses two MapReduce tasks, which are mainly divided into three phases: Firstly, the transaction database is divided equally, and the frequent 2- itemsets are drilled in parallel. Secondly, the frequent 2-itemsets are converted into a vertical data format, and the binary storage transaction list is used to group by the equivalence class and its weight value. Finally, the grouped data is used as input, and all frequent item sets are mined in parallel by using the improved Eclat algorithm with pre-pruning properties. The experiments show that the IMREclat algorithm outperforms the existing MREclat algorithm in running time and has good expansion performance.
参考文献/References:
[1] Agrawal R,ImielIński T,Swami A.Mining association rules between sets of
items in large databases[C].proceedings of the Acmsigmod record,F,1993:207-216.
[2] Agrawal R,Srikant R.Fast algorithms for mining association rules
[C].proceedings of the Proc 20th intconf very large data bases,VLDB,F,1994:481-
496.
[3] Han J,Pei J,Yin Y.Mining frequent patterns without candidate
generation[C].proceedings of the ACM sigmod record,F,2000:1-12.
[4] 何海.基于Spark的关联规则算法研究与实现[D].北京:北京邮电大学,2017:1-59.
[5] Zaki M J,Parthasarathy S,Ogihara M,et al.New algorithms for fast
discovery of association rules[C].proceedings of the KDD,F,1997:1-25.
[6] 周国军,龚榆桐.基于MapReduce和矩阵的频繁项集挖掘算法[J].微电子学与计算机
,2016,33(5):119-123.
[7] Feng Xingjie,Pan Xuan.Parallel Eclat algorithm based on Spark
[J].Journal of Computer Applications,2019,1(1):21-30.
[8] 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与
应用,2013,49(20):112-117.
[9] Moens S,Aksehirli E,Goethals B.Frequent Itemset Mining for Big Data
[C].proceedings of the IEEE International Conference on Big Data,F,2013.
[10] Zhang Z,Ji G,Tang M.Mreclat: an algorithm for parallel mining frequent
itemsets[C].proceedings of the Advanced Cloud and Big Data(CBD),2013
International Conference on,F,2013:117-180.
[11] Keerthi K,Sarltha S J.ECLAT:Frequent itemset using MapReduce
[C].proceedings of the 2017 International Conference on
Energy,Communication,Data Analytics and Soft Computing(ICECDS),2017.
[12] 徐卫,李晓粉,刘端阳.基于命题逻辑的关联规则挖掘算法L-Eclat[J].计算机科学
,2017,44(12):211-215.
[13] 崔馨月,孙静宇.改进的Eclat算法研究与应用[J].计算机工程与设计,2018,39
(4):1059-1063.
[14] Dean J,Ghemawat S.MapReduce:simplified data processing on large
clusters[M].ACM,2008:129-135.
[15] Farzanyar Z, Cercone N.Efficient mining of frequent itemsets in social
network data based on MapReduce framework[C].proceedings of the Ieee/acm
International Conference on Advances in Social Networks Analysis and
Mining,F,2013:1183-1188.
[16] Zaki J.Scalable algorithms for association mining[J].IEEE transactions
on knowledge and data engineering,2000,12(3): 372-390.
相似文献/References:
[1]李宝林,周 坤,李仕伟.一种基于M-Bisearch的最大频繁项集挖掘算法研究[J].成都信息工程大学学报,2016,(05):463.
LI Bao-lin,ZHOU Kun,LI Shi-wei.Research on Mining Algorithm of Maximal Frequent Itemsets based on M-blsearch[J].Journal of Chengdu University of Information Technology,2016,(04):463.
[2]杨 頔,文成玉.结合关联规则的情感分析模型研究[J].成都信息工程大学学报,2019,(05):501.[doi:10.16836/j.cnki.jcuit.2019.05.011]
YANG Di,WEN Chengyu.Research on Emotional Analysis Model based on Association Rules[J].Journal of Chengdu University of Information Technology,2019,(04):501.[doi:10.16836/j.cnki.jcuit.2019.05.011]
备注/Memo
收稿日期:2019-01-03 基金项目:四川省科技计划资助项目(18ZDYF3278)