[Engineering] 2 frequent pattern -c.ppt

This document has 39 pages in total, all of which can be read.

Association rules
- Write-based and DFS algorithms
- VIPER
  - Implement DFS with bitmap compression of data in vertical format
- FP-Growth algorithm
  - Implement DFS using a tree structure in horizontal format
- FP-Growth based sub-graph mining

Comparison of Data Layouts

Vertical format for write-based FPM

Vertical format

FP-Growth algorithm
1. First database scan: as in the Apriori algorithm, find the frequent 1-itemsets and their counts, then sort the frequent items in descending order of frequency.
2. Second database scan: construct the FP-tree (inserting items from most to least frequent), then mine the tree (from least to most frequent).

Note: when the database is large, the FP-tree may not fit in memory, and a partition method is needed.

Construct FP-tree from a Transaction Database

Find Patterns Having P From P-conditional Database
- Start at the frequent item header table in the FP-tree
- Traverse the FP-tree by following the link of each frequent item p
- Accumulate all of the transformed prefix paths of item p to form p's conditional pattern base

From Conditional Pattern-bases to Conditional FP-trees
- For each pattern base
  - Accumulate the count for each item in the base
  - Construct the FP-tree for the frequent items of the pattern base

Benefits of the FP-tree Structure
- Completeness
  - Preserves complete information for frequent pattern mining
  - Never breaks a long pattern of any transaction
- Compactness
  - Reduces irrelevant information: infrequent items are gone
  - Items are in frequency-descending order: the more frequently an item occurs, the more likely it is to be shared
  - Never larger than the original database (not counting node-links and the count field)
  - For the Connect-4 DB, the compression ratio could be over 100

Partition Patterns and Databases
- Frequent patterns can be partitioned into subsets according to the f-list
  - F-list = f-c-a-b-m-p
  - Patterns containing p
  - Patterns having m but no p
  - …
  - Patterns having c but no a nor b, m, p
  - Pattern f
- Completeness and non-redundancy

FP-Growth vs. Apriori: Scalability With the Support Threshold

VIPER vs. FP-Growth: Comparison

VIPER vs. F
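The two database scans, the conditional pattern base, and the recursive mining of conditional FP-trees described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the names `Node`, `build_tree`, and `fp_growth` are hypothetical, a per-item list of nodes stands in for the node-links of the header table, and the sample transactions in `db` are an assumption, chosen so that the frequent items at a minimum support of 3 are the f, c, a, b, m, p of the F-list. Ties in frequency are broken alphabetically here, so the insertion order may differ from the F-list without changing the mined patterns.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to its tree
    nodes, frequent maps each frequent item to its support count.
    """
    # Scan 1: count the support of every item, keep only the frequent ones.
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction, items in descending frequency order.
    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Return {pattern (sorted tuple of items): support count}."""
    header, frequent = build_tree(transactions, min_support)
    patterns = {}
    # Mine from the least frequent item to the most frequent one.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        patterns[tuple(sorted(suffix + (item,)))] = frequent[item]
        # Conditional pattern base: the prefix path of every node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree built from that base.
        patterns.update(fp_growth(base, min_support, suffix + (item,)))
    return patterns


# Assumed sample database (not taken from the slides).
db = [
    list("facdgimp"),
    list("abcflmo"),
    list("bfhjo"),
    list("bcksp"),
    list("afcelpmn"),
]
patterns = fp_growth([(t, 1) for t in db], min_support=3)
```

With this sample, the patterns containing p are `('p',)` and `('c', 'p')`, each with support 3, which corresponds to the "patterns containing p" subset in the partition by f-list. Because each recursive call only sees one item's conditional pattern base, every frequent pattern is produced exactly once.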
