Data Mining using Mahout
SIEL@IIIT, Hyderabad
Team No. 8
Pratibha Rani
Prashant Sethia
Manisha Verma
What is Mahout?
Subproject of Apache Lucene
◦ Goal: delivering scalable machine learning algorithm implementations
◦ /mahout/
Version 0.1 released on 07 April 2009
◦ Includes 10 algorithm libraries
◦ Details in published paper: /people/ang//papers/nips06-mapreducemulticore.pdf
Objective
Implement two Data Mining/Machine Learning algorithms
◦ Convert each algorithm to the MapReduce paradigm
◦ Implement using Hadoop
◦ Optimize computation by taking advantage of the MapReduce paradigm
Integrate them into the Mahout library
◦ Make them available online
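As a rough illustration of what converting a computation to the MapReduce paradigm involves, the sketch below counts training records per class label using separate map, shuffle, and reduce steps. It runs in a single Python process and does not use the Hadoop or Mahout APIs; the function names (`map_record`, `shuffle`, `reduce_counts`, `run_job`) and the sample records are assumptions made for this example, not part of the project.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit one (class_label, 1) pair per training record."""
    _features, label = record
    yield label, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one class label."""
    return key, sum(values)


def run_job(records):
    """Chain map, shuffle, and reduce over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Hypothetical records: (feature tuple, class label)
records = [((1.0, 2.0), "a"), ((0.5, 1.5), "b"), ((3.0, 0.1), "a")]
print(run_job(records))  # {'a': 2, 'b': 1}
```

Because each call to `map_record` depends only on its own record and each call to `reduce_counts` only on its own key, both steps can in principle be spread across machines, which is the property the objective aims to exploit.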
Implemented Algorithms
Classification of multi-class data using a Linear Discriminant Function (LDF)
◦ Machine learning method for classification
◦ Computational cost increases as the number of classes increases
SPRINT
◦ Decision-tree-based parallel classifier for data mining
◦ Requires parallelization of computations
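To make the LDF cost remark concrete, here is a minimal sketch of multi-class classification with one linear discriminant per class, g_i(x) = w_i · x + b_i, where the predicted class is the one with the largest g_i(x). The slides do not say which LDF formulation the team used, so this one-function-per-class form, the `classify` helper, and the weights in `model` are assumptions for illustration only.

```python
def discriminant(weights, bias, x):
    """Evaluate one linear discriminant g(x) = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(model, x):
    """Return the label whose discriminant scores highest for x.

    model maps each class label to a (weights, bias) pair, so one
    discriminant is evaluated per class.
    """
    return max(model, key=lambda label: discriminant(*model[label], x))


# Hypothetical, hand-picked parameters for three classes in two dimensions
model = {
    "a": ([1.0, 0.0], 0.0),
    "b": ([0.0, 1.0], 0.0),
    "c": ([-1.0, -1.0], 0.5),
}

print(classify(model, [2.0, 1.0]))    # a
print(classify(model, [0.0, 3.0]))    # b
print(classify(model, [-1.0, -1.0]))  # c
```

In this form, classifying one point evaluates every class's discriminant, so the work per point grows linearly with the number of classes.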
Decision Tree Example
Attribute Lists
Algorithm
Algorithm (contd.)
SPRINT: Introduction
Carry out the decision tree building process in parallel
◦ Frequent lookup of the central class list produces a lot of network communication in the parallel case
◦ Solution: eliminate the class list
Class labels are distributed to each attribute list
◦ The data is redundant, but the memory-resident and network communication bottlenecks are removed
Each node keeps its own set of a
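
The idea of distributing class labels to every attribute list can be sketched as follows. Each attribute gets its own list of (value, class label, record id) entries, so a split on that attribute can be evaluated without consulting a central class list. This is a single-process sketch that assumes all attributes are continuous (hence the sort by value); the `Entry` type, `build_attribute_lists`, and the sample attribute names are hypothetical and not taken from the slides.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    value: float  # attribute value for this record
    label: str    # class label, duplicated into every attribute list
    rid: int      # record id, used to link entries across lists


def build_attribute_lists(records, labels, attribute_names):
    """Build one value-sorted attribute list per attribute."""
    lists = {}
    for col, name in enumerate(attribute_names):
        entries = [
            Entry(record[col], labels[rid], rid)
            for rid, record in enumerate(records)
        ]
        # Continuous attributes are kept sorted by value.
        entries.sort(key=lambda entry: entry.value)
        lists[name] = entries
    return lists


# Hypothetical training data: (age, income) with a class label per record
records = [(23, 15.0), (17, 40.0), (43, 60.0)]
labels = ["high", "high", "low"]
attribute_lists = build_attribute_lists(records, labels, ["age", "income"])

for entry in attribute_lists["age"]:
    print(entry)
```

The class label is stored once per attribute list, which is the redundancy the slide mentions; the benefit is that each list carries everything needed to count class labels on either side of a candidate split.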