Simulation Data Thesis: Research on Parallel and Distributed Mining Algorithms for Simulation Data

Published 2017-03-23, Jiangsu

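[Chinese abstract, translated]

Simulation experiments produce large-scale simulation data. To extract useful information and knowledge from that data, to better understand and improve the system, and to help decision makers analyze and decide, data mining methods can be applied to simulation data analysis. Because simulation data is high-dimensional and large in scale, parallel mining algorithms are needed to improve efficiency. Because simulation data is also stored in a distributed fashion, distributed mining algorithms need to be studied in order to avoid the overhead and security concerns of centralizing large volumes of data.

The main research work of this thesis is as follows. Based on the inherent characteristics of simulation data and its distributed storage, the requirement for parallel and distributed mining is put forward. Based on the basic data mining process, the general process of simulation data mining is summarized. Finding regularities and finding optima are two important needs in simulation; correspondingly, two commonly used mining methods are studied, association rules and decision trees, to mine association patterns in the system and to optimize the simulation system.

For the need to find regularities in simulation experiments, association rule mining is applied. The basic Apriori algorithm is studied and improved. A vectorized data structure is adopted, which reduces the storage space of the input dataset so that the dataset fits in memory; this avoids the I/O overhead of scanning the database multiple times and improves the algorithm's efficiency. A vector container replaces the hash tree for storing candidate itemsets, which reduces the algorithm's space complexity. To accommodate the large scale of simulation data, the algorithm is parallelized following the idea of the CD (Count Distribute) algorithm, and experiments are designed to analyze its scalability.

For the need to find optima in simulation experiments, decision tree mining is adopted. Given the distributed storage of simulation data, the thesis studies distributed classifiers based on meta-learning as well as two parallelization methods for decision trees: synchronous tree construction and partitioned tree construction. A synchronous-tree decision tree mining algorithm based on ID3 is implemented, and simulation tests verify the effectiveness of the algorithm.

[English abstract]

Simulation experiments generate massive amounts of simulation data. To extract useful information and knowledge, to understand and improve the system better, and to help decision makers make decisions, data mining methods can be used in simulation data analysis. Because simulation data is high-dimensional and large in scale, parallel mining algorithms are needed to improve efficiency. Because the data is also stored in distributed locations, centralizing it would be very expensive and may not be safe, so distributed mining algorithms should be considered. The content of this paper is as follows.

Given the inherent and distributed attributes of simulation data, the need for parallel and distributed data mining is proposed. According to the basic process of data mining, we summarize the common process of mining simulation data. Finding rules and finding optima are two important needs in simulation; correspondingly, there are two commonly used mining methods, association rules and decision trees. They can find the rules in the system and optimize the simulation system.

We use the association rule method to find rules in simulation experiments. We make some improvements to the Apriori algorithm. First, we use a vector data structure, which reduces the space of the input dataset so that it can be put into main memory, avoiding the overhead caused by scanning databa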