XML重复对象检测系统的设计与实现-计算机软件与理论专业论文.docx

About 45,400 characters
About 53 pages
Published 2018-09-06 in Shanghai

Design and Implementation of an XML Duplicate Object Detection System: Thesis in Computer Software and Theory

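摘要 (Abstract)

With the rapid development of the Internet and information technology, XML documents are used ever more widely as a data storage medium, and the problem of detecting duplicate elements in XML data has attracted considerable attention from researchers working on databases and Internet applications. The structural diversity of XML data, however, makes it very difficult to judge the similarity of XML elements. To remove duplicate elements from XML data effectively, this thesis studies rules for identifying duplicate XML elements and designs and implements a duplicate XML element detection system.

The thesis examines the criteria for deciding that XML elements are duplicates, the identification of similar strings, and the computation of XML element similarity. The analysis shows that the key to duplicate XML element detection is handling structural diversity effectively and handling the dependencies between parent and child elements. On that basis, the duplicate XML element detection system was designed and implemented. The system consists mainly of a document preprocessing module, a similar-string identification module, and an element similarity computation module.

For the implementation of the detection system, a top-down, multi-filter detection method is presented. A definition of duplicate XML element objects is derived from an analysis of how XML data is stored; document preprocessing resolves the structural diversity of XML to some extent; several filter conditions effectively reduce the computation needed to evaluate string similarity and XML element similarity; and a top-down traversal handles the dependencies between parent and child XML elements.

A Dirty XML Generator (DXG) tool was designed and implemented to generate experimental data. To demonstrate the correctness of the detection system and the effectiveness of the filter conditions, DXG was used to inject two kinds of dirty data into XML data: structural errors and string errors. Each filter condition was analyzed separately, as were the correctness and efficiency of the detection system. The results show that all filter conditions are both effective and efficient, and that the detection results agree with the dirty data injected beforehand.

Keywords: duplicate element detection system, Extensible Markup Language (XML), similar strings, multiple filtering, top-down

Abstract

With the rapid development of the Internet and information technology, XML documents are being applied ever more widely as a data storage medium, and great attention has been paid to the problem of detecting duplicate XML elements. The diversity of XML document structure makes it very difficult to judge the similarity of XML elements. To effectively remove duplicate elements from XML documents, recognition rules for duplicate elements have been studied, and a duplicate XML element detection system has been designed and implemented.

The criteria for duplicate elements, the identification of similar strings, and the similarity calculation of XML elements have been studied. It is concluded that the key problem in detecting duplicate XML elements is how to deal effectively with structural diversity and how to handle the dependencies between parent elements and sub-elements. On that basis, a duplicate XML element detection system has been designed and implemented. The detection system consists of a document pre-processing module, a similar-string identification module, and an XML element similarity calculation module.

In implementing the detection system, a top-down, multi-filter detection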