A Method for Building Test Collections for Information Retrieval Systems

Undergraduate Degree Thesis
Title: A Method for Building Test Collections for Information Retrieval Systems
Name:
Student ID:
School: School of Information Science and Technology
Major: Computer Science
Advisor: Professor
May 2009

Abstract

With the rapid growth of the Internet, information retrieval technology plays an increasingly important role in everyday life, and it has also become an important research topic in academia. Following the test collection frameworks established by research conferences abroad, especially TREC (the Text REtrieval Conference), the TianWang research team has constructed a large-scale Chinese Web Test collection (CWT) and organizes the annual SEWM Chinese Web search evaluation. The aim is for research groups in China and abroad to jointly build out and refine CWT and, together, advance Chinese retrieval technology.

Test collections are of great importance in information retrieval research. They consist of a document corpus, a set of topics, and relevance judgments indicating which documents are relevant to which topics. Topics can be drawn from web search user logs or written by annotators experienced in a given field. Accurately estimating IR evaluation metrics such as Average Precision requires large sets of relevance judgments, and building sets large enough to evaluate real-world systems is at best inefficient and at worst infeasible.

In this work, we propose an algorithm that requires minimal human effort to obtain an appropriate topic set and relevance judgment set. We first conducted a close study of a web search engine user log, in particular the distributions of clicks, query frequency, and query length. We then tested a smaller sampled set on different ranking algorithms, each of which can be viewed as a different information retrieval system. Using what we observed about the variation of the AP (Average Precision) and MAP (Mean Average Precision) metrics, we arrived at a method that distinguishes better IR systems from worse ones with high confidence in the evaluation outcome and within a competitively short time.

Keywords: Information Retrieval, Evaluation
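The abstract compares ranking algorithms by AP and MAP computed over a test collection's topics and relevance judgments. As a minimal sketch of those standard metrics only (not the thesis's own sampling method), assuming binary relevance and in-memory dictionaries, the snippet below scores two runs. The names `qrels`, `run_a`, `run_b`, `average_precision`, and `mean_average_precision`, and the toy document IDs, are hypothetical illustrations rather than anything taken from the thesis or from CWT.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP for one topic, assuming binary relevance.

    Sums precision@k at every rank k where a new relevant document appears,
    then divides by the total number of judged-relevant documents, so
    relevant documents the system never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    seen: Set[str] = set()  # ignore duplicate doc IDs in a ranking
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant and doc_id not in seen:
            seen.add(doc_id)
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: AP averaged over every judged topic (missing rankings score 0)."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    # Hypothetical toy collection: two topics with their relevant documents.
    qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
    # Two hypothetical systems (ranking algorithms) returning ranked doc IDs.
    run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d5"]}
    run_b = {"q1": ["d2", "d3", "d1"], "q2": ["d5", "d2"]}
    print(f"MAP(A) = {mean_average_precision(run_a, qrels):.4f}")
    print(f"MAP(B) = {mean_average_precision(run_b, qrels):.4f}")
```

Here system A places relevant documents higher than system B on both topics, so it receives the larger MAP. Reducing the number of judgments needed to make such a comparison reliable is the problem the abstract targets.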