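A Method for Building an Evaluation Set for Information Retrieval Systems
Undergraduate Thesis
Title: A Method for Building an Evaluation Set for Information Retrieval Systems
Name:
Student ID:
School: School of Information Science and Technology
Major: Computer Science
Advisor: Professor
May 2009
Abstract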
With the rapid development of the Internet, information retrieval technology plays an increasingly important role in people's lives. Meanwhile, it has become a crucial research topic in academia. Following the test collection framework established by research conferences abroad (especially TREC, the Text REtrieval Conference), the TianWang research team has constructed a large-scale Chinese Web Test collection (CWT) and organizes the SEWM Chinese Web search evaluation on a yearly basis. The hope is that, with the participation of research groups in China and abroad, CWT can be built up and improved, and Chinese retrieval technology advanced together.
Test collections are of great importance in the study of information retrieval. A test collection comprises a corpus of documents, a set of topics, and relevance judgments indicating which documents are relevant to which topics. Topics can be drawn from current web user logs or written by annotators experienced in a given field of study. Accurate estimation of information retrieval evaluation metrics such as Average Precision requires large sets of relevance judgments. Building sets large enough to evaluate real-world implementations is at best inefficient and at worst infeasible.
In this work, we try to devise an algorithm that requires minimal human effort to obtain an appropriate topic set and relevance judgment set. We first conducted a close study of a web search engine user log, in particular the distributions of clicks, query frequency, and query length. A smaller sample was then tested on different ranking algorithms, each of which can be viewed as a different information retrieval system. Using what we observed about the variation of the AP (Average Precision) and MAP (Mean Average Precision) metrics, we arrived at a method that is better at distinguishing good IR systems from worse ones, with high confidence in the evaluation outcome and within a competitively short time.
Keywords: Information Retrieval, Evaluatio
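The abstract relies on the AP and MAP metrics without defining them. As a minimal sketch, assuming the standard binary-relevance definitions (the thesis may use a variant), they can be computed as below. The function names, topic identifiers, and document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP for one topic: mean of precision@k over the ranks k that hold a relevant document.

    Assumption: binary relevance; relevant documents that are never retrieved
    contribute zero, so the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP for one system: arithmetic mean of AP over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: two topics, one system's ranked results.
qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}}
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
map_score = mean_average_precision(runs, qrels)
```

In this toy case topic t1 has AP = (1/1 + 2/3) / 2 and topic t2 has AP = 1/2. Comparing the MAP of several systems over the same topics is the kind of comparison the abstract refers to when it speaks of distinguishing good IR systems from worse ones.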