A clique is a set of pairwise adjacent vertices in a graph, that is, a complete subgraph.
Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words
Dmitry Davidov, Ari Rappoport
The Hebrew University
ACL 2006

Introduction
- Goal: discovering word categories, sets of words sharing a significant aspect of their meaning
- Context feature vectors
- Pattern-based discovery
  - manually prepared pattern set (e.g., x and y)
  - requiring POS tagging or partial or full parsing

Pattern Candidates
- High frequency word (HFW): a word appearing more than TH times per million words (e.g., and, or, from, to, …)
- Content word (CW): a word appearing less than TC times per million words
- Meta-patterns obey the following constraints:
  - at most 4 words
  - exactly two content words
  - no two consecutive CWs
- The resulting meta-patterns are CHC, CHCH, CHHC, and HCHC
- Examples: from x to y (HCHC), x and y (CHC), x and a y (CHHC)

Symmetric Patterns
- To find a usable subset of the pattern candidates, we focus on symmetric patterns
- Examples:
  - x belongs to y (asymmetric relationship)
  - x and y (symmetric relationship)
- We use the single-pattern graph G(P) to identify symmetric patterns
  - there is a directed arc A(x, y) from node x to node y iff
    - the words x and y both appear in an instance of the pattern P as its two CWs, and
    - x precedes y in P
  - SymG(P) is the symmetric subgraph of G(P), containing only the bidirectional arcs and nodes of G(P)
- We compute three measures on G(P)
  - M1 counts the proportion of words that can appear in both slots of the pattern
  - M2 and M3 measure the proportion of symmetric nodes and symmetric edges in G(P), respectively
- Pattern filtering
  - We remove patterns that appear in the corpus less than TP times per million words
  - We remove patterns that are not in the top ZT in any of the three lists
  - We remove patterns that are in the bottom ZB in at least one of the lists

Discovery of Categories
- Words that are highly interconnected are good candidates to form a category
- Word relationship graph G: obtained by merging all of the single-pattern graphs
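The meta-pattern constraints and the three symmetry measures described above can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`is_meta_pattern`, `enumerate_meta_patterns`, `symmetry_measures`) and the toy word pairs are hypothetical, and the measures are computed directly from the verbal definitions given in the slides (M1 as the share of nodes seen in both slots, M2 and M3 as the share of nodes and arcs that survive in SymG(P)).

```python
from itertools import product


def is_meta_pattern(shape):
    """Check a shape string over {'C', 'H'} against the slide constraints:
    at most 4 words, exactly two content words, no two consecutive CWs."""
    return (
        set(shape) <= {"C", "H"}
        and len(shape) <= 4
        and shape.count("C") == 2
        and "CC" not in shape
    )


def enumerate_meta_patterns():
    """List every C/H shape of length 1..4 that satisfies the constraints."""
    shapes = (
        "".join(p) for n in range(1, 5) for p in product("CH", repeat=n)
    )
    return sorted(s for s in shapes if is_meta_pattern(s))


def symmetry_measures(instances):
    """Compute (M1, M2, M3) for one pattern P.

    `instances` is an iterable of (x, y) pairs: the two content words of
    each observed instance of P, with x preceding y. Each distinct pair
    is one directed arc A(x, y) of G(P).
    """
    arcs = set(instances)
    if not arcs:
        return 0.0, 0.0, 0.0  # assumption: an unseen pattern scores zero

    nodes = {w for arc in arcs for w in arc}
    first_slot = {x for x, _ in arcs}
    second_slot = {y for _, y in arcs}

    # SymG(P): keep only arcs whose reverse arc is also present.
    sym_arcs = {(x, y) for (x, y) in arcs if (y, x) in arcs}
    sym_nodes = {w for arc in sym_arcs for w in arc}

    m1 = len(first_slot & second_slot) / len(nodes)  # words seen in both slots
    m2 = len(sym_nodes) / len(nodes)                 # symmetric nodes
    m3 = len(sym_arcs) / len(arcs)                   # symmetric arcs
    return m1, m2, m3


if __name__ == "__main__":
    print(enumerate_meta_patterns())
    # Hypothetical instances of the pattern "x and y".
    toy = [("cats", "dogs"), ("dogs", "cats"), ("salt", "pepper")]
    print(symmetry_measures(toy))
```

Enumerating the shapes confirms that the three constraints admit exactly the four meta-patterns listed in the slides. A pattern such as "x and y" would be expected to score high on all three measures, while "x belongs to y" would score low, which is what the ZT/ZB filtering exploits.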