Security Architecture and Technologies of Cloud Computing: Supplementary Material 1

MapReduce: Simplified Data Processing on Large Clusters

Motivation: Large-Scale Data Processing
  - Want to use 1000s of CPUs
  - But don't want the hassle of managing things
  - MapReduce provides:
      - Automatic parallelization and distribution
      - Fault tolerance
      - I/O scheduling
      - Monitoring and status updates

Map/Reduce
  - Map/Reduce
      - Programming model from Lisp (and other functional languages)
  - Many problems can be phrased this way
  - Easy to distribute across nodes
  - Nice retry/failure semantics

Map in Lisp (Scheme)
  (map f list [list2 list3 …])
  (map square '(1 2 3 4))
      (1 4 9 16)
  (reduce + '(1 4 9 16))
      (+ 16 (+ 9 (+ 4 1)))
      30
  (reduce + (map square (map - l1 l2)))

Map/Reduce à la Google
  - map(key, val) is run on each item in the set
      - emits new-key / new-val pairs
  - reduce(key, vals) is run for each unique key emitted by map()
      - emits final output

Count Words in Docs
  - Input consists of (url, contents) pairs
  - map(key=url, val=contents):
      - For each word w in contents, emit (w, "1")
  - reduce(key=word, values=uniq_counts):
      - Sum all "1"s in the values list
      - Emit result "(word, sum)"

Count, Illustrated

Grep
  - Input consists of (url+offset, single line) pairs
  - map(key=url+offset, val=line):
      - If line matches regexp, emit (line, "1")
  - reduce(key=line, values=uniq_counts):
      - Don't do anything; just emit line

Reverse Web-Link Graph
  - Map
      - For each URL linking to target, …
      - Output (target, source) pairs
  - Reduce
      - Concatenate list of all source URLs
      - Outputs: (target, list(source)) pairs

Inverted Index
  - Map
  - Reduce

Model is Widely Applicable
  MapReduce Programs in Google Source Tree

Implementation Overview

Execution

How is this distributed?
  - Partition input key/value pairs into chunks, run map() tasks in parallel
  - After all map()s are complete, consolidate all emitted values for each unique emitted key
  - Now partition space of output map keys, and run reduce() in parallel
  - If map() or reduce
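
The map, consolidate, and reduce steps described in these slides can be sketched in a few lines of Python. This is a minimal single-process illustration, not Google's implementation: the helper name map_reduce, the sample documents, and the link data are all hypothetical, the counts are emitted as integers rather than the string "1", and the reverse-link input is assumed to be (source, list of outgoing targets) pairs. Nothing here is parallel or fault tolerant; it only shows how emitted values are grouped by key before reduce runs.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run mapper over (key, value) inputs, group emitted values by key, then reduce.

    Hypothetical helper for illustration; a real system would run the map and
    reduce tasks in parallel on many machines.
    """
    # Map phase: each input pair may emit any number of (new_key, new_value) pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            # Consolidation step: collect all values emitted for the same key.
            intermediate[out_key].append(out_value)
    # Reduce phase: one reducer call per unique emitted key.
    return {k: reducer(k, vs) for k, vs in intermediate.items()}


# Word count: map emits (word, 1); reduce sums the ones.
def word_count_map(url, contents):
    for word in contents.split():
        yield word, 1


def word_count_reduce(word, counts):
    return sum(counts)


# Reverse web-link graph: map emits (target, source); reduce collects the sources.
# Assumption: each input value is the list of URLs that the source page links to.
def reverse_links_map(source, targets):
    for target in targets:
        yield target, source


def reverse_links_reduce(target, sources):
    return sorted(sources)


# Made-up sample data.
docs = [("a.html", "the quick fox"), ("b.html", "the lazy dog the end")]
counts = map_reduce(docs, word_count_map, word_count_reduce)

links = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
incoming = map_reduce(links, reverse_links_map, reverse_links_reduce)

print(counts)
print(incoming)
```

Because the mapper and reducer are plain functions with no shared state, the same map_reduce driver serves both examples; that independence is what makes the model easy to distribute across nodes and to retry on failure.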


