“Batch, Batch, Batch:”
What Does It Really Mean?
Matthias Wloka
What Is a Batch?
• Every DrawIndexedPrimitive() is a batch
– Submits n triangles to the GPU
– Same render state applies to all tris in batch
– SetState calls prior to Draw are part of batch
• Assuming efficient use of API
– No Draw*PrimitiveUP()
– DrawPrimitive() permissible if warranted
– No unnecessary state changes
• Changing state means at least two batches
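The last point can be made concrete by counting runs of identical state in a draw list. The following is a minimal sketch, not code from the talk: `StateId`, `DrawItem`, and `countMinBatches` are hypothetical names, and the sketch assumes that consecutive items sharing a state could be submitted in a single draw call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for everything a SetState call can change
// (vertex buffer, matrices, texture-stage and alpha states, ...).
using StateId = int;

// One draw request: the state it needs and the triangles it submits.
struct DrawItem {
    StateId state;
    std::size_t triangles;
};

// Lower bound on the number of batches for a draw list in the given order.
// Assumption: neighbours with the same state can share one draw call;
// every state change starts a new batch.
std::size_t countMinBatches(const std::vector<DrawItem>& items) {
    std::size_t batches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].state != items[i - 1].state) {
            ++batches;
        }
    }
    return batches;
}
```

Under this assumption a list with states A, A, B, A needs at least three batches, while the same items ordered A, A, A, B need only two.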
Why Are Small Batches Bad?
• Games would rather draw 1M objects/batches of 10 tris each
– versus 10 objects/batches of 1M tris each
• Lots of guesses
– Changing state inefficient on GPUs (WRONG)
– GPU triangle start-up costs (WRONG)
– OS kernel transitions (WRONG)
• Future GPUs will make it better!? Really?
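The two workloads in the first bullet submit the same number of triangles and differ only in how many draw calls carry them. A small sketch of that arithmetic follows; the `Workload` type and the two constants are illustrative names, not part of the talk.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative description of a frame: how many batches, how big each one is.
struct Workload {
    std::uint64_t batches;
    std::uint64_t trianglesPerBatch;
    constexpr std::uint64_t totalTriangles() const {
        return batches * trianglesPerBatch;
    }
};

// 1M objects of 10 tris each versus 10 objects of 1M tris each.
constexpr Workload manySmall{1000000, 10};
constexpr Workload fewLarge{10, 1000000};

// Same triangle count, but 100,000 times as many draw calls.
static_assert(manySmall.totalTriangles() == fewLarge.totalTriangles(),
              "both workloads submit 10M triangles");
static_assert(manySmall.batches / fewLarge.batches == 100000,
              "draw-call counts differ by a factor of 100,000");
```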
Let’s Write Code!
Testing Small Batch Performance
• Test app does…
– Degenerate triangles (no fill cost)
– 100% PostTnL cache vertices (no xform cost)
– Static data (minimal AGP overhead)
– ~100k tris/frame, i.e., floor(100k/x) draws for x tris/batch
– Toggles state between draw calls (VBs, world/view/projection matrix, tex-stage and alpha states)
• Timed across 1000 frames
• Theoretical maximum triangle rates!
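The structure of such a test can be sketched as a timing loop around a fixed triangle budget. This is a hedged sketch under stated assumptions, not the talk's test application: `drawsPerFrame`, `BatchTiming`, `timeBatches`, and the `submitBatch` callback are hypothetical names, and the callback stands in for the state toggle plus DrawIndexedPrimitive() call. A real measurement would also have to present each frame and wait for the GPU to finish before stopping the clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Draw calls per frame for a fixed triangle budget: floor(budget / x).
std::size_t drawsPerFrame(std::size_t trianglesPerFrame,
                          std::size_t trianglesPerBatch) {
    assert(trianglesPerBatch > 0);
    return trianglesPerFrame / trianglesPerBatch;  // integer division floors
}

// Result of one run: triangles actually submitted and wall-clock time.
struct BatchTiming {
    std::size_t trianglesSubmitted;
    double seconds;
    double millionTrianglesPerSecond() const {
        return seconds > 0.0 ? trianglesSubmitted / seconds / 1e6 : 0.0;
    }
};

// Times `frames` frames of equally sized batches.
// submitBatch is a hypothetical placeholder for "toggle state, then draw".
BatchTiming timeBatches(std::size_t trianglesPerFrame,
                        std::size_t trianglesPerBatch,
                        std::size_t frames,
                        const std::function<void(std::size_t)>& submitBatch) {
    const std::size_t draws = drawsPerFrame(trianglesPerFrame, trianglesPerBatch);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t f = 0; f < frames; ++f) {
        for (std::size_t d = 0; d < draws; ++d) {
            submitBatch(trianglesPerBatch);
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {draws * trianglesPerBatch * frames, elapsed.count()};
}
```

With a budget of 100,000 triangles per frame, 10 tris/batch gives 10,000 draws per frame and 1,500 tris/batch gives 66, so sweeping the batch size changes the number of draw calls by more than two orders of magnitude while the triangle count stays nearly constant.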
Measured Batch-Size Performance
[Figure: million triangles/s (vertical axis, 0 to 100) versus triangles/batch (horizontal axis, 10 to 1500, with an axis scale change between 190 and 300). Series:
Athlon XP 2.7+; NVIDIA GeForce FX 5800
Athlon XP 2.7+; NVIDIA GeForce4 Ti 4600
Athlon XP 2.7+; NVIDIA GeForce3 Ti 500
Athlon XP 2.7+; NVIDIA GeForce4 MX 440
Athlon XP 2.7+; NVIDIA GeForce2 MX/MX 400]
Optimization Opportunities
[Figure: million triangles/s (vertical axis, 0 to 100) versus triangles/batch (horizontal axis, 10 to 1500, with an axis scale change between 190 and 300), annotated with "40x" and "100x". Series:
Athlon XP 2.7+; NVIDIA GeForce FX 5800
Athlon XP 2.7+; NVIDIA GeForce4 Ti 4600
Athlon XP 2.7+; NVIDIA GeForce3 Ti 500
Athlon XP 2.7+; NVIDIA GeForce4 MX 440
Athlon XP 2.7+; NVIDIA GeForce2 MX/MX 400]
Measured Batch-Size Performance
[Figure: batch-size performance plot; only the vertical-axis ticks 0 to 70 survive.]