Posted March 14, 2024 (author: 笃绿凝)

NVIDIA Hardware
Karl Hillesland
November 2, 2000
Karl Hillesland - NVIDIA Hardware - 11/2 - Slide 1

Cards discussed
• Major release in fall, improvement in spring
• NV10: GeForce 256 (Fall 1999)
• NV15: GeForce2 GTS (Spring 2000)
• NV25?: X-Box (Fall 2001)
• NV11: GeForce2 MX (Summer 2000)
• NV16: GeForce2 Ultra (Fall 2000)
• NV20: (Anandtech: Dec 2000 - April 2001)

GeForce 256
• 0.22um, 23 M transistors
• 120 MHz core
• 128 bit, 166 MHz SDR or 150 MHz DDR, up to 128 MB (64 MB biggest I’ve ever heard of)
• AGP 4x with fast writes
• 350 MHz RAMDAC
• DVD
• TV-out

GeForce 256 Triangles
• 15 MTris/s (BenMark5 gives 13M. Have seen other references to 14.5M)
• Up to 6 triangles “in-flight” at a time
• 2 matrix vertex skinning
• Texture coordinate generation (+emboss, reflection, cube map)
• 8 lights

BenMark5
NV10: 13 MTris/s, NV15: 24 MTris/s

QuadEngine(TM) Architecture (from summer 99 notes)
Four Independent Pipelined Engines: Transform Engine, Lighting Engine, Setup Engine, Rendering Engine
• Industry-leading 3D performance: 15-25M triangles/second
• Sustained DMA, transform/clip/light, setup, rasterize and render rate
• Extremely efficient: >70% of the chip active at all times
• Up to 6 triangles “in flight” at a time
• Super-pipelined design
• Very low latency between engines

GeForce 256 pixels/texels
• 4 pixel pipes, one texture each. Can do 2-texture multi-texturing by coupling pipes
• 24/8 bit Z/stencil, 32 bit color (note: 4*(24+8+32)=256)
• Register Combiners
• Texture Compression
• 8-tap anisotropic filtering
• range based fog
• anti-aliasing(?)

GeForce 256 -> GeForce2 GTS
• 2 textures per pipe
• 25M Transistors
• 0.18 Micron technology
• 200 MHz core clock, 166 MHz DDR (“333” MHz)
• 25M Tris/s (BenMark5 gives 24M Tris/s)
• Flat panel

GeForce2 GTS -> GeForce2 MX
• Remove two pixel pipes (left with 2, 2 textures each)
• Dual head support
• “Digital Vibrance Control”
• Low power and heat
• Slower Core Clock (175 MHz)
• Either 64 or 128 bit memory possible
• Cheaper: (intended for ~ $100 range)

GeForce2 GTS -> GeForce2 Ultra
• Faster core clock: 250 MHz
• Faster memory: 225 MHz DDR (“450” MHz)
• Expensive: ~ $500

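The pipe counts and core clocks quoted for these cards can be turned into rough peak fill rates. The sketch below is only an estimate: it assumes each pixel pipe retires one pixel per core clock, and it assumes the Ultra keeps the GTS pipe configuration, since the slide lists only clock changes. The names CARDS and fill_rates are made up for illustration.

```python
# Rough peak fill rates from the pipe counts and core clocks on the slides.
# Assumption: one pixel per pipe per core clock (not stated in the slides).
CARDS = {
    # name: (pixel_pipes, textures_per_pipe, core_clock_mhz)
    "GeForce 256": (4, 1, 120),
    "GeForce2 GTS": (4, 2, 200),
    "GeForce2 MX": (2, 2, 175),
    "GeForce2 Ultra": (4, 2, 250),  # assumed same pipes as the GTS
}


def fill_rates(pipes, textures_per_pipe, core_mhz):
    """Return (Mpixels/s, Mtexels/s) under the one-pixel-per-clock assumption."""
    mpix = pipes * core_mhz
    mtex = mpix * textures_per_pipe
    return mpix, mtex


if __name__ == "__main__":
    for name, spec in CARDS.items():
        mpix, mtex = fill_rates(*spec)
        print(f"{name}: {mpix} Mpix/s, {mtex} Mtex/s")
```

These are theoretical ceilings; memory bandwidth will usually limit real fill rate before the pipes do.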
GeForce -> Quadro
• Increased clock rates
• Acceleration of some common CAD-oriented features (e.g., anti-aliased lines)

Bandwidths
• AGP 4x: 1.2 GB/s
• Video memory: 333 MHz * 128 bits = 5.3 GB/s
• PCI: 132 MB/s
• Host: PC100 with SDRAM = 1.6 GB/s

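The video memory figure is just clock rate times bus width. A minimal sketch of that arithmetic follows, taking 1 GB as 1e9 bytes; the helper name bus_bandwidth_gb_s is hypothetical. The PCI line in the example assumes the common 33 MHz, 32-bit bus, which the slide does not spell out.

```python
def bus_bandwidth_gb_s(effective_mhz, bus_bits):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for a given clock and bus width."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 166 MHz DDR is quoted as an effective "333" MHz on a 128-bit bus.
    print(f"Video memory: {bus_bandwidth_gb_s(333, 128):.1f} GB/s")
    # Assumed 33 MHz, 32-bit PCI.
    print(f"PCI: {bus_bandwidth_gb_s(33, 32) * 1000:.0f} MB/s")
```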
Vertex Bandwidth
• Q3 -> 18 bytes per vertex
  - position 2 * 3 = 6 bytes
  - texture coords, 2 textures: 2 * 2 * 2 = 8 bytes
  - color: 4 bytes
• The double eagle: 10/16 bytes per vertex
  - position 2 * 3 = 6 bytes
  - color: 4 bytes

Vertex Bandwidth, Q3
• AGP 4x: 1.2 GB/s / 18 = 67 M Verts/s
• Video memory: 5.3 GB/s / 18 = 294 M Verts/s
• PCI: 132 MB/s / 18 = 7.3 M Verts/s
• Host: PC100 with SDRAM: 1.6 GB/s / 18 = 88 M Verts/s

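The vertex rates above are each bus bandwidth divided by the 18-byte Q3 vertex. The sketch below reproduces the division using the slide's bandwidth figures as given; the dictionary and function names are illustrative only.

```python
# Q3 vertex layout from the slides: position + two texcoord sets + color.
BYTES_PER_VERTEX_Q3 = 2 * 3 + 2 * 2 * 2 + 4  # = 18 bytes

# Bus bandwidths in bytes/s, as quoted on the Bandwidths slide.
BUS_BYTES_PER_S = {
    "AGP 4x": 1.2e9,
    "Video memory": 5.3e9,
    "PCI": 132e6,
    "Host (PC100 SDRAM)": 1.6e9,
}


def mverts_per_s(bytes_per_s, bytes_per_vertex=BYTES_PER_VERTEX_Q3):
    """Upper bound on vertices per second (in millions) if the bus carried only vertices."""
    return bytes_per_s / bytes_per_vertex / 1e6


if __name__ == "__main__":
    for bus, rate in BUS_BYTES_PER_S.items():
        print(f"{bus}: {mverts_per_s(rate):.1f} M Verts/s")
```

The results are ceilings: they ignore indices, textures, and any other traffic sharing the bus.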
Add indices
• Assume “perfect strips” (one new vertex for each triangle)
• Each triangle -> 3 indices, 1 new vertex
• 18 + 2 bytes/index * 3 indices/tri = 24 bytes/tri
• indices and vertices may come across different busses
• Vertex cache can save some bandwidth

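The per-triangle cost can be evaluated term by term: one new 18-byte vertex plus three 2-byte indices. The sketch below does that and then divides the AGP 4x figure by the result. It assumes indices and vertices share one bus, which the slide notes is not always the case; the function names are hypothetical.

```python
def bytes_per_triangle(vertex_bytes=18, index_bytes=2,
                       indices_per_tri=3, new_verts_per_tri=1):
    """Bytes fetched per triangle with indexed geometry and perfect vertex reuse."""
    return new_verts_per_tri * vertex_bytes + index_bytes * indices_per_tri


def mtris_per_s(bus_bytes_per_s, tri_bytes):
    """Upper bound on triangles per second (in millions) for a given bus."""
    return bus_bytes_per_s / tri_bytes / 1e6


if __name__ == "__main__":
    tri_bytes = bytes_per_triangle()
    print(f"{tri_bytes} bytes/tri")
    print(f"AGP 4x: {mtris_per_s(1.2e9, tri_bytes):.0f} M Tris/s")
    # Without any reuse, every triangle sends three full vertices.
    print(f"No reuse: {bytes_per_triangle(new_verts_per_tri=3, indices_per_tri=0)} bytes/tri")
```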
Texture Compositing
[Diagram: Texture Fetching supplies Tex0 and Tex1. Stages shown: Texture Environment 0, Texture Environment 1, Color Sum, Fog Application. Inputs shown: Fragment Color, Specular Color, Fog Color/Factor.]

Register Combiners
• Replaces blending of fragment, texture, fog, and secondary colors.
• Provides configurable 8-bit, signed math per-pixel operations
• Cascading of register combiners for more sophisticated computations (Hardware limit on levels. Currently 2)

Register Combiners
[Diagram: Texture Fetching supplies Texture 0 and Texture 1; together with Fragment Color, Specular Color, and Fog Color/Factor they feed the Register Set. General Combiner 0 and General Combiner 1 each have 4 RGB inputs, 4 alpha inputs, 3 RGB outputs, and 3 alpha outputs. The Final Combiner has 6 RGB inputs and 1 alpha input. Other labels: Spare 0, Specular Color.]

Input/Output mappings
• Input mappings
  - Invert
  - Negate
  - Bias by 1/2
  - Expand by 2
• Output mappings
  - Bias by 1/2
  - Scale by 1/2, 2 or 4

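One way to read these mappings is as small per-component functions applied before and after each combiner. The sketch below is an interpretation, not the hardware definition. It assumes unsigned inputs are clamped to [0, 1], that "Expand by 2" means 2x - 1 (unsigned to signed range), that "Bias by 1/2" subtracts 0.5, and that results are clamped to [-1, 1]. All names are made up for illustration.

```python
def clamp01(x):
    """Clamp a component to the unsigned range [0, 1]."""
    return max(0.0, min(1.0, x))


# Assumed semantics for the input mappings named on the slide.
INPUT_MAPPINGS = {
    "identity": lambda x: clamp01(x),
    "invert": lambda x: 1.0 - clamp01(x),
    "negate": lambda x: -x,
    "bias_half": lambda x: clamp01(x) - 0.5,
    "expand": lambda x: 2.0 * clamp01(x) - 1.0,
}


def output_map(x, bias_half=False, scale=1.0):
    """Apply the optional bias, then the scale, then clamp to the signed range [-1, 1]."""
    if bias_half:
        x -= 0.5
    return max(-1.0, min(1.0, x * scale))


if __name__ == "__main__":
    for name, fn in INPUT_MAPPINGS.items():
        print(name, fn(0.25))
    print("bias then scale by 2:", output_map(0.75, bias_half=True, scale=2.0))
```

Expand is the mapping typically used to unpack a normal vector stored in a texture's unsigned color channels.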
General Combiner, RGB
[Diagram: input registers (primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero) pass through input maps to A, B, C, and D. Computations: A B + C D -or- A B mux C D; A B -or- A • B; C D -or- C • D. Results pass through scale and bias to the output registers (same register list). Annotations: “not writeable”, “not readable”.]

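The three RGB results named in the diagram (A B or A • B, C D or C • D, and either the sum or the mux of the two) can be sketched as follows. This is an illustrative model only: the inputs are assumed to be already input-mapped RGB triples, the mux condition is passed in as a plain flag (select_cd) rather than read from a register, and output scale and bias are left out.

```python
def general_combiner_rgb(a, b, c, d, ab_dot=False, cd_dot=False,
                         mux=False, select_cd=False):
    """Return (ab, cd, sum_or_mux) for RGB triples a, b, c, d."""
    def mul(x, y):
        # Component-wise product.
        return tuple(p * q for p, q in zip(x, y))

    def dot(x, y):
        # Dot product, replicated across all three components.
        s = sum(p * q for p, q in zip(x, y))
        return (s, s, s)

    ab = dot(a, b) if ab_dot else mul(a, b)
    cd = dot(c, d) if cd_dot else mul(c, d)
    if mux:
        third = cd if select_cd else ab
    else:
        third = tuple(p + q for p, q in zip(ab, cd))
    return ab, cd, third


if __name__ == "__main__":
    a, b = (1.0, 0.5, 0.0), (0.5, 0.5, 0.5)
    c, d = (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
    print(general_combiner_rgb(a, b, c, d))
    print(general_combiner_rgb(a, b, c, d, ab_dot=True))
```

The dot-product path is what makes per-pixel lighting possible: with A as an expanded normal and B as a light vector, A • B gives the diffuse term.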
General Combiner, Alpha
[Diagram: input registers (primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero) pass through input maps to A, B, C, and D. Computations: A B + C D -or- A B mux C D; A B; C D. Results pass through scale and bias to the output registers (same register list). Annotations: “not writeable”, “not readable”.]

Final Combiner
[Diagram: input registers (primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero) pass through input maps to A, B, C, D, E, F, and G. Also shown: the product E F and “spare 0 + secondary color”. Fragment RGB out = A B + (1 - A) C + D. Fragment Alpha out = G.]

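The final combiner equation, A B + (1 - A) C + D, is a linear blend between B and C weighted by A, plus an additive term D. A minimal sketch follows; the clamp to [0, 1] and the function names are assumptions for illustration, and the E F product and "spare 0 + secondary color" inputs are not modelled.

```python
def clamp01(x):
    """Clamp a component to [0, 1]."""
    return max(0.0, min(1.0, x))


def final_combiner(a, b, c, d, g):
    """Return (rgb, alpha): rgb = A*B + (1 - A)*C + D per component, alpha = G."""
    rgb = tuple(
        clamp01(ai * bi + (1.0 - ai) * ci + di)
        for ai, bi, ci, di in zip(a, b, c, d)
    )
    return rgb, g


if __name__ == "__main__":
    # Fog as a blend: A = fog factor, B = shaded color, C = fog color, D = zero.
    fog_factor = (0.5, 0.5, 0.5)
    shaded = (1.0, 0.5, 0.0)
    fog_color = (0.5, 0.5, 0.5)
    zero = (0.0, 0.0, 0.0)
    print(final_combiner(fog_factor, shaded, fog_color, zero, g=1.0))
```

Setting A to the fog factor and C to the fog color, as in the usage above, shows how this one equation can stand in for conventional fog application.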
X-Box (Abrash on Dr. Dobbs)
• Intel PIII/733 with 238 KB cache
• 250-300 MHz Core
• DVD, hard disk
• custom sound with 64 3D-audio channels

X-Box Transform/lighting
• 125 M Tris gouraud, transformed, shaded, two textures.
• +one infinite light, 62.45 MTris/sec
• 8 local lights 8 MTris/sec
• 125 M particles/s (single color front-facing squares)
• Vertex Programs
• Surface engine “works with CPU” for Catmull-Clark, Bezier, Loop, and uniform B-splines at 50 MTris/sec

Vertex Programs
• Replaces transformation and lighting
• Custom vertex lighting
• Custom skinning and blending
• Custom texture coordinate generation
• Custom matrix operations
• Custom vertex computations of your choice

Vertex Programs
• Input is untransformed, unlit vertex
• Create a transformed vertex
• Optionally compute
  - lighting
  - texture coordinates
  - fog coordinates
  - point sizes

Vertex Programs cont.
• Does 4-vector fixed point math
• 17 Instructions:
  - ARL, MOV, MUL, ADD, MAD, RCP, RSQ, DP3, DP4, DST, MIN, MAX, SLT, SGE, EXP, LOG, LIT

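To give a feel for how a handful of these instructions replace fixed-function transform and lighting, the sketch below emulates three of them in Python. It is a toy model under stated assumptions: DP4 and DP3 are taken to be 4- and 3-component dot products and MAX a maximum, the functions return scalars rather than 4-vectors, and ordinary Python floats stand in for whatever number format the hardware uses. The function run_vertex_program and its arguments are invented for illustration.

```python
def DP4(u, v):
    """4-component dot product."""
    return sum(x * y for x, y in zip(u, v))


def DP3(u, v):
    """3-component dot product (the fourth component is ignored)."""
    return sum(x * y for x, y in zip(u[:3], v[:3]))


def MAX(a, b):
    """Scalar maximum (simplified; the instruction set works on vectors)."""
    return max(a, b)


def run_vertex_program(position, normal, mvp_rows, light_dir):
    """Transform one vertex and compute a simple diffuse color."""
    # Four DP4s, one per row of the modelview-projection matrix.
    hpos = [DP4(row, position) for row in mvp_rows]
    # Diffuse term: clamp N.L at zero with MAX.
    n_dot_l = MAX(DP3(normal, light_dir), 0.0)
    color = [n_dot_l, n_dot_l, n_dot_l, 1.0]
    return hpos, color


if __name__ == "__main__":
    translate_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(run_vertex_program([1, 2, 3, 1], [0, 0, 1, 0], translate_x, [0, 0, 1, 0]))
```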
Vertex Program Registers
[Diagram]
• 16x4 Vertex Attribute Registers
• 96x4 Program Parameters (e.g., modelview projection matrix)
• Vertex Program: 128 instructions
• 15x4 Vertex Result Registers
• 12x4 Temporary registers

Using Vertex Programs (OpenGL)
• Programs are arrays of GLubytes (“strings”)
• Created/managed similar to texture objects
• No penalty for switching in and out of vertex program mode
• execution time ~proportional to length of program

X-Box memory bandwidth
• UMA with GPU in control
• 64 MB, 128 bit, 200 MHz DDR RAM
• 1 GPix/sec fill rate + “occlusion circuitry”
• “automatic z compression”

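Putting the memory and fill-rate figures side by side shows why occlusion and Z compression matter. The sketch below derives peak memory bandwidth from the quoted 128-bit, 200 MHz DDR configuration and divides it by the quoted 1 GPix/s fill rate. It assumes DDR means two transfers per clock and that all bandwidth is available to pixels, which a unified memory architecture shared with the CPU would not guarantee. The function names are hypothetical.

```python
def ddr_bandwidth_gb_s(clock_mhz, bus_bits, transfers_per_clock=2):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for a DDR memory interface."""
    return clock_mhz * transfers_per_clock * 1e6 * (bus_bits / 8) / 1e9


def bytes_per_pixel_budget(bandwidth_gb_s, gpix_per_s):
    """Average memory traffic available per pixel at the given fill rate."""
    return bandwidth_gb_s / gpix_per_s


if __name__ == "__main__":
    peak = ddr_bandwidth_gb_s(200, 128)
    print(f"Peak memory bandwidth: {peak:.1f} GB/s")
    print(f"Budget at 1 GPix/s: {bytes_per_pixel_budget(peak, 1.0):.1f} bytes/pixel")
```

A pixel with 32-bit color and a 24/8 Z/stencil buffer can touch 12 bytes for a Z read, Z write, and color write before any texture fetch, so the full fill rate is reachable only if some of that traffic is avoided.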
X-Box bandwidth diagram
[Diagram]

X-Box Textures
• 4 textures per pixel (but takes two clocks for >2)
• One texture can be used as lookup to next texture
• 8 general register combiners + final combiner
• 3D Textures
• Cube maps, compression, etc.
• 2 or 4 sample anti-aliasing

Texture compression (OpenGL)
• DXTC/S3TC
  - Pre-compressed (DDS file)
  - Compressed by driver
• DXT1/S3TC, DXT3, DXT5 (not DXT2, DXT4)
• Ugly (be careful of trickery though)

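For a sense of what these formats save, the sketch below computes the storage for a single mip level. It relies on the block sizes of the S3TC formats (8 bytes per 4x4 texel block for DXT1, 16 bytes for DXT3 and DXT5), which are not stated on the slide; the function names are made up for illustration.

```python
# Bytes per 4x4 texel block for each format.
BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16}


def compressed_size(width, height, fmt):
    """Bytes needed for one mip level; dimensions round up to whole 4x4 blocks."""
    blocks_w = (width + 3) // 4
    blocks_h = (height + 3) // 4
    return blocks_w * blocks_h * BLOCK_BYTES[fmt]


def uncompressed_size(width, height, bytes_per_texel=4):
    """Bytes for the same level stored as 32-bit RGBA."""
    return width * height * bytes_per_texel


if __name__ == "__main__":
    w = h = 256
    raw = uncompressed_size(w, h)
    for fmt in BLOCK_BYTES:
        size = compressed_size(w, h, fmt)
        print(f"{fmt}: {size} bytes ({raw // size}:1 vs 32-bit RGBA)")
```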