
Posted March 14, 2024 (author: 笃绿凝)


NVIDIA Hardware

Karl Hillesland
November 2, 2000

Cards discussed
• Major release in fall, improvement in spring
• NV10: GeForce 256 (Fall 1999)
• NV15: GeForce2 GTS (Spring 2000)
• NV11: GeForce2 MX (Summer 2000)
• NV16: GeForce2 Ultra (Fall 2000)
• NV20: (Anandtech: Dec 2000 - April 2001)
• NV25?: X-Box (Fall 2001)

GeForce 256
• 0.22 µm, 23 M transistors
• 120 MHz core
• 128-bit, 166 MHz SDR or 150 MHz DDR, up to 128 MB (64 MB biggest I’ve ever heard of)
• AGP 4x with fast writes
• 350 MHz RAMDAC
• DVD
• TV-out

GeForce 256 Triangles
• 15 MTris/s (BenMark5 gives 13M; have seen other references to 14.5M)
• Up to 6 triangles “in flight” at a time
• 2-matrix vertex skinning
• Texture coordinate generation (+emboss, reflection, cube map)
• 8 lights
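Two-matrix skinning transforms each vertex by two modelview matrices and blends the two results with a per-vertex weight. A minimal Python sketch, assuming the blend used by OpenGL's EXT_vertex_weighting extension, v' = w·(M0·v) + (1 − w)·(M1·v); the helper names are illustrative:

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_two_matrix(v, m0, m1, w):
    """Blend the vertex transformed by m0 and by m1 using weight w in [0, 1]."""
    p0 = mat_vec(m0, v)
    p1 = mat_vec(m1, v)
    return [w * a + (1.0 - w) * b for a, b in zip(p0, p1)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


identity = translation(0, 0, 0)
moved = translation(2, 0, 0)
vertex = [1.0, 1.0, 0.0, 1.0]

# Weighted 50/50 between an unmoved "bone" and one shifted +2 in x,
# the vertex lands halfway: x = 1 + 0.5 * 2 = 2.
print(skin_two_matrix(vertex, identity, moved, 0.5))  # [2.0, 1.0, 0.0, 1.0]
```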

BenMark5
NV10: 13 MTris/s, NV15: 24 MTris/s

QuadEngine™ Architecture (from summer ’99 notes)
Four independent pipelined engines: Transform Engine → Lighting Engine → Setup Engine → Rendering Engine
• Industry-leading 3D performance: 15-25M triangles/second
• Sustained DMA, transform/clip/light, setup, rasterize and render rate
• Extremely efficient: >70% of the chip active at all times
• Up to 6 triangles “in flight” at a time
• Super-pipelined design
• Very low latency between engines

GeForce 256 pixels/texels
• 4 pixel pipes, one texture each; can do 2-texture multitexturing by coupling pipes
• 24/8-bit Z/stencil, 32-bit color (note: 4 × (24 + 8 + 32) = 256)
• Register combiners
• Texture compression
• 8-tap anisotropic filtering
• Range-based fog
• Anti-aliasing (?)

GeForce 256 → GeForce2 GTS
• 2 textures per pipe
• 25M transistors
• 0.18 micron technology
• 200 MHz core clock, 166 MHz DDR (“333” MHz)
• 25M Tris/s (BenMark5 gives 24M Tris/s)
• Flat panel

GeForce2 GTS → GeForce2 MX
• Remove two pixel pipes (left with 2, 2 textures each)
• Dual-head support
• “Digital Vibrance Control”
• Low power and heat
• Slower core clock (175 MHz)
• Either 64- or 128-bit memory possible
• Cheaper (intended for the ~$100 range)

GeForce2 GTS → GeForce2 Ultra
• Faster core clock: 250 MHz
• Faster memory: 225 MHz DDR (“450” MHz)
• Expensive: ~$500
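Peak fill rate follows directly from the figures on these slides: pixel pipes × core clock gives pixels per second, and multiplying by textures per pipe gives texels per second. A quick Python check using only the slide numbers (peak values that ignore memory-bandwidth limits; the table layout is ours):

```python
# (pixel pipes, textures per pipe, core clock in MHz), from the slides above.
CARDS = {
    "GeForce 256": (4, 1, 120),
    "GeForce2 GTS": (4, 2, 200),
    "GeForce2 MX": (2, 2, 175),
    "GeForce2 Ultra": (4, 2, 250),
}


def peak_fill(pipes, tex_per_pipe, clock_mhz):
    """Return (Mpixels/s, Mtexels/s) at peak, ignoring memory bandwidth."""
    mpix = pipes * clock_mhz
    return mpix, mpix * tex_per_pipe


for name, spec in CARDS.items():
    mpix, mtex = peak_fill(*spec)
    print(f"{name}: {mpix} Mpixels/s, {mtex} Mtexels/s")
```

The GTS line works out to 800 Mpixels/s and 1600 Mtexels/s; the MX keeps half the GTS pipes at a lower clock.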

GeForce → Quadro
• Increased clock rates
• Acceleration of some common CAD-oriented features (e.g., anti-aliased lines)

Bandwidths
• AGP 4x: 1.2 GB/s
• Video memory: 333 MHz × 128 bits = 5.3 GB/s
• PCI: 132 MB/s
• Host: PC100 with SDRAM = 1.6 GB/s
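Each bandwidth figure is a peak rate: transfers per second × bus width in bytes. A small Python helper reproducing two of them, assuming the “333 MHz” video-memory figure is already the effective DDR transfer rate and that PCI is 32 bits wide at 33 MHz:

```python
def peak_bandwidth(clock_hz, bus_bits, transfers_per_clock=1):
    """Peak bytes/s: clock x transfers per clock x bus width in bytes."""
    return clock_hz * transfers_per_clock * bus_bits // 8


# "333 MHz" is the effective DDR rate, so count one transfer per "clock".
video = peak_bandwidth(333_000_000, 128)
pci = peak_bandwidth(33_000_000, 32)  # 32-bit PCI at 33 MHz

print(f"video memory: {video / 1e9:.1f} GB/s")  # video memory: 5.3 GB/s
print(f"PCI: {pci / 1e6:.0f} MB/s")             # PCI: 132 MB/s
```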

Vertex Bandwidth
• Q3 → 18 bytes per vertex
  – position: 2 × 3 = 6 bytes
  – texture coords, 2 textures: 2 × 2 × 2 = 8 bytes
  – color: 4 bytes
• The double eagle: 10/16 bytes per vertex
  – position: 2 × 3 = 6 bytes
  – color: 4 bytes
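The byte counts above correspond to packed vertex records. Assuming 16-bit components for position and texture coordinates and four 8-bit color channels (which is what the 2-bytes-per-component arithmetic implies), Python's `struct` module reproduces the sizes. The `<` prefix turns off alignment padding, and reading “10/16” as the 10-byte record padded to a 16-byte boundary is our assumption:

```python
import struct

# "<" = little-endian, no alignment padding.
# h = 16-bit signed component, B = 8-bit unsigned color channel.
Q3_VERTEX = "<3h" + "2h" * 2 + "4B"  # position xyz, 2 texcoord pairs, RGBA
DOUBLE_EAGLE_VERTEX = "<3h4B"        # position xyz, RGBA


def padded(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


q3_size = struct.calcsize(Q3_VERTEX)
de_size = struct.calcsize(DOUBLE_EAGLE_VERTEX)
print(q3_size, de_size, padded(de_size, 16))  # 18 10 16
```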

Vertex Bandwidth, Q3
• AGP 4x: 1.2 GB/s / 18 = 67 M Verts/s
• Video memory: 5.3 GB/s / 18 = 294 M Verts/s
• PCI: 132 MB/s / 18 = 7.3 M Verts/s
• Host: PC100 with SDRAM: 1.6 GB/s / 18 = 89 M Verts/s
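Each vertex rate is a bus's bandwidth divided by the 18-byte Q3 vertex, i.e. an upper bound if that bus carried nothing but vertex data. A short Python sketch using the slide's bandwidth figures as given:

```python
BYTES_PER_VERTEX = 18  # Q3 vertex

# Peak bandwidths in bytes/s, as listed on the "Bandwidths" slide.
BUSES = {
    "AGP 4x": 1.2e9,
    "Video memory": 5.3e9,
    "PCI": 132e6,
    "Host (PC100 SDRAM)": 1.6e9,
}


def vertex_rate(bandwidth, bytes_per_vertex=BYTES_PER_VERTEX):
    """Upper bound on vertices/s if the bus carried only vertex data."""
    return bandwidth / bytes_per_vertex


for bus, bw in BUSES.items():
    print(f"{bus}: {vertex_rate(bw) / 1e6:.1f} M verts/s")
```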

Add indices
• Assume “perfect strips” (one new vertex for each triangle)
• Each triangle → 3 indices, 1 new vertex
• 18 + 2 bytes/index × 3 indices/tri = 24 bytes/tri
• Indices and vertices may come across different buses
• Vertex cache can save some bandwidth
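With “perfect strips”, each triangle costs one new 18-byte vertex plus three 16-bit indices. A Python sketch of that count, plus the exact total for a finite strip sent as an indexed list (n + 2 vertices, 3 indices per triangle), which shows the per-triangle cost approaching the steady-state figure; the function names are illustrative:

```python
VERTEX_BYTES = 18  # Q3 vertex
INDEX_BYTES = 2    # 16-bit index


def bytes_per_triangle(new_vertices_per_tri=1, indices_per_tri=3):
    """Steady-state bytes per triangle: new vertex data plus its indices."""
    return new_vertices_per_tri * VERTEX_BYTES + indices_per_tri * INDEX_BYTES


def strip_bytes(num_tris):
    """Exact bytes for a strip of num_tris triangles sent as an indexed list:
    num_tris + 2 unique vertices and 3 indices per triangle."""
    return (num_tris + 2) * VERTEX_BYTES + 3 * num_tris * INDEX_BYTES


print(bytes_per_triangle())      # 24
print(strip_bytes(1000) / 1000)  # 24.036
```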

Texture Compositing
(Diagram: Fragment Color → Texture Environment 0 → Texture Environment 1 → Specular Color Sum → Fog Application. Texture Fetching supplies Tex0 and Tex1 to the two texture environments, the color sum adds the Specular Color, and fog application uses the Fog Color/Factor.)
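The diagram follows the standard OpenGL fixed-function fragment order: texture environment 0, texture environment 1, color sum (adding the specular color), then fog. A per-fragment Python sketch, assuming GL_MODULATE for both texture environments and a precomputed fog factor f (1 = no fog); colors are float tuples in [0, 1] and the function names are illustrative:

```python
def modulate(color, tex):
    """GL_MODULATE texture environment: component-wise product."""
    return tuple(c * t for c, t in zip(color, tex))


def color_sum(color, specular):
    """Add the secondary (specular) RGB, clamped to 1; alpha unchanged."""
    rgb = tuple(min(1.0, c + s) for c, s in zip(color[:3], specular))
    return rgb + (color[3],)


def apply_fog(color, fog_color, f):
    """OpenGL fog blend on RGB: f * C + (1 - f) * fog color; alpha unchanged."""
    rgb = tuple(f * c + (1.0 - f) * g for c, g in zip(color[:3], fog_color))
    return rgb + (color[3],)


def composite(fragment, tex0, tex1, specular, fog_color, f):
    c = modulate(fragment, tex0)       # Texture Environment 0
    c = modulate(c, tex1)              # Texture Environment 1
    c = color_sum(c, specular)         # Specular Color Sum
    return apply_fog(c, fog_color, f)  # Fog Application


result = composite((1.0, 1.0, 1.0, 1.0), (0.5, 0.5, 0.5, 1.0),
                   (1.0, 0.5, 0.0, 1.0), (0.25, 0.25, 0.25),
                   (0.0, 0.0, 0.0), 1.0)
print(result)  # (0.75, 0.5, 0.25, 1.0)
```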

Register Combiners
• Replaces blending of fragment, texture, fog, and secondary colors
• Provides configurable 8-bit, signed math operations per pixel
• Cascading of register combiners for more sophisticated computations (hardware limit on levels; currently 2)

Register Combiners
(Diagram: Texture Fetching supplies Texture 0 and Texture 1. Together with Fragment Color, Specular Color, Fog Color/Factor, Spare 0, and the SetRegister values, these feed General Combiner 0 (4 RGB inputs, 4 alpha inputs, 3 RGB outputs, 3 alpha outputs), which feeds General Combiner 1 (same layout), which feeds the Final Combiner (6 RGB inputs, 1 alpha input).)

Input/Output mappings
• Input mappings
  – Invert
  – Negate
  – Bias by 1/2
  – Expand by 2
• Output mappings
  – Bias by 1/2
  – Scale by 1/2, 2 or 4
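These mappings match the input-mapping and output scale/bias options of OpenGL's NV_register_combiners extension. A Python sketch of the arithmetic, assuming register values in [-1, 1] with the unsigned mappings first clamping to [0, 1]. The dictionary keys are shortened from the extension's token names (e.g. `GL_UNSIGNED_INVERT_NV`, `GL_EXPAND_NORMAL_NV`), “bias by 1/2” is taken to mean subtracting 1/2, and the bias-then-scale-then-clamp output order is our reading of the spec:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


# Input mappings; the unsigned variants clamp the register value to [0, 1].
INPUT_MAPS = {
    "unsigned_identity": lambda e: clamp(e, 0.0, 1.0),
    "unsigned_invert": lambda e: 1.0 - clamp(e, 0.0, 1.0),      # Invert
    "expand_normal": lambda e: 2.0 * clamp(e, 0.0, 1.0) - 1.0,  # Expand by 2
    "expand_negate": lambda e: -2.0 * clamp(e, 0.0, 1.0) + 1.0,
    "half_bias_normal": lambda e: clamp(e, 0.0, 1.0) - 0.5,     # Bias by 1/2
    "half_bias_negate": lambda e: -clamp(e, 0.0, 1.0) + 0.5,
    "signed_identity": lambda e: e,
    "signed_negate": lambda e: -e,                              # Negate
}


def output_map(x, scale=1.0, bias=0.0):
    """Add bias (0 or -0.5), multiply by scale (0.5, 1, 2 or 4),
    then clamp to the signed [-1, 1] register range."""
    return clamp((x + bias) * scale, -1.0, 1.0)


# Expanding an unsigned 0.75 (e.g. a normal-map texel) gives a signed 0.5.
print(INPUT_MAPS["expand_normal"](0.75))       # 0.5
print(output_map(0.75, scale=2.0, bias=-0.5))  # 0.5
```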

General Combiner, RGB (diagram)
• Input registers (RGBA): primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero
• Each of the four inputs A, B, C, D passes through an input map
• Computations: AB or A dot B; CD or C dot D; AB + CD or AB mux CD
• Results pass through scale and bias into the output registers (zero, fog, and the constant colors are not writeable)
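One RGB general-combiner stage forms two products (each optionally a 3-component dot product) and then either their sum or a mux between them. A Python sketch on RGB triples, assuming the inputs have already been input-mapped; the mux rule (take CD when the spare 0 alpha is at least 0.5, otherwise AB) is our reading of the NV_register_combiners specification, and the function name is hypothetical:

```python
def dot3(u, v):
    return sum(a * b for a, b in zip(u, v))


def general_combiner_rgb(a, b, c, d, ab_dot=False, cd_dot=False,
                         use_mux=False, spare0_alpha=0.0):
    """Return (AB, CD, AB + CD or AB mux CD) for RGB triples a, b, c, d.

    A dot product result is replicated into all three channels.
    """
    ab = (dot3(a, b),) * 3 if ab_dot else tuple(x * y for x, y in zip(a, b))
    cd = (dot3(c, d),) * 3 if cd_dot else tuple(x * y for x, y in zip(c, d))
    if use_mux:
        third = cd if spare0_alpha >= 0.5 else ab
    else:
        third = tuple(x + y for x, y in zip(ab, cd))
    return ab, cd, third


# Per-pixel diffuse term: N dot L with both vectors already expanded to [-1, 1].
normal = (0.0, 0.0, 1.0)
light = (0.0, 0.6, 0.8)
ab, _, _ = general_combiner_rgb(normal, light, (0, 0, 0), (0, 0, 0), ab_dot=True)
print(ab)  # (0.8, 0.8, 0.8)
```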

General Combiner, Alpha (diagram)
• Input registers (RGBA): primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero
• Each of the four inputs A, B, C, D passes through an input map
• Computations: AB; CD; AB + CD or AB mux CD
• Results pass through scale and bias into the output registers (zero, fog, and the constant colors are not writeable)

Final Combiner (diagram)
• Input registers (RGBA): primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero
• Inputs A through G each pass through an input map; E and F form the product EF, and spare 0 + secondary color is also available as an input
• Fragment RGB out = AB + (1 - A)C + D
• Fragment alpha out = G
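The final combiner's RGB output AB + (1 − A)C + D is a lerp between B and C controlled by A, plus an additive term, while alpha comes straight from G; EF and spare 0 + secondary color are extra values that A through D may select. A Python sketch on RGB triples, with input mapping omitted and a [0, 1] clamp on the result assumed; the fog-style input assignment in the example is illustrative:

```python
def final_combiner(a, b, c, d, g_alpha):
    """Return (RGB, alpha) with RGB = A*B + (1 - A)*C + D, clamped to [0, 1]."""
    rgb = tuple(min(1.0, max(0.0, ai * bi + (1.0 - ai) * ci + di))
                for ai, bi, ci, di in zip(a, b, c, d))
    return rgb, g_alpha


def ef_product(e, f):
    """E*F, selectable as an input to A, B, C or D."""
    return tuple(x * y for x, y in zip(e, f))


def spare0_plus_secondary(spare0, secondary):
    """spare 0 + secondary color, also selectable as an input."""
    return tuple(x + y for x, y in zip(spare0, secondary))


# Fog-style setup: A = fog factor, B = combined color, C = fog color, D = 0.
fog_factor = (0.5, 0.5, 0.5)
color = spare0_plus_secondary((0.5, 0.25, 0.0), (0.25, 0.25, 0.25))
rgb, alpha = final_combiner(fog_factor, color, (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 1.0)
print(rgb, alpha)  # (0.875, 0.75, 0.625) 1.0
```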

X-Box (Abrash in Dr. Dobb’s)
• Intel PIII/733 with 238 KB cache
• 250-300 MHz core
• DVD, hard disk
• Custom sound with 64 3D-audio channels

X-Box Transform/Lighting
• 125 M Tris/s: Gouraud, transformed, shaded, two textures
• + one infinite light: 62.45 MTris/sec
• 8 local lights: 8 MTris/sec
• 125 M particles/s (single-color, front-facing squares)
• Vertex programs
• Surface engine “works with CPU” for Catmull-Clark, Bezier, Loop, and uniform B-splines at 50 MTris/sec

Vertex Programs
• Replaces transformation and lighting
• Custom vertex lighting
• Custom skinning and blending
• Custom texture coordinate generation
• Custom matrix operations
• Custom vertex computations of your choice

Vertex Programs
• Input is an untransformed, unlit vertex
• Creates a transformed vertex
• Optionally computes
  – lighting
  – texture coordinates
  – fog coordinates
  – point sizes

Vertex Programs cont.
• Does 4-vector floating-point math
• 17 instructions:
  – ARL, MOV, MUL, ADD, MAD, RCP, RSQ, DP3, DP4, DST, MIN, MAX, SLT, SGE, EXP, LOG, LIT


Vertex Program Registers (diagram)
• 16×4 vertex attribute registers → Vertex Program (128 instructions) → 15×4 vertex result registers
• 96×4 program parameters (e.g., modelview, projection matrix)
• 12×4 temporary registers
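To make the instruction set and register files concrete, here is a toy Python interpreter for four of the 17 instructions (MOV, ADD, MUL, DP4), with register banks sized to the counts above. It runs the classic four-DP4 position transform. The bank letters echo NV_vertex_program's `v[]`, `c[]`, `R`, `o[]` naming, and the register indices (v[0] = OPOS, v[3] = COL0, o[0] = HPOS, o[1] = COL0) are our assumption; the tuple-based program format and the function names are hypothetical, and swizzles and multi-lane write masks are left out:

```python
# Register-file sizes and program limit from the slide.
NUM_ATTRIBS, NUM_PARAMS, NUM_TEMPS, NUM_RESULTS = 16, 96, 12, 15
MAX_INSTRUCTIONS = 128


def _dp4(a, b):
    return sum(x * y for x, y in zip(a, b))


# A subset of the 17 opcodes; every operand is a 4-vector.
OPS = {
    "MOV": lambda a: list(a),
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "MUL": lambda a, b: [x * y for x, y in zip(a, b)],
    "DP4": lambda a, b: [_dp4(a, b)] * 4,  # scalar result replicated
}


def _bank(values, size):
    """Copy the given 4-vectors into a register bank, zero-filling the rest."""
    regs = [list(v) for v in values]
    return regs + [[0.0] * 4 for _ in range(size - len(regs))]


def run(program, attribs, params):
    """Execute (op, (bank, index, lane), (bank, index), ...) instructions.

    lane is 0-3 to write one x/y/z/w component, or None to write all four.
    Returns the 15 result registers.
    """
    if len(program) > MAX_INSTRUCTIONS:
        raise ValueError("program longer than 128 instructions")
    regs = {
        "v": _bank(attribs, NUM_ATTRIBS),  # vertex attributes (inputs)
        "c": _bank(params, NUM_PARAMS),    # program parameters
        "R": _bank([], NUM_TEMPS),         # temporaries
        "o": _bank([], NUM_RESULTS),       # vertex results
    }
    for op, (bank, idx, lane), *srcs in program:
        result = OPS[op](*(regs[b][i] for b, i in srcs))
        if lane is None:
            regs[bank][idx] = result
        else:
            regs[bank][idx][lane] = result[lane]
    return regs["o"]


# o[HPOS].k = DP4 of matrix row c[k] with v[OPOS]; then pass the color through.
TRANSFORM = [("DP4", ("o", 0, k), ("c", k), ("v", 0)) for k in range(4)]
TRANSFORM.append(("MOV", ("o", 1, None), ("v", 3)))

matrix = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]  # translate (1, 2, 3)
attribs = [[1.0, 1.0, 1.0, 1.0], [0.0] * 4, [0.0] * 4, [1.0, 0.5, 0.0, 1.0]]
out = run(TRANSFORM, attribs, matrix)
print(out[0], out[1])  # [2.0, 3.0, 4.0, 1.0] [1.0, 0.5, 0.0, 1.0]
```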

Using Vertex Programs (OpenGL)
• Programs are arrays of GLubytes (“strings”)
• Created/managed similar to texture objects
• No penalty for switching in and out of vertex program mode
• Execution time ~proportional to length of program
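Because a program is just a byte string, it can be assembled and inspected like any other text. A Python sketch of a minimal NV_vertex_program 1.0 program (the `!!VP1.0` header, four DP4 instructions for the position transform, a color pass-through, and `END`), encoded to the byte array a C application would hand to `glLoadProgramNV` after `glGenProgramsNV` and `glBindProgramNV`, the texture-object-style management the slide mentions. Counting opcodes is only a rough proxy for the “execution time ~ length” rule of thumb, and the helper name is illustrative:

```python
# A minimal VP1.0 program: transform the position by c[0]..c[3],
# then pass the primary color through unchanged.
PROGRAM = """!!VP1.0
DP4 o[HPOS].x, c[0], v[OPOS];
DP4 o[HPOS].y, c[1], v[OPOS];
DP4 o[HPOS].z, c[2], v[OPOS];
DP4 o[HPOS].w, c[3], v[OPOS];
MOV o[COL0], v[COL0];
END
"""

# The 17 opcodes listed on the "Vertex Programs cont." slide.
OPCODES = {"ARL", "MOV", "MUL", "ADD", "MAD", "RCP", "RSQ", "DP3", "DP4",
           "DST", "MIN", "MAX", "SLT", "SGE", "EXP", "LOG", "LIT"}


def count_instructions(source):
    """Count lines whose first token is one of the VP1.0 opcodes."""
    count = 0
    for line in source.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in OPCODES:
            count += 1
    return count


program_bytes = PROGRAM.encode("ascii")  # the GLubyte array passed to the driver
n = count_instructions(PROGRAM)
assert n <= 128, "VP1.0 programs are limited to 128 instructions"
print(f"{len(program_bytes)} bytes, {n} instructions")
```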

X-Box memory bandwidth
• UMA with GPU in control
• 64 MB, 128-bit, 200 MHz DDR RAM
• 1 GPix/sec fill rate + “occlusion circuitry”
• “Automatic z compression”
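The slide gives the memory configuration but not the resulting peak; the same clock × width arithmetic used for the PC cards applies, with DDR moving two transfers per clock. A Python check (a peak figure only, and with UMA this pool is shared by the CPU and GPU):

```python
def peak_bandwidth(clock_hz, bus_bits, transfers_per_clock=1):
    """Peak bytes/s: clock x transfers per clock x bus width in bytes."""
    return clock_hz * transfers_per_clock * bus_bits // 8


xbox = peak_bandwidth(200_000_000, 128, transfers_per_clock=2)  # 200 MHz DDR
print(f"{xbox / 1e9:.1f} GB/s")  # 6.4 GB/s
```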

X-Box bandwidth diagram

X-Box Textures
• 4 textures per pixel (but takes two clocks for >2)
• One texture can be used as lookup to next texture
• 8 general register combiners + final combiner
• 3D textures
• Cube maps, compression, etc.
• 2- or 4-sample anti-aliasing
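Using one texture as a lookup into the next is a dependent texture read: the value fetched from the first texture becomes (here, offsets) the coordinates of the second fetch. A minimal Python sketch with nearest-neighbor, wrap-mode sampling on tiny 2D lists; the offset-style lookup and the function names are illustrative assumptions rather than the exact X-Box texture modes, and the clock count simply encodes the slide's “two clocks for >2” note:

```python
import math


def sample_nearest(texture, u, v):
    """Nearest-neighbor fetch with wrap addressing; texture is a list of rows."""
    h, w = len(texture), len(texture[0])
    return texture[math.floor(v * h) % h][math.floor(u * w) % w]


def dependent_read(offset_tex, color_tex, u, v):
    """Use the (du, dv) fetched from the first texture to offset the second lookup."""
    du, dv = sample_nearest(offset_tex, u, v)
    return sample_nearest(color_tex, u + du, v + dv)


def texture_clocks(num_textures):
    """Per the slide: up to 4 textures per pixel, two clocks when more than 2."""
    if not 1 <= num_textures <= 4:
        raise ValueError("1 to 4 textures per pixel")
    return 1 if num_textures <= 2 else 2


offsets = [[(0.5, 0.0), (0.0, 0.0)],
           [(0.0, 0.0), (0.0, 0.0)]]
colors = [["red", "green"],
          ["blue", "white"]]
print(dependent_read(offsets, colors, 0.1, 0.1))  # green
print(texture_clocks(4))                          # 2
```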

Texture compression (OpenGL)
• DXTC/S3TC
  – Pre-compressed (DDS file)
  – Compressed by driver
• DXT1/S3TC, DXT3, DXT5 (not DXT2, DXT4)
• Ugly (be careful of trickery though)
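DXT1 stores each 4×4 texel block in 8 bytes: two RGB565 endpoint colors followed by sixteen 2-bit palette indices, i.e. 4 bits per texel (DXT3 and DXT5 add 8 bytes of alpha per block). A Python sketch that decodes one DXT1 block following the publicly documented S3TC layout; the function names are illustrative, and the integer division used for the interpolated colors may round differently from real hardware. Building every block's palette from just two endpoints is also where the visible “ugliness” comes from:

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a packed 5:6:5 color to 8 bits per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_block(block):
    """Decode 8 bytes into a 4x4 grid (list of rows) of RGBA tuples."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # four-color mode: two extra colors at 1/3 and 2/3
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p + (255,) for p in (p0, p1, p2, p3)]
    else:        # three-color mode: midpoint plus transparent black
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p + (255,) for p in (p0, p1, p2)] + [(0, 0, 0, 0)]
    # Texel i (row-major, i = 4*y + x) uses bits 2i..2i+1 of the index word.
    return [[palette[(indices >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]


def dxt1_size(width, height):
    """Bytes for a DXT1 image: one 8-byte block per 4x4 tile, rounded up."""
    return ((width + 3) // 4) * ((height + 3) // 4) * 8


block = struct.pack("<HHI", 0xF800, 0x001F, 0x55555555)  # red, blue, all index 1
print(decode_dxt1_block(block)[0][0])  # (0, 0, 255, 255)
print(dxt1_size(256, 256))             # 32768
```

For comparison, a 256×256 texture at 32 bits per texel takes 262144 bytes, so DXT1 is an 8:1 reduction.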
