
Status – Week 260

Date post: 15-Jan-2016
Upload: gaurav
Page 1: Status – Week 260

Status – Week 260

Victor Moya

Page 2: Status – Week 260

Summary

- shSim.
- GPU design.
- Future Work.
- Rumors and News.
- Imagine.

Page 3: Status – Week 260

shSim

Currently working:
- Command Processor: reads a text-based trace file (programs, parameters, vertices, commands to the rasterizer).
- Shader: simulates an N-way multithreaded, VS1-capable 'vertex' shader with variable latency support.
- Rasterizer: OpenGL 'emulator'; accepts resolution and clip plane changes, receives 'shaded' vertices from the shader (only 2 QuadFloats: vertex position + color), and displays the triangles in a GL window.

Page 4: Status – Week 260

shSim

Tests:
- 2/4-way multithreading (with another 2/4 input buffers), single shader.
- Fixed latency of 3 cycles. Shader to Rasterizer latency of 4. CommandProcessor to Rasterizer latency of 6.
- Simple coordinate change traces (shader.input, shader.input.2).
- Ripple vertex shader example from the DX8 & DX9 SDK (ripple.input):
  - Around 300 triangles (1100 vertices).
  - Color is calculated from vertex position.
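The fixed latencies listed in these tests can be modeled with a simple delay queue between simulator boxes. The following is a minimal sketch, assuming a cycle-driven simulator; the `DelayLine` class and its method names are hypothetical and are not taken from shSim. Only the latency value (4 cycles, Shader to Rasterizer) comes from the slide.

```python
from collections import deque


class DelayLine:
    """Fixed-latency link between two simulator boxes (hypothetical name)."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (ready_cycle, payload), oldest first

    def send(self, cycle, payload):
        # The payload becomes visible to the receiver `latency` cycles later.
        self.in_flight.append((cycle + self.latency, payload))

    def receive(self, cycle):
        # Return every payload whose latency has elapsed by `cycle`.
        ready = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            ready.append(self.in_flight.popleft()[1])
        return ready


# Shader -> Rasterizer latency of 4 cycles, as in the test configuration.
shader_to_rasterizer = DelayLine(latency=4)
shader_to_rasterizer.send(cycle=0, payload="vertex0")
shader_to_rasterizer.send(cycle=1, payload="vertex1")
arrivals = {c: shader_to_rasterizer.receive(c) for c in range(7)}
```

A vertex sent at cycle 0 is delivered at cycle 4, and one sent at cycle 1 at cycle 5. The same class could stand in for the CommandProcessor to Rasterizer link by constructing it with a latency of 6.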

Page 5: Status – Week 260

shSim

Ripple.vsh.

Page 6: Status – Week 260

shSim

Screenshots from frames rendered by shSim:

Page 7: Status – Week 260
Page 8: Status – Week 260
Page 9: Status – Week 260
Page 10: Status – Week 260
Page 11: Status – Week 260
Page 12: Status – Week 260
Page 13: Status – Week 260
Page 14: Status – Week 260
Page 15: Status – Week 260
Page 16: Status – Week 260

GPU Architecture

- Based on current GPUs:
  - NV30
  - R300
- Based on other graphics processors:
  - PS3
  - Imagine

Page 17: Status – Week 260

GPU Architecture

- Based on an API:
  - DX8
  - DX9
  - DX10
  - OpenGL 1.4 and extensions.
  - OpenGL 2.0
- Based on an architecture model:
  - Vector
  - Scalar
  - Multithreaded

Page 18: Status – Week 260

GPU Specification

- Shader Model:
  - Language:
    - DX9:
      - VS2.0/PS2.0.
      - VS3.0/PS3.0.
    - OpenGL:
      - NV_vertex_program2/NV_fragment_program.
      - ARB_vertex_program/ARB_fragment_program.
    - Our own language.

Page 19: Status – Week 260

GPU Specification

- Shader Architecture:
  - Architectural model:
    - Scalar.
    - SIMD.
    - Multithreaded.
    - Vector.
    - Out-of-order.

Page 20: Status – Week 260

GPU Specification

- Configuration:
  - Integer Unit:
    - Number.
    - Precision.
    - SIMD or scalar?
  - Floating Point Unit:
    - Number.
    - Precision.
    - SIMD or scalar?

Page 21: Status – Week 260

GPU Specification

- Memory Unit:
  - Number.
  - Texture modes.
  - Filtering modes.
- Register Banks:
  - Number.
  - Ports.
  - Size.
  - Scalar or SIMD?

Page 22: Status – Week 260

XBOX (NV2A) Vertex Shader

Page 23: Status – Week 260

Future Work

- Shader:
  - Add branch/call/ret instructions.
  - Add texture instructions (Pixel Shader).
- Command Processor:
  - Define a trace specification: binary, gzipped?
  - Define an interface with OpenGL (Mesa?) or DX8/DX9 (driver?).
- Primitive Assembly:
  - Implement vertex cache and primitive assembly (only triangles?).
  - Implement culling and clipping?
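The vertex cache and triangle-only primitive assembly item can be illustrated with a small sketch. It assumes a FIFO cache of already-shaded vertices keyed by vertex index, and an indexed triangle list as input; the slide does not specify the replacement policy, the cache size, or the input format, so `VertexCache`, `assemble_triangles`, and the `shade` callback are all hypothetical.

```python
from collections import OrderedDict


class VertexCache:
    """FIFO cache of shaded vertices, keyed by vertex index (assumed policy)."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, index, shade):
        if index in self.entries:
            self.hits += 1
            return self.entries[index]
        # Miss: run the (hypothetical) shader callback and store the result.
        self.misses += 1
        result = shade(index)
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[index] = result
        return result


def assemble_triangles(indices, cache, shade):
    """Group shaded vertices three at a time; trailing leftovers are dropped."""
    shaded = [cache.fetch(i, shade) for i in indices]
    return [tuple(shaded[i:i + 3]) for i in range(0, len(shaded) - 2, 3)]


# Two triangles sharing the edge (1, 2): vertices 1 and 2 are shaded once.
cache = VertexCache(size=4)
triangles = assemble_triangles([0, 1, 2, 2, 1, 3], cache, lambda i: ("shaded", i))
```

With this index stream the cache records 4 misses and 2 hits, which is the saving a vertex cache is meant to provide when triangles share vertices.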

Page 24: Status – Week 260

Future Work

- Deferred rendering?
  - Transformed geometry must be stored in video memory.
  - Geometry must be sorted:
    - Tiles.
    - Front to back.
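The sorting requirement for deferred rendering (by tile, then front to back) can be sketched as follows. This is only an illustration under stated assumptions: triangles are already in screen space as three `(x, y, z)` tuples, smaller `z` means nearer to the viewer, binning uses the triangle's bounding box (conservative), and depth order uses each triangle's nearest vertex. The function name `bin_and_sort` and all parameters are hypothetical.

```python
import math


def bin_and_sort(triangles, tile_size, width, height):
    """Assign triangles to screen tiles, then sort each tile front to back."""
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    bins = {}
    for tri in triangles:
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Conservative binning: every tile touched by the bounding box,
        # clamped to the screen.
        tx0 = max(math.floor(min(xs)) // tile_size, 0)
        tx1 = min(math.floor(max(xs)) // tile_size, max_tx)
        ty0 = max(math.floor(min(ys)) // tile_size, 0)
        ty1 = min(math.floor(max(ys)) // tile_size, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(tri)
    # Front to back: nearest vertex first (assumes smaller z is nearer).
    for tris in bins.values():
        tris.sort(key=lambda tri: min(v[2] for v in tri))
    return bins


near = ((1, 1, 0.2), (5, 1, 0.2), (1, 5, 0.2))
far = ((2, 2, 0.8), (20, 2, 0.8), (2, 6, 0.8))
bins = bin_and_sort([far, near], tile_size=16, width=32, height=32)
```

Here tile `(0, 0)` holds both triangles with `near` first, and tile `(1, 0)` holds only `far`. Sorting by nearest vertex is an approximation; overlapping triangles can still be ordered incorrectly, which is one reason a Z test remains necessary.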

- Rasterization:
  - Triangle Setup and Fragment Generation.
    - Any suitable method: Olano & Greer, DDA?
    - MSAA support?
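As a point of comparison for triangle setup and fragment generation, the sketch below uses plain 2D edge functions evaluated at pixel centers. This is a swapped-in technique chosen for brevity: it is neither the Olano & Greer homogeneous method nor a DDA scan converter, and it applies no fill rule, so pixels lying exactly on an edge shared by two triangles would be generated twice. The names `edge` and `rasterize` are hypothetical.

```python
import math


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2):
    """Return the (x, y) pixels whose centers fall inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle, nothing to draw
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    fragments = []
    # Triangle setup reduced to a bounding box; fragment generation tests
    # every pixel center in that box against the three edges.
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            p = (x + 0.5, y + 0.5)
            weights = (edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p))
            # Multiplying by `area` makes the test independent of winding.
            if all(w * area >= 0 for w in weights):
                fragments.append((x, y))
    return fragments


ccw = rasterize((0, 0), (4, 0), (0, 4))
cw = rasterize((0, 0), (0, 4), (4, 0))
```

Both windings of this triangle produce the same 10 fragments. A hardware-oriented version would evaluate the edge functions incrementally and add a fill rule for shared edges.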

Page 25: Status – Week 260

Future Work

- Early Z and Hierarchical Z?
- Pixel Shader:
  - Implement unified with vertex shaders?
  - Queue/buffering mechanism? (memory/texture latency very large).
- Pixel Shader:
  - Unified shader architecture?
  - Pixels need a lot of buffering (memory/texture operations).
  - Implement a TMU simulator (filter algorithms, memory access, texture compression, cache).

Page 26: Status – Week 260

Future Work

- Fixed fragment operations:
  - Implement using the shader?
  - Fog: remove?
  - Pixel Ownership: remove?
  - Scissor Test: implement (needed if clipping is not implemented).
  - Alpha test: same as Z Test.
  - Z Test and Stencil Test: must be implemented, but could be added to a generic shader unit?
  - Blending: add to shader?
  - Dithering: remove.
  - Logical Op: remove or add to shader.
  - MSAA Operations: ?
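The per-fragment tests in this list can be sketched as a short chain of checks applied in the usual OpenGL order (scissor, alpha, depth). This is a minimal illustration, not a proposal from the slides: the stencil test, blending, and the other operations are left out, the comparison functions are fixed to "greater" for alpha and "less" for depth as assumptions, and every function name is hypothetical.

```python
def scissor_test(x, y, rect):
    """rect is (x0, y0, width, height) in window coordinates."""
    x0, y0, w, h = rect
    return x0 <= x < x0 + w and y0 <= y < y0 + h


def alpha_test(alpha, ref):
    return alpha > ref  # assumed comparison: pass if greater than reference


def z_test(z, stored_z):
    return z < stored_z  # assumed comparison: pass if nearer than stored


def process_fragment(frag, color_buf, depth_buf, scissor, alpha_ref):
    """Run the tests in order; write color and depth only if all pass."""
    x, y, z, rgba = frag
    if not scissor_test(x, y, scissor):
        return False
    if not alpha_test(rgba[3], alpha_ref):
        return False
    if not z_test(z, depth_buf[y][x]):
        return False
    depth_buf[y][x] = z
    color_buf[y][x] = rgba
    return True


# 2x2 framebuffer cleared to black with the far depth value 1.0.
color = [[(0, 0, 0, 0)] * 2 for _ in range(2)]
depth = [[1.0] * 2 for _ in range(2)]
scissor = (0, 0, 2, 2)
results = [
    process_fragment((0, 0, 0.5, (1, 0, 0, 1)), color, depth, scissor, 0.0),
    process_fragment((0, 0, 0.7, (0, 1, 0, 1)), color, depth, scissor, 0.0),
    process_fragment((1, 1, 0.1, (0, 0, 1, 0)), color, depth, scissor, 0.0),
    process_fragment((5, 5, 0.1, (1, 1, 1, 1)), color, depth, scissor, 0.0),
]
```

The first fragment is written, the second fails the depth test, the third fails the alpha test, and the fourth is rejected by the scissor test before any buffer is indexed. Running the scissor test first is what makes it a substitute for clipping, as the slide notes.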

Page 27: Status – Week 260

Future Work

- Framebuffer:
  - Z compression.
  - Color compression.
  - SSAA or MSAA support?

Page 28: Status – Week 260

News and Rumors

- NV30 architecture:
  - 4x2 pixel pipes?
  - 8x zixel pipes (Z Test & Stencil only).
- ATI ready to release R350 and RV350 in a couple of weeks.
  - R350: updated R300 core with additional features (?) and increased clock frequency (375 – 400 MHz).
  - RV350: value chip based on the R300 core. Maybe 8x1 core, 128-bit bus. Clock frequency 300 – 400 MHz. 75 million transistors.

Page 29: Status – Week 260

Imagine

- 'Computer Graphics on a Stream Architecture', John Douglas Owens, PhD dissertation.
  - Not read yet either.

