COOL Chips IV
A High Performance 3D Graphics Rasterizer with Effective Memory Structure
Woo-Chan Park, Kil-Whan Lee*, Seung-Gi Lee, Moon-Hee Choi,
Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,
Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han,
Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim
Media System Lab., Yonsei University
Seoul, Korea
E-mail: [email protected]
Outline
• Introduction
• David Simulator
• High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)
• Performance Analysis
• Conclusions
Introduction
NRL Project
Title
The Design of High Performance 3D Graphics Accelerator for Realistic Image
Properties
• Institution for bringing up an excellent lab. with a core technology
• A Government-initiated Project
Necessities
• Leadership of an advanced research area
• Synergy effect through research interchanges
Yearly Research Plan (1st Year, 2nd Year, 3rd Year ~ 5th Year)
Basic Environment & Research
• Study Simulation Environment
• 3D GA Simulator development
• Survey of Related Works
Technology Prevalent 3D Graphic Accelerator (15 million textured polygons)
• Propose an Effective Architecture
• Performance Evaluation
• IP Co-Development
High Performance 3D Graphic Accelerator (25 million textured polygons)
• High Performance Architecture
• Building international core IP
• Implementation of Prototype System
• Parallel Rendering Architecture
The Design of A High Performance 3D Graphics Accelerator for Realistic Image & Building Core IPs
Targets: Technology Prevalent 3D Graphics Accelerator, High Performance 3D Graphics Accelerator
Architecture Research
• Geometry Processing Unit, Rendering Unit
• Realization Mapping Unit
• P-M Architecture, Memory Architecture for 3D GA
• Cache & Memory Hierarchy
• Parallel 3D Rendering System
Design Research
• Execution Model (VLIW, SIMD, RISC etc.), Control & Interface
• Appliance to other system library by implementing VHDL
SW Research
• API & Rendering Algorithm
• Geometry Compression / Modeling
Verification / Integration
• Construction of Simulation Environment & Verification by Simulation
• Prototype System
Current Research Work
Related Works & Basic Research
Geometry Unit
• Overlapped light geometry processing (OLGP)
• New method for topological compression
Rendering Unit
• Object-oriented rendering using the analytic model
• Effective reuse buffer for triangle mesh
• Order-independent transparency
Realization Unit
• Perspective texture mapping
• Efficient bump mapping
• Modified anti-aliasing execution model
Memory Unit
• Texture cache simulation for various cache architectures
• Texture Cache Sharing
• Memory bandwidth saving scheme for texture data
Arithmetic Unit
• High performance floating point adder/subtractor
• High performance floating point multiplier
• High performance floating point divider
Major Research
David Simulator
Input: Vertex data (x, y, z, w)
Stages shown on the slide: Model-view Transform, (Lighting), Projection, Clipping, Divide by w, Viewport Transform, Triangle Setup, Scan-conversion, Texture Mapping (Bump map., Perspective), Z-buffering, Fog / Alpha blending, Anti-aliasing
Output: Pixel data
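The per-vertex stages named on this slide (model-view transform, projection, clipping, divide by w, viewport transform) can be illustrated with a minimal sketch. This is not code from the David Simulator; the function names, the example matrices, and the choice to reject rather than clip out-of-volume vertices are all assumptions made for illustration.

```python
# Minimal sketch of per-vertex processing (pure Python, no libraries).
# All names and example matrices below are hypothetical.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def inside_clip_volume(clip):
    """Clip-space test: -w <= x, y, z <= w with w > 0."""
    x, y, z, w = clip
    return w > 0 and all(-w <= c <= w for c in (x, y, z))

def divide_by_w(clip):
    """Perspective divide: clip coordinates to normalized device coordinates."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]

def viewport_transform(ndc, width, height):
    """Map NDC in [-1, 1] to window x, y; map depth to [0, 1]."""
    x, y, z = ndc
    return [(x + 1) * 0.5 * width, (y + 1) * 0.5 * height, (z + 1) * 0.5]

def transform_vertex(vertex, model_view, projection, width, height):
    """Run one (x, y, z, w) vertex through the stages above."""
    eye = mat_vec(model_view, vertex)       # model-view transform
    clip = mat_vec(projection, eye)         # projection
    if not inside_clip_volume(clip):
        return None  # assumption: a real pipeline clips the primitive instead
    return viewport_transform(divide_by_w(clip), width, height)

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Toy projection that only sets w_clip = -z_eye (hypothetical example values).
PROJECTION = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]

if __name__ == "__main__":
    print(transform_vertex([0.5, -0.5, -2.0, 1.0], IDENTITY, PROJECTION, 640, 480))
```

Lighting, triangle setup, and the per-pixel stages are left out; the sketch only shows how a homogeneous vertex becomes a window-space position.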
David Simulator
David Simulator: Simulation Work Flow
Front end
• Model Data, Parser
• Mesa Library Call, Geometry Simulator Call, Rasterizer Simulator Call
• Data formats: OpenGL format, Obj format
Geometry Engine
• Inst. & Data transfer & Store to Local memory
• FPU Instructions: Instruction Fetch, Decode, Execute #1, Execute #2, Execute #3, Write Back
• Write data to local memory
• Performance Result
Vertex Buffer
Rasterizer
• Setup pipeline
• Edge walk pipeline
• Span processing
• Z Compare, Color Blend
• MC for frame buffer access
• Mapping unit (texture, bump, environment, displacement)
• Texture cache
• MC for image map access
• Performance Result
David Rasterizer: Block Diagram
Pipelines: Z-test pipeline, Mapping pipeline
Blocks
• Setup, Edge Walk, Span Processing, Interpolation
• Z-Buffer Compare, Pixel Cache
• Lambda Calculation, Address Calculation
• Texture Cache (Tags, Request FIFO, Reorder Buffer, Fragment FIFO)
• Bump Engine, Env. Engine
• Color Blend
• Memory Controller
• Frame Buffer / Texture Memory / Bump Map / Environment Map
Signals
• X, Y; R, G, B, α; Z
• Gradient value of U and V, W, D
• Z-Buffer address, Frame buffer address
• Texel Addresses, BumXel Addresses, EXel Address
• Cache Addresses, Miss Addresses
• Texels, BumXel, EXel
• Address, Data
Bus widths shown: 128 bits, 24 bits
High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)
Architectural features of David rasterizer
• Performing z-test pipeline before TBE (Texture, Bump, and Environment) mapping completion
  • Saving memory bandwidth
  • Solve the inconsistency problem with tagging scheme for pixel cache
• Texture cache sharing with BE (Bump and Environment) mapping
  • Efficient structure
Rasterizer model: Neon, S3
Neon
• Input: Pixel information
• Stages: Texture read / filter, Texture blend, Z read, Z test, Z write, Alpha test, Destination read, Alpha blend, Destination write
• Memory structures: Texture cache, memory
S3
• Input: Pixel information
• Stages: Z read, Z test, Z write, Texture read / filter, Texture blend, Alpha test, Destination read, Alpha blend, Destination write
• Memory structures: Texture cache, memory
David Rasterizer
• Input: Pixel information
• Stages: Tag test and Z read, Z test, Texture read / filter, Texture blend, Alpha test, Destination read, Alpha blend, Destination write, Tag update and Z write
• Memory structures: Pixel cache, Texture cache, memory
• Note on the slide: wide separation (between Z read and Z write)
Architecture Comparison
Neon
• When is texture mapping performed? Before Z test
• OpenGL semantics for perfectly transparent texture: Support
• Advantages: Support OpenGL semantics
• Disadvantages: Wasting bandwidth
S3
• When is texture mapping performed? After Z test
• OpenGL semantics for perfectly transparent texture: Not support
• Advantages: Simple scheme; No wasting bandwidth (no fetching texture data that are obscured)
• Disadvantages: Unable to support OpenGL semantics
David
• When is texture mapping performed? After Z test
• OpenGL semantics for perfectly transparent texture: Support
• Advantages: No wasting bandwidth (no fetching texture data that are obscured); Support OpenGL semantics
• Disadvantages: Wide separation causes an inconsistency problem; solve it using additional flag bits in a pixel cache
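The bandwidth argument in this comparison can be sketched with a small fragment-level model: when texture mapping happens after the Z test, fragments that fail the test never trigger a texture fetch. The sketch below is an illustration only. It assumes opaque fragments, one texture fetch per textured fragment, and a "smaller z is nearer" depth convention; it does not model the alpha test, perfectly transparent textures, or the flag-bit scheme in the pixel cache. The function names and the definition of depth complexity as fragments per covered pixel are assumptions, not taken from the slides.

```python
# Hypothetical fragment-level model of texture fetches with and without
# performing the Z test before texture mapping.

def count_texture_fetches(fragments, texture_after_z_test):
    """fragments: iterable of (x, y, z) in submission order; smaller z is nearer.

    Returns (texture_fetches, z_test_fails).
    """
    z_buffer = {}
    fetches = 0
    fails = 0
    for x, y, z in fragments:
        visible = z < z_buffer.get((x, y), float("inf"))
        if not visible:
            fails += 1
        # Texturing before the Z test fetches texels for every fragment;
        # texturing after the Z test fetches them only for visible fragments.
        if visible or not texture_after_z_test:
            fetches += 1
        if visible:
            z_buffer[(x, y)] = z
    return fetches, fails

def depth_complexity(fragments):
    """Assumed definition: fragments generated per covered pixel."""
    frags = list(fragments)
    pixels = {(x, y) for x, y, _ in frags}
    return len(frags) / len(pixels)

if __name__ == "__main__":
    frags = [(0, 0, 0.5), (0, 0, 0.8), (0, 0, 0.2), (1, 0, 0.4), (1, 0, 0.9)]
    before, _ = count_texture_fetches(frags, texture_after_z_test=False)
    after, fails = count_texture_fetches(frags, texture_after_z_test=True)
    print("depth complexity:", depth_complexity(frags))
    print("fetches (texture before Z test):", before)
    print("fetches (texture after Z test):", after, "Z test fails:", fails)
    print("fraction of fetches saved:", 1 - after / before)
```

In this model the saving equals the fraction of fragments that fail the Z test, which is why submission order matters as much as depth complexity.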
Texture Cache Sharing
Current 3D Architecture
• Texture Mapping #1, Texture Mapping #2: Cache, DRAM
• Bump Mapping: DRAM
• Environment Mapping: DRAM
David 3D Architecture
• Shared H/W: Texture Mapping #1, Bump Mapping
• Shared H/W: Texture Mapping #2, Environment Mapping
• Shared Cache, DRAM
Comparison (Current Architecture / David Architecture)
• Mapping Hardware: Independent H/W / Shared H/W, reduces H/W cost (about 30%)
• Cache Size: Same / Same
• # of Read Ports in Cache: 2 / 2
• Throughput: 1 cycle / Texture mapping: 1 cycle, BE mapping: 2 cycles (infrequent operations)
• Features: BE mapping uses DRAM access / BE mapping uses cache access, removing pipeline stalls due to DRAM access
Performance Analysis
Environments for Performance Evaluation
Front end
• Model Data
• Mesa Library Call (OpenGL format)
• Rasterizer Simulator Call
Rasterizer
• Setup pipeline
• Edge walk pipeline
• Span processing
• Z Compare, Color Blend
• MC for frame buffer access
• Mapping unit (texture, bump, environment, displacement)
• Texture cache
• MC for image map access
• Outputs: Z Depth Complexity, # of Z Test Fails
Trace Generation
• Texture Cache Simulator
• Pixel Cache Simulator
• Outputs: Miss Ratio, Performance
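The trace-driven step in this flow (an address trace fed to a cache simulator that reports a miss ratio) can be sketched as follows. The cache organization used by the David simulators is not given on the slide, so this sketch assumes a direct-mapped cache; the function name, parameters, and the example trace are hypothetical.

```python
# Minimal trace-driven cache model. Assumption: direct-mapped organization,
# byte addresses in the trace, no prefetching.

def miss_ratio(trace, num_lines, line_bytes):
    """Return the fraction of accesses in `trace` that miss in the cache."""
    lines = [None] * num_lines  # block number currently held by each line
    misses = 0
    accesses = 0
    for addr in trace:
        accesses += 1
        block = addr // line_bytes   # which memory block the address is in
        index = block % num_lines    # which cache line that block maps to
        if lines[index] != block:
            misses += 1
            lines[index] = block     # fill the line on a miss
    return misses / accesses if accesses else 0.0

if __name__ == "__main__":
    trace = [0, 4, 32, 0, 64, 0]     # hypothetical texel byte addresses
    print(miss_ratio(trace, num_lines=2, line_bytes=32))
```

The same loop can be run over a pixel (Z and color) address trace; only the trace and the cache parameters change.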
Model Data
• Crystal Space (Game engine): Blocks, Flarge
• SPECviewperf: Lightscape, ProCDRS
Bandwidth Saving in Texture Data (ProCDRS)
[Chart: Depth Complexity and % Bandwidth Saving versus Frame Number]
Average: Depth Complexity 2.22, Bandwidth Saving 21.55%
Bandwidth Saving in Texture Data (Light)
[Chart: Depth Complexity and % Bandwidth Saving versus Frame Number]
Average: Depth Complexity 2.31, Bandwidth Saving 29.16%
Bandwidth Saving in Texture Data (Blocks)
[Chart: Depth Complexity and % Bandwidth Saving versus Frame Number]
Average: Depth Complexity 1.43, Bandwidth Saving 4.76%
Bandwidth Saving in Texture Data (Flarge)
[Chart: Depth Complexity and % Bandwidth Saving versus Frame Number]
Average: Depth Complexity 1.31, Bandwidth Saving 4.93%
Conclusions
Conclusions
Simulation Environment (David Simulator)
• Evaluation for 3D graphics accelerator architecture
• Performance comparison
David Architecture
Performing z-test before TBE mapping completion
• 5%~29% bandwidth savings for texture data in scenes with 1~3 depth complexity
• As the depth complexity grows, the bandwidth savings become larger
  – Recent 3D graphics applications show high depth complexity
Texture cache sharing with BE mapping
• Hardware saving from sharing and hardware reduction for BE mappings
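Why the savings grow with depth complexity can be illustrated with an idealized model that is not part of the slides. Assume a pixel is covered by d opaque fragments with distinct depths, arriving in uniformly random order. The k-th fragment passes the Z test only if it is the nearest of the first k, which happens with probability 1/k, so the expected number of fragments that need texture data is 1 + 1/2 + ... + 1/d. The function name below is hypothetical, and real scenes will deviate from this model because their submission order is not random.

```python
from fractions import Fraction

def expected_saving(depth):
    """Expected fraction of fragments that fail the Z test at one pixel.

    Assumes `depth` opaque fragments with distinct depths in uniformly
    random order (an idealization, not a measurement).
    """
    expected_passes = sum(Fraction(1, k) for k in range(1, depth + 1))
    return 1 - expected_passes / depth

if __name__ == "__main__":
    for d in range(1, 5):
        print(d, float(expected_saving(d)))
```

Under these assumptions the expected saving is zero at depth 1 and rises with every additional layer, which agrees in direction with the trend reported above.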