Posted on 20-Jan-2016
Sony Computer Entertainment Development Conference
2nd - 3rd August 2001
Confidential Information of Sony Computer Entertainment Europe
GS Master class
Mark Breugelmans
What we know about the GS
• GS memory is 4 MB
• GS fill rate is 1.2 gigapixels/sec (textured)
• GS input bandwidth is 64 bits
– We can stream up to 1.2 GB a second
• GS polygon throughput is determined by:
– Set-up time (number of cycles per vertex)
– Polygon size (number of pixels to draw)
Getting data in
• The GS runs at 150 MHz but has only a 64-bit input
• That's around 24 MB per frame (PAL), shared between textures and geometry
• Geometry
– Use strips for the fastest geometry set-up
• Textures
– Always pack 4-, 8- and 16-bit textures into 32-bit format beforehand for the fastest transfer
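The 24 MB figure follows from the clock speed and bus width. A minimal Python sketch of the arithmetic, assuming one 64-bit transfer every GS cycle at peak and decimal megabytes; the NTSC (60 Hz) line is added for comparison and is not from the slide.

```python
GS_CLOCK_HZ = 150_000_000       # GS clock: 150 MHz
BUS_BYTES_PER_CYCLE = 64 // 8   # 64-bit input path = 8 bytes per GS cycle

def peak_bytes_per_second() -> int:
    """Peak GS input bandwidth, assuming one 64-bit transfer every cycle."""
    return GS_CLOCK_HZ * BUS_BYTES_PER_CYCLE

def bytes_per_frame(refresh_hz: int) -> float:
    """Peak bytes per frame, shared between textures and geometry."""
    return peak_bytes_per_second() / refresh_hz

print(f"peak: {peak_bytes_per_second() / 1e9:.1f} GB/s")          # 1.2 GB/s
print(f"PAL (50 Hz):  {bytes_per_frame(50) / 1e6:.0f} MB/frame")  # 24 MB/frame
print(f"NTSC (60 Hz): {bytes_per_frame(60) / 1e6:.0f} MB/frame")  # 20 MB/frame
```

This is an upper bound: real transfers also carry GIF tags and register writes, so the usable share is lower.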
Texture Transfer Rates
• Theoretical rate is 1.2 GB/sec
• Transfer rates:
– 32, 24, 16-bit: 1200 MB/sec (1065*)
– 8-bit: 900 MB/sec (799*)
– 4-bit: 600 MB/sec (383*)
– (* PATH3 measured values)
• Sample code shows you how to convert
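To see what packing buys, here is a rough upload-time estimate from the measured PATH3 rates. This Python sketch assumes decimal megabytes (so 1 MB/s is one byte per microsecond) and ignores DMA and GIF tag overhead. It does not perform the 4-bit to 32-bit rearrangement itself, which depends on the GS memory layout.

```python
from typing import Optional

# Measured PATH3 rates from the slide, in MB/s, keyed by transfer format (bits per texel).
PATH3_MB_PER_S = {32: 1065, 24: 1065, 16: 1065, 8: 799, 4: 383}

def texture_bytes(width: int, height: int, bpp: int) -> int:
    """Size in bytes of a width x height texture at bpp bits per texel."""
    return width * height * bpp // 8

def upload_us(width: int, height: int, bpp: int, transfer_bpp: Optional[int] = None) -> float:
    """Upload time in microseconds; transfer_bpp is the format the data is sent as."""
    rate = PATH3_MB_PER_S[transfer_bpp or bpp]
    return texture_bytes(width, height, bpp) / rate

native = upload_us(256, 256, 4)                    # 4-bit data sent as 4-bit
packed = upload_us(256, 256, 4, transfer_bpp=32)   # same bytes, sent as 32-bit
print(f"256x256 4-bit as 4-bit:  {native:.1f} us")  # 85.6 us
print(f"256x256 4-bit as 32-bit: {packed:.1f} us")  # 30.8 us
```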
Small triangles and set-up time
• At most 8 textured pixels are drawn per cycle
• Triangles of up to 8x4 pixels can be drawn within the set-up time
• The GS is not very efficient for tiny triangles
[Chart: drawing time for 1x1, 2x2, 4x4, 8x4, 8x8 and 16x16 triangles (scale 0 to 20), untextured vs. textured]
Small triangles and Fill-rate
• Pixels are drawn by the GS in groups of 8
• Small triangles will not make use of this

  Triangle size   Pixels drawn/cycle
  1x1             0.12
  2x2             0.5
  4x4             2
  8x8             5.27
  16x16           6.13
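Converting the table into effective fill rates makes the small-triangle penalty concrete. A short Python sketch using the measured values above and the 150 MHz clock; the peak of 8 pixels per cycle is the group size from the slide.

```python
GS_CLOCK_HZ = 150_000_000
MAX_PIXELS_PER_CYCLE = 8  # the GS draws pixels in groups of 8

# Measured pixels drawn per cycle, from the table above.
MEASURED = {"1x1": 0.12, "2x2": 0.5, "4x4": 2.0, "8x8": 5.27, "16x16": 6.13}

def mpixels_per_second(pixels_per_cycle: float) -> float:
    """Effective fill rate in Mpixel/s at the GS clock."""
    return pixels_per_cycle * GS_CLOCK_HZ / 1e6

for size, ppc in MEASURED.items():
    share = ppc / MAX_PIXELS_PER_CYCLE
    print(f"{size:>5}: {mpixels_per_second(ppc):6.1f} Mpixel/s ({share:.1%} of peak)")
```

A 1x1 triangle therefore runs at about 18 Mpixel/s, against a peak of 1200 Mpixel/s.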
Fill rate factors
• Triangle size
• Texture-to-pixel size ratio
• Texture filtering modes (tri-linear, mip-maps)
• Fog
• Caches
– Texture page buffer
– Frame/Z page buffer
Frame/Z Buffer Page caches
• Frame and Z buffer: 8 KB
– Split into 2 buffers: 32x32x32-bit = 4 KB each
• Page refill is very fast
– 8192 bits per cycle (150 GB/sec bandwidth!)
– Whole 8 KB page buffer refilled in 8 cycles
[Diagram: Z buffer page 32x32, frame page 32x32]
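The page-cache figures can be checked with a few lines of arithmetic. In this Python sketch the exact refill bandwidth comes out at 153.6 GB/s, which the slide rounds to 150 GB/s.

```python
PAGE_W = PAGE_H = 32          # one frame or Z page: 32x32 pixels
BITS_PER_PIXEL = 32
PAGES = 2                     # one frame page + one Z page
REFILL_BITS_PER_CYCLE = 8192
GS_CLOCK_HZ = 150_000_000

page_bits = PAGE_W * PAGE_H * BITS_PER_PIXEL
cache_bits = PAGES * page_bits
refill_cycles = cache_bits // REFILL_BITS_PER_CYCLE
refill_gb_per_s = (REFILL_BITS_PER_CYCLE // 8) * GS_CLOCK_HZ / 1e9

print(page_bits // 8, "bytes per page")                # 4096 bytes per page
print(cache_bits // 8, "bytes in total")               # 8192 bytes in total
print(refill_cycles, "cycles to refill")               # 8 cycles to refill
print(f"{refill_gb_per_s:.1f} GB/s refill bandwidth")  # 153.6 GB/s refill bandwidth
```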
Frame/Z Page Cache misses
• The frame/Z page cache is filled line by line as drawing scans down
– Fill rate while varying height is roughly constant
– Fill rate while varying width varies with page misses
• Frame/Z page cache misses don't drop the fill rate much below 1 gigapixel/sec
• Textures are usually more of a problem
Fill-rate vs. Triangle size
[Chart: fill rate (0 to 1400) for triangle sizes from 2x2 to 256x256, untextured vs. textured*]
*Texture is in the cache and not reduced
Level of detail
• As polygon counts head into the millions, polygon sizes in pixels shrink rapidly
• PA scans of games suggest that better use of LOD would benefit some games significantly
– The back of a 5000-polygon car may cover just 50 visible pixels once projected onto the screen
– Similarly, there's no point having detailed textures that are going to be shrunk so much
A pixel density test
• Set all vertices to:
– red=0, green=1, blue=0
– alpha blend = destination + source
– Z test = disabled
– texture = disabled
• Lighter areas show you where there is high density or overdraw
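The test works because additive blending with no Z test adds 1 to green for every layer drawn over a pixel. The following is a software model of that accumulation, not GS code. Axis-aligned rectangles stand in for triangles, and the result is assumed to saturate at 255 like an 8-bit channel. Both of these are simplifications.

```python
def density_map(width, height, rects):
    """Model of the density test: every covered pixel gets green += 1.

    Rectangles are half-open [x0, x1) x [y0, y1) and stand in for triangles;
    values saturate at 255 like an 8-bit colour channel.
    """
    green = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                green[y][x] = min(255, green[y][x] + 1)
    return green

def average_overdraw(green):
    """Mean number of layers over the pixels that were drawn at least once."""
    covered = [v for row in green for v in row if v > 0]
    return sum(covered) / len(covered) if covered else 0.0

m = density_map(4, 4, [(0, 0, 2, 2), (1, 1, 3, 3)])
for row in m:
    print(row)
print(f"average overdraw: {average_overdraw(m):.2f}")
```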
Texture Page caches
• Texture cache: 8 KB
– Page size by texture format:
  32-bit: 64x32
  16-bit: 64x64
  8-bit: 128x64
  4-bit: 128x128
– The 32-bit page (64x32) is also used for 24, 8H, 4HL and 4HH
Texture Cache misses example
• 64x32 sprite, 24-bit texture

  Texture size   Fill rate (Mpixel/s)   GS cycles
  64x32          1158                   262
  65x32          596                    514

• One pixel outside the page halves the fill rate!
• Texture cache misses depend on the texture coordinates, not the original texture size
• Crossing texture pages also affects the cache
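Because misses depend on the texel range actually sampled, a quick page count explains the 64x32 vs 65x32 result. In this Python sketch the page sizes come from the texture page cache slide. The sketch assumes non-negative texel coordinates measured from a page-aligned origin and nearest sampling. For bilinear filtering, widen the range by one texel.

```python
# Texture cache page size (width, height) in texels per format, from the
# texture page cache slide; 24-bit shares the 32-bit page layout.
PAGE_SIZE = {32: (64, 32), 24: (64, 32), 16: (64, 64), 8: (128, 64), 4: (128, 128)}

def pages_touched(u0, v0, u1, v1, bpp):
    """Number of texture cache pages covered by texels u0..u1, v0..v1 (inclusive).

    Assumes non-negative texel coordinates from a page-aligned origin.
    """
    pw, ph = PAGE_SIZE[bpp]
    cols = u1 // pw - u0 // pw + 1
    rows = v1 // ph - v0 // ph + 1
    return cols * rows

print(pages_touched(0, 0, 63, 31, 24))  # 64x32 texture range: 1 page
print(pages_touched(0, 0, 64, 31, 24))  # 65x32 texture range: 2 pages
```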
Crossing Texture Pages efficiently
• The blocks in the pages are zig-zagged in 1/4s, 1/16s etc. for efficiency
• Use at most 1/2 page width and height to avoid crossing 3 quarters, which causes many block reloads / page misses
[Diagram: a texture region that crosses 2 quarters of a page vs. one that crosses 3 quarters]
Recommended subdivision
• PA scans showing the GS waiting for textures
• Suggested subdivision for each texture mode:

  Texture mode (page size)   Subdivision
  4-bit (128x128)            64x64
  8-bit (128x64)             64x32
  16-bit (64x64)             32x32
  24/32-bit (64x32)          32x16

[PA scans: 256x256 (4-bit) texture, not subdivided vs. subdivided]
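Each subdivision is half a page in each direction. Splitting on that grid therefore keeps every piece inside one quarter of a page, provided the texture starts on a page boundary. This Python sketch splits only the texel range. Generating the matching sub-polygons (interpolating screen positions for each tile) is not shown.

```python
# Recommended subdivision (tile width, height) in texels per format, from the table above.
SUBDIVISION = {4: (64, 64), 8: (64, 32), 16: (32, 32), 24: (32, 16), 32: (32, 16)}

def subdivide(u0, v0, u1, v1, bpp):
    """Split the half-open texel range [u0, u1) x [v0, v1) on the recommended grid.

    Each returned tile (u0, v0, u1, v1) lies inside one quarter of a texture page,
    assuming the texture starts on a page boundary.
    """
    tw, th = SUBDIVISION[bpp]
    tiles = []
    v = v0
    while v < v1:
        v_end = min((v // th + 1) * th, v1)      # next grid line or the end of the range
        u = u0
        while u < u1:
            u_end = min((u // tw + 1) * tw, u1)
            tiles.append((u, v, u_end, v_end))
            u = u_end
        v = v_end
    return tiles

tiles = subdivide(0, 0, 256, 256, 4)
print(len(tiles), tiles[0], tiles[-1])  # 16 (0, 0, 64, 64) (192, 192, 256, 256)
```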
Reducing texture cache misses
• Use 4-bit or 8-bit textures
• Clamp textures to the page size to keep them in a page
– Bilinear filtering may fetch 1 pixel outside your coordinate range
• Either/or:
– Keep all textures within one page
– Subdivide polygons until the ST coordinates of each polygon stay within half a cache page
Texture reduction penalty
[Chart: fill rate (0 to 1200) for a 4-bit texture on a 16x16 sprite, with texture coordinate ranges from 8x8 to 256x256]
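The penalty comes from sampling far more texels than pixels. Mapping 256x256 texture coordinates onto a 16x16 sprite is a 16:1 reduction on each axis. One way to avoid this is to pick the mip level that brings the ratio back to about one texel per pixel. This Python sketch makes that choice on the CPU for a screen-aligned sprite. It does not model the GS hardware LOD calculation.

```python
import math

def mip_level(texels_u, texels_v, pixels_w, pixels_h, num_levels):
    """Pick the mip level nearest to one texel per pixel for a screen-aligned sprite.

    texels_* is the level-0 texture coordinate range and pixels_* the on-screen
    size; each level halves the texture on both axes.
    """
    ratio = max(texels_u / pixels_w, texels_v / pixels_h)
    if ratio <= 1.0:
        return 0  # magnified or 1:1: use the full-size level
    return min(round(math.log2(ratio)), num_levels - 1)

print(mip_level(256, 256, 16, 16, num_levels=9))  # 4: the 16x16 level
```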
Mip-maps
• Good for avoiding texture reduction
– Look better
– May help reduce texture transfers for distant drawing
• Watch out for performance on large polygons
– Mip-maps in different pages can cause multiple texture cache reloads
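Mip-maps on large primitives
• The primitive is drawn line by line
– The wall reloads all mip-maps for every line
– The road loads each mip-map only once
[Diagram: a wall, where each scan line crosses mip-map levels 1 to 4, and a road, where levels 1 to 4 change from line to line]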
Tri-linear performance
• Tri-linear fill rate is 1/2 the speed of bilinear
– It's fetching twice the number of texels
• When two mip-map levels are in different pages, tri-linear is 8x slower than bilinear
– Due to multiple page loads per pixel
• Solutions
– Keep smaller mip-maps in the same page
– Disable tri-linear for near mip-map levels
– Perhaps do tri-linear as 2 passes with alpha
Fill-rate and Fog
[Chart: fill rate (0 to 1200), textured* vs. textured* + fog]
*Texture is in the cache and not reduced
Alternative FOG
• For larger textured primitives it is quicker to do fog as a second pass
• Technique
– 1st pass: draw a textured primitive
– 2nd pass: draw a Gouraud-shaded, alpha-blended primitive
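The second pass gives the same result as per-pixel fog, provided the Gouraud primitive is drawn in the fog colour and its vertex alpha carries (1 - fog factor). This Python sketch checks the identity with floats. The fog-factor convention (1.0 = no fog) is an assumption made here, and the GS's 8-bit integer precision is ignored.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t

def fog_single_pass(tex: float, fog: float, f: float) -> float:
    """Fogged colour; f is the fog factor, 1.0 = no fog, 0.0 = fully fogged."""
    return lerp(fog, tex, f)

def fog_second_pass(tex: float, fog: float, f: float) -> float:
    frame = tex                      # 1st pass: textured primitive, no fog
    alpha = 1.0 - f                  # 2nd pass: Gouraud primitive in the fog colour
    return lerp(frame, fog, alpha)   # standard blend: (Cs - Cd) * As + Cd

print(fog_single_pass(200.0, 128.0, 0.25), fog_second_pass(200.0, 128.0, 0.25))  # 146.0 146.0
```

Algebraically both reduce to f * tex + (1 - f) * fog, so the second pass only needs per-vertex alpha, not per-pixel fog in the texturing pass.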
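Scissoring
• Early pixel reject
– Pixels are discarded in lines
– Eliminates all page misses and texture loads
– Speed depends on the location of the triangle
[Figure: scissoring timings for a 16x16, a 64x64 and a 128x128 triangle at different positions relative to the scissor region. All timings in GS cycles. Cycle values from the figure, in three groups of three rows: 25 25 2 / 52 52 34 / 79 1135 34; 12 12 2 / 25 26 18 / 36 280 18; 4 4 2 / 7 7 6 / 9 12 6]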
Context changes with TEX0_1
• TEX0_1 only takes 2 GS cycles if the CLUT isn't loaded and the texture address isn't changed
• TEX2_1 (CLUT) is no quicker than TEX0_1; it just masks some of the TEX0_1 fields
CLUTs
• Loading a new CLUT causes 2 things to happen
– The new CLUT must be loaded
– The texture cache is invalidated
• Loading just a CLUT is no faster than loading both CLUT and texture
• However, selecting an already loaded CLUT is a zero-cost operation
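A CLUT load costs as much as a texture load and also flushes the texture cache. One way to act on this is to order draws so that primitives sharing a CLUT are drawn together. This Python sketch counts state changes under a deliberately simplified model: one CLUT and one texture are current at a time, and every change counts as a load. It ignores CLUTs already resident in the CLUT buffer, which can be selected at zero cost. The texture and CLUT names are hypothetical.

```python
def count_state_changes(draws):
    """Count texture and CLUT changes for an ordered list of (texture_id, clut_id) draws.

    Simplified model: one texture and one CLUT current at a time, and every
    change is a load. Resident-CLUT selection is not modelled.
    """
    current_tex = current_clut = None
    tex_changes = clut_changes = 0
    for tex, clut in draws:
        if clut != current_clut:
            clut_changes += 1          # new CLUT load, which also invalidates the texture cache
            current_clut = clut
        if tex != current_tex:
            tex_changes += 1
            current_tex = tex
    return tex_changes, clut_changes

# Hypothetical draw list: (texture, CLUT) pairs.
draws = [("rock", "grey"), ("tree", "green"), ("rock", "grey"), ("tree", "green")]
print(count_state_changes(draws))                                      # (4, 4)
print(count_state_changes(sorted(draws, key=lambda d: (d[1], d[0]))))  # (2, 2)
```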
Fill-rates: Summary
• Texture page caches have the biggest effect on fill rate
– Subdivide large texture coordinate ranges
– Keep mip-maps in the same page
• Texture reduction also costs fill rate, as texel reads become the bottleneck
• Frame buffer page misses aren't too bad
– The cost for big polygons is small compared to texture penalties
Making the most of VRAM
• 4-bit and 8-bit palettised textures are the most compact
• Tiled textures with repeat and region repeat
• Multi-pass techniques
– Alpha blending is zero cost, so it is useful for multi-pass techniques
– Useful blend types: standard blend between SRC and FRAME; multiply blend (using the alpha channel)
Tiling textures
• A very easy way to add detail for little cost
• Repeat range
– 0.10.4 UV (0 - 1024)
– 1.11.4 ST (±2048), which is 4x the range
– The number of repeats reduces for larger textures
• Watch out when scissoring massively tiled polygons
– Perspective errors
– Recalculate smaller texture coordinates
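The repeat limits follow from the coordinate formats listed above. 0.10.4 means no sign bit, 10 integer bits and 4 fractional bits, so UV spans 0 to 1024 texels. 1.11.4 spans -2048 to +2048 texels. This Python sketch encodes a UV value and counts how many whole copies of a texture fit in each range. It assumes the ranges exactly as stated on the slide.

```python
UV_FRAC_BITS = 4    # 0.10.4: 10 integer bits, 4 fractional bits
UV_RANGE = 1024     # UV covers 0 .. 1024 texels
ST_RANGE = 4096     # 1.11.4 ST covers -2048 .. +2048 texels: 4x the UV range

def to_uv_fixed(texel: float) -> int:
    """Encode a texel coordinate as unsigned 10.4 fixed point."""
    value = round(texel * (1 << UV_FRAC_BITS))
    if not 0 <= value < (UV_RANGE << UV_FRAC_BITS):
        raise ValueError(f"{texel} is outside the 0..1024 UV range")
    return value

def max_repeats(texture_size: int, coord_range: int) -> int:
    """How many whole copies of a texture fit in the coordinate range."""
    return coord_range // texture_size

print(to_uv_fixed(1.5))                                        # 24
print(max_repeats(64, UV_RANGE), max_repeats(64, ST_RANGE))    # 16 64
print(max_repeats(256, UV_RANGE), max_repeats(256, ST_RANGE))  # 4 16
```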
Texture Compression
• Monochrome textures can compress really well to 4-bit
Texture Compression
• The eye is sensitive to gradual changes in luminance, so palettes look bad in this case
• In this case it would be better to reduce the size and use the GS bilinear filter to interpolate
Texture Compression
• You can add a low-bit-depth detail map to a low-resolution interpolated image
• The total size of the 2 images is much less than a single 24-bit image. We can also use tiling.
2 Pass Texture Compression
• Original: 24-bit or 32-bit image
• Colour map: 1/16 the area of the original, 8-bit CLUT (entries up to 32-bit)
• Detail map: full size, 2-bit or 4-bit grayscale
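A rough size comparison for a 256x256 image shows the saving. This Python sketch makes several assumptions. The colour map is taken as a quarter of the width and height (1/16 of the area) with 8-bit indices. All CLUT entries are counted as 32-bit. A 2-bit detail map is counted at 2 bits per texel.

```python
def original_bytes(w: int, h: int, bpp: int = 24) -> int:
    """Size of the uncompressed original image."""
    return w * h * bpp // 8

def two_pass_bytes(w: int, h: int, detail_bpp: int = 4, clut_entry_bytes: int = 4) -> int:
    """Colour map (w/4 x h/4, 8-bit indices + 256-entry CLUT) plus a
    full-size detail map at detail_bpp bits (+ its own small CLUT)."""
    colour_map = (w // 4) * (h // 4) + 256 * clut_entry_bytes
    detail_map = w * h * detail_bpp // 8 + (1 << detail_bpp) * clut_entry_bytes
    return colour_map + detail_map

orig = original_bytes(256, 256)
for bits in (4, 2):
    packed = two_pass_bytes(256, 256, detail_bpp=bits)
    print(f"{bits}-bit detail: {packed} bytes vs {orig} ({orig / packed:.1f}x smaller)")
```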
Texture Compression
• The detail map CLUT is concentrated around the centre
– The eye is sensitive to small changes in luminance
[Chart: detail map CLUT values on a 0.0 to 2.0 scale, concentrated around 1.0]
• The detail map is calculated as original pixel / colour map pixel, giving an alpha-multiply factor which is then mapped to a CLUT
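The encode and decode steps for one pixel can be sketched as follows. The detail value is the ratio of original to colour map. It is snapped to the nearest CLUT level, and the second pass multiplies it back in. The four levels below are hypothetical, chosen to cluster around 1.0. The alpha encoding assumes 0x80 represents 1.0, consistent with the 0.0 to 2.0 scale above. A real encoder would pick the CLUT per texture.

```python
# Hypothetical 2-bit detail CLUT: four multipliers on the 0.0 .. 2.0 scale,
# concentrated around 1.0 because the eye is sensitive to small luminance changes.
DETAIL_LEVELS = [0.8, 0.95, 1.05, 1.2]

def detail_factor(original: float, colour: float) -> float:
    """Detail value = original pixel / colour-map pixel (1.0 where the colour map is 0)."""
    return original / colour if colour > 0 else 1.0

def quantise(factor: float, levels=DETAIL_LEVELS) -> int:
    """Index of the nearest CLUT level."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - factor))

def to_gs_alpha(level: float) -> int:
    """Alpha byte for a multiplier, assuming 0x80 = 1.0 (so 2.0 clamps to 0xFF)."""
    return min(255, round(level * 128))

def reconstruct(colour: float, index: int, levels=DETAIL_LEVELS) -> float:
    """Second pass: colour-map pixel times the detail multiplier, clamped to 8 bits."""
    return min(255.0, colour * levels[index])

idx = quantise(detail_factor(118, 100))   # ratio 1.18 snaps to the 1.2 level
print(idx, reconstruct(100, idx), [to_gs_alpha(l) for l in DETAIL_LEVELS])
```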
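2-bit Luminance Textures
• 4-bit image
• CLUT 1 uses only the low two bits of each index:
– x x 0 1
– x x 1 0
– x x 1 1
– x x 0 0
• CLUT 2 uses only the high two bits of each index:
– 0 1 x x
– 1 0 x x
– 1 1 x x
– 0 0 x x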
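With the two CLUTs arranged this way, one 4-bit texture holds two independent 2-bit images. The CLUT chosen at draw time selects which one you see. This Python sketch packs two 2-bit images and builds the matching pair of 16-entry CLUTs. The grey levels are hypothetical.

```python
def pack_two_2bit(img_a, img_b):
    """Pack two 2-bit images into one 4-bit image: img_a in bits 0-1, img_b in bits 2-3."""
    return [[(b << 2) | a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def make_cluts(levels):
    """Two 16-entry CLUTs over the same 4-bit image.

    CLUT 1 reads only the low two bits (pattern x x b1 b0);
    CLUT 2 reads only the high two bits (pattern b3 b2 x x).
    """
    clut1 = [levels[i & 0b11] for i in range(16)]
    clut2 = [levels[i >> 2] for i in range(16)]
    return clut1, clut2

levels = [0, 85, 170, 255]   # hypothetical grey levels for the four 2-bit values
a = [[0, 1], [2, 3]]
b = [[3, 2], [1, 0]]
packed = pack_two_2bit(a, b)
clut1, clut2 = make_cluts(levels)
print(packed)                                         # [[12, 9], [6, 3]]
print([[clut1[p] for p in row] for row in packed])    # [[0, 85], [170, 255]]: image a
print([[clut2[p] for p in row] for row in packed])    # [[255, 170], [85, 0]]: image b
```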
Texture Compression
• Decompressing the texture
– Draw the low-resolution colour map normally
– Draw the detail map with alpha multiply
• Two alternatives for detail map drawing
– Decompress to a new texture first
– Draw directly using two passes
• The colour map can serve as a low-res mip-map
– The detail map can be faded in for close-ups
• The benefit is reduced GIF->GS data transfer
Interlace Flickering
• For high resolution, you need to run the TV interlaced
– Odd and even lines are drawn on alternate frames
– Any image not drawn on both lines flickers
• Scan-line blending solves the problem
• This flickering is much more of a problem than edge aliasing
Interlace Flickering - Solutions
• Choose appropriate mip-map textures
• For games not guaranteed to run in a frame
– Use the 2-circuit method (very easy)
• If you can run in a frame, you can save some VRAM compared to the 2-circuit method
– Sprite method: saves 1/2 a display buffer
– Motion blur method: saves all the extra VRAM
– 2-pass method: saves all the extra VRAM but needs 2x the polygons
Super-sampling techniques and edge Anti-aliasing
• Edge anti-aliasing is nice, but you must sort your polygons and it's slower to draw
• Down-sampling is easy but expensive in VRAM
– Draw objects to large off-screen buffers and down-sample (we can still Z test if we scale up Z first)
• An alternative method
– Render 4x with 25% alpha and a 1/2-pixel offset in 4 directions. Same effect, using extra polygons rather than VRAM
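To see why down-sampling is expensive in VRAM, consider the extreme case of supersampling the whole screen. This Python sketch assumes a 640x448 display with 32-bit colour and 32-bit Z. Those figures are assumptions, not from the slide. A 2x2 supersampled frame plus Z buffer alone is more than twice the 4 MB of GS memory.

```python
VRAM_BYTES = 4 * 1024 * 1024   # 4 MB of GS memory

def buffer_bytes(width: int, height: int, colour_bits: int = 32, z_bits: int = 32) -> int:
    """Frame buffer plus Z buffer size for one render target."""
    return width * height * (colour_bits + z_bits) // 8

screen = buffer_bytes(640, 448)      # normal-resolution frame + Z
super2x2 = buffer_bytes(1280, 896)   # 2x2 supersampled frame + Z
print(screen, super2x2, VRAM_BYTES)  # 2293760 9175040 4194304
print(super2x2 > VRAM_BYTES)         # True: it does not fit in VRAM at all
```

This is why the slide suggests rendering individual objects to off-screen buffers, or using the 4-pass offset method, rather than supersampling the full frame.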
One last thing - Loading screens and framing out
• Framing out on loading
– Use field mode, perhaps
– You could use 16-bit field mode in the Z buffer?
– Use a low-res background with 2nd circuit text?
Summary
• Maximising GS input paths
– Transfer textures as 32-bit
– Consider detail textures and texture tiling
• Keeping up fill rates
– Subdivide textures to fit within the caches
– Don't reduce textures
– Make use of LOD to avoid triangles with less than 1 pixel of area
– Watch out for penalties on fog and mip-maps