HW-Accelerated HD video playback under Linux
Zou Nan hai
Open Source Technology Center

Media Engine
[Block diagram of the hardware. Labelled blocks: command streamer, 3D, media (Video Front End), URB, thread spawner, thread dispatcher, EU kernel, sampler, data port, video memory. Labelled data: indirect data, thread payload.]

Mode of operation
[Diagram: decode pipeline. Coded data -> VLD -> IS -> IQ -> IDCT -> MC -> output pixel. The stages are divided between "VFE or host" and "EU kernels".]
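
As an illustration of one stage in the pipeline above, here is a minimal sketch of IS, read here as the inverse scan that puts a run of scanned coefficients back into an 8x8 block. It assumes the classic zigzag order; the function names `zigzag_order` and `inverse_scan` are made up for this example and do not come from the slides or from any driver code.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for d in range(2 * n - 1):              # d = row + col selects one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:                      # even diagonals run from bottom-left to top-right
            cells.reverse()
        order.extend(cells)
    return order


def inverse_scan(coeffs, n=8):
    """Place a 1-D list of scanned coefficients back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


if __name__ == "__main__":
    block = inverse_scan(list(range(64)))
    for row in block:
        print(row)
```

The scan table is generated rather than hard-coded so that the ordering rule is visible; a real decoder would normally use a constant lookup table and would also support the alternate scan.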

Current XVMC implementation
[Diagram: decode pipeline. Coded data -> VLD -> IS -> IQ -> IDCT -> MC -> output pixel. The stages are divided between host software and EU kernels. Labelled data: per-slice data, per-macroblock data.]

XVMC
[Diagram: software stack. Blocks: media application, XVMC lib, DRI interface, X server, graphics hardware. Labelled flows: MPEG stream; decode slice of macroblocks; render, sync, resource management; media commands, video memory management.]

Video Memory Layout
[Diagram: objects in video memory. Command stream: media pointer command, media object command, flush command. State and data: VFE state, interface descriptors, selected interface, EU kernel instructions, binding tables, surface states, media surfaces.]

Execute Unit introduction
SIMD code (variable execution size up to 16) with predication and control mask.
Float and integer data types.
Region-based direct and indirect register addressing.
Support for scalar and immediate source operands.

EU Registers
GRF (General Register File)
– 256 bits per register (g0, g1, g2, gxx)
MRF (Message Register File)
– 256 bits per register (m0, m1, m2, mx), write only
– Used to pass payload from a thread to a shared function unit
ARF (Architecture Register File)
– e.g. null, ip and flag registers
Immediate
– encoded in the instruction

Register Region
Syntax: Regnum.Subregnum<VertStride,Width,HorzStride>Type
Example: g5.2<16,8,2>w has origin regnum=5, subregnum=2, with VertStride=16, Width=8, HorzStride=2, Type=w.
Second example: g15.3<16,16,1>UB
[Diagram: a 256-bit register (g0) drawn as numbered cells 0 to 15, with the cells selected by each example region marked.]
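
The region notation can be made concrete with a short sketch that lists the byte address of every element a region selects. This is an illustration only: it assumes that Subregnum and both strides are counted in elements of the operand type and that registers are 32 bytes (256 bits) laid out back to back. The names `TYPE_SIZE`, `GRF_BYTES` and `region_byte_offsets` are hypothetical.

```python
# Element sizes in bytes for a few operand types (assumed suffixes).
TYPE_SIZE = {"ub": 1, "b": 1, "uw": 2, "w": 2, "ud": 4, "d": 4, "f": 4}
GRF_BYTES = 32  # 256 bits per register


def region_byte_offsets(regnum, subregnum, vstride, width, hstride, type_, exec_size):
    """Byte offsets, from the start of g0, of the elements a region selects.

    Elements are taken row by row: `width` elements per row, `hstride`
    elements apart inside a row, and `vstride` elements between row starts.
    """
    size = TYPE_SIZE[type_.lower()]
    offsets = []
    for i in range(exec_size):
        row, col = divmod(i, width)
        element = subregnum + row * vstride + col * hstride
        offsets.append(regnum * GRF_BYTES + element * size)
    return offsets


if __name__ == "__main__":
    # g5.2<16,8,2>w with an execution size of 8
    print(region_byte_offsets(5, 2, 16, 8, 2, "w", 8))
```

For g5.2<16,8,2>w the first element sits at byte 164 (register 5 starts at byte 160, plus two words), and each following element is 4 bytes further on because the horizontal stride is two words.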

Data operation
            Array of structure    Structure of array
            (vertex shader)       (pixel shader and media code)
register 0  W Z Y X               X X X X
register 1  W Z Y X               Y Y Y Y
register 2  W Z Y X               Z Z Z Z
register 3  W Z Y X               W W W W
A vector occupies one register (a row) in the first layout and one column across the four registers in the second.
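
The difference between the two layouts can be sketched in a few lines. The example below is only an illustration with made-up names (`aos_to_soa`, `vertices`): it transposes four XYZW vectors from array-of-structure form into structure-of-array form, after which a single element-wise operation touches the same component of every vector.

```python
def aos_to_soa(vertices):
    """Turn [(x, y, z, w), ...] into four lists, one per component."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    zs = [v[2] for v in vertices]
    ws = [v[3] for v in vertices]
    return xs, ys, zs, ws


if __name__ == "__main__":
    # Array of structure: one vector per "register".
    vertices = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]

    # Structure of array: one component per "register".
    xs, ys, zs, ws = aos_to_soa(vertices)

    # One SIMD-style operation now processes the X component of all vectors.
    xs_plus_ys = [x + y for x, y in zip(xs, ys)]
    print(xs, ys, xs_plus_ys)
```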

Instruction sample
(f0) add.sat(16) g28.0<2>ub g3.0<16, 16, 1>f g10.0<16, 16, 1>w {align1}
Fields, using the operand g3.0<16, 16, 1>f as the example:
– (f0): predicate register
– (16): execution size
– g3: register number
– .0: subregister number
– <16, 16, 1>: VertStride, Width, HorizStride
– f: type
– {align1}: access mode

Instruction set
Normal SIMD instructions
– add, mul, avg, mov etc.
– dp3, dp4 etc.
Branch control instructions
– if, else, do, while, jmpi etc.
– branching is needed in media code
Send instructions
– communicate with shared function units
– media kernels use them to control the thread life cycle and to read from and write to surfaces

Instruction example
add.sat(16) g28.0<2>UB g3.0<16, 16, 1>f g10.0<16, 16, 1>W {align1}
[Diagram: the 16 float elements X held in g3 and g4 are added element by element to the 16 word elements Y held in g10; the 16 results Z are written to g28.]
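
What this instruction does can be emulated in a few lines. The sketch below is not hardware-exact: it assumes that saturation to the UB destination type clamps to 0..255, that the float result is truncated when converted to an integer (the real rounding behaviour is not stated in the slides), and that the destination stride of 2 means every other byte of g28 is written. The function name `add_sat_ub` is hypothetical.

```python
def add_sat_ub(src0, src1, dst, dst_hstride=2):
    """Emulate add.sat with an unsigned-byte destination and a destination stride."""
    for i, (a, b) in enumerate(zip(src0, src1)):
        total = a + b
        clamped = max(0.0, min(255.0, total))   # saturate to the UB range
        dst[i * dst_hstride] = int(clamped)     # truncation is an assumption
    return dst


if __name__ == "__main__":
    g3_g4 = [float(i * 20) for i in range(16)]  # 16 floats spanning two registers
    g10 = [-30] * 16                            # 16 signed words
    g28 = bytearray(32)                         # one 256-bit destination register
    add_sat_ub(g3_g4, g10, g28)
    print(list(g28))
```

With these inputs the first two sums are negative and clamp to 0, the last sum (270) clamps to 255, and the odd-numbered bytes of g28 are left untouched.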

An example Input and output
[Diagram: register layout of a media kernel. Register groups: payload registers passed from inline data (x, y, mv, field flags etc.); input Y0-Y3; input U; input V; reference Y; reference U; reference V; tmp registers; result registers, organized in YUV420 format. Data sources and sinks: indirect data payload, media read from reference surface, media write to destination surface, constant data.]

Planar data vs Packed data
Easy to handle by media kernel
Hard to apply some filters
Cannot be directly used as a sampler source in the hardware implementation
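
To make the two layouts concrete, here is a small sketch of how samples are addressed in each. It is an illustration under stated assumptions: the planar case assumes a 4:2:0 buffer with the Y, U and V planes stored back to back without padding, and the packed case uses YUYV (a 4:2:2 format) simply because it is a familiar packed layout. Neither choice is taken from the slides, and both function names are hypothetical.

```python
def planar_420_offsets(width, height):
    """Start offset of each plane in a tightly packed planar 4:2:0 buffer."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)   # chroma is subsampled 2x2
    return {"Y": 0, "U": y_size, "V": y_size + c_size, "total": y_size + 2 * c_size}


def packed_yuyv_luma_offset(x, y, width):
    """Byte offset of luma sample (x, y) in a packed YUYV buffer.

    Every pixel takes two bytes: its own luma followed by a chroma byte
    that is shared with the neighbouring pixel.
    """
    return (y * width + x) * 2


if __name__ == "__main__":
    print(planar_420_offsets(16, 16))
    print(packed_yuyv_luma_offset(3, 1, 16))
```

In the planar buffer each component can be loaded as a contiguous block, while in the packed buffer luma and chroma are interleaved byte by byte.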

Work flow
[Diagram: decoding of I, P and B pictures, one slice of macroblocks at a time. Labels: DCT data, kernel (three instances), forward reference frame, backward reference frame, indirect data, inline data, media read message, media write message, destination surface.]
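
The per-block work behind this flow can be sketched as follows. This is a simplified illustration, not the kernels from the talk: it handles full-pel motion vectors only (no half-pel interpolation, no field prediction), takes the residual as already inverse-transformed, and averages the two predictions of a bidirectional block with rounding. The names `fetch_block` and `reconstruct_block` are hypothetical.

```python
def clamp_pixel(value):
    """Saturate a reconstructed sample to the 8-bit range."""
    return max(0, min(255, value))


def fetch_block(frame, x, y, mv, size):
    """Read a size x size block from a reference frame, displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[frame[y + dy + r][x + dx + c] for c in range(size)] for r in range(size)]


def reconstruct_block(residual, x, y, size, forward=None, backward=None):
    """Rebuild one block.

    forward / backward are (frame, mv) pairs or None:
    neither for an intra block, forward only for P, both for a bidirectional block.
    """
    refs = [ref for ref in (forward, backward) if ref is not None]
    preds = [fetch_block(frame, x, y, mv, size) for frame, mv in refs]
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            if len(preds) == 2:
                p = (preds[0][r][c] + preds[1][r][c] + 1) >> 1  # rounded average
            elif len(preds) == 1:
                p = preds[0][r][c]
            else:
                p = 0                                           # intra: no prediction
            row.append(clamp_pixel(p + residual[r][c]))
        out.append(row)
    return out


if __name__ == "__main__":
    fwd = [[100] * 4 for _ in range(4)]
    bwd = [[103] * 4 for _ in range(4)]
    residual = [[5, -200], [200, 0]]
    print(reconstruct_block(residual, 1, 1, 2, (fwd, (1, 0)), (bwd, (-1, -1))))
```

In terms of the diagram, the residual corresponds to the DCT data after the inverse transform, `fetch_block` stands in for a media read from a reference frame, and the returned block is what a media write would store to the destination surface.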

About XvMC API
Post-processing is missing from the XvMC API design
Video output mixer

High Level Language
Why is a high-level language preferred for media kernels?
– Easy to debug
– Easy to reuse code
– Hides platform details, easy to understand and maintain
Possible choices
– GLSL is not OK
– Simple C extension?

H.264
Kernels become much more complex because of the different combinations of MC and DCT sizes.
Not suitable for a slice-level API, because of intra prediction.
Needs scheduling and dependency control for media threads, because of intra prediction.
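
One common way to express this kind of dependency control is a wavefront order, sketched below. The slides do not say which scheme is intended, so this is only an illustration: it assumes that a macroblock may depend on its left, top-left, top and top-right neighbours, and it ignores slice boundaries and later stages such as deblocking. The names `intra_neighbours` and `wavefront_schedule` are hypothetical.

```python
def intra_neighbours(x, y, mb_cols):
    """Macroblocks that (x, y) may depend on: left, top-left, top, top-right."""
    candidates = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(nx, ny) for nx, ny in candidates if 0 <= nx < mb_cols and ny >= 0]


def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into waves; all blocks in a wave can run in parallel.

    Block (x, y) is placed in wave x + 2 * y, which is always later than
    the wave of each of its four neighbours.
    """
    waves = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[w] for w in sorted(waves)]


if __name__ == "__main__":
    for number, wave in enumerate(wavefront_schedule(4, 3)):
        print(number, wave)
```

Each wave could be dispatched as a batch of media threads once the previous wave has finished, which is one simple form of the scheduling the slide asks for.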

VAAPI
Picture-level API
Covers MPEG-2, H.264 and VC-1 through different entry points
Post-processing and video output mixer are missing

TODO
IDCT code optimization
MPEG-2 XVMC VLD extension
VAAPI for MPEG-2
VAAPI for AVC
Video post-processing and mixer

Q&A
Thank You!