Video Description in More than One Language
• Overview
• Need
• Capability
• Approaches
• Conclusion
Disclaimer: This presentation does not contain any recommendations, assessments or positions from or by NAB
Video Description
The term ‘video description’ means the insertion of audio narrated descriptions of a television program’s key visual elements into natural pauses between the program’s dialogue.
(S. 3304)
AKA: Descriptive Video, Visually Impaired (VI)
Video Description Insertion
[Diagram: Audio with Video Description (Complete Mix). Labels: Descriptive insertions; Primary Audio Dialog; "no dialog" gaps; "Voice under?"]
Pause to reflect
• Receiver makers refused to support the original Dolby design, which saved bits by enabling supplemental audio tracks
• So service providers must spend bits to send everything for every audio service
• Wonder if that lack of innovation is in Gary’s book… moving on…
Initial Mandates
• Initial video description rules will go into effect October 2011.
– Four networks (ABC, CBS, Fox, and NBC) and the top 5 national non-broadcast networks will have to provide 50 hrs./quarter with video description [1]
– Broadcast stations and MVPDs with technical capability to do so generally must pass through audio containing video descriptions.
[1] For the top 25 DMAs and 50k+ subscriber systems respectively
More to Come from the FCC
• Reports
• Rule makings
• Not later than this or sooner than that
• Soonest for all DMAs: 2037
• Not tomorrow … but it is coming
• English is assumed, but the Spanish-speaking population is growing …
• So what can we do?
Digital Audio Interfaces
[Diagram labels: Ancillary Data; Control Data; Audio Subsystem (Audio Source Coding); Video Subsystem (Video Source Coding); Studio/Master Control; Work File; Audio; Video; Router/MC Switcher]
Dolby E: 8 ch. PCM stream, compressed, 3 Mb/s total, twisted pair or coax
Eight is enough
• For one additional service (with 5.1)
• Need to replicate the path to get more than one language or type of service
[Diagram labels: Work File; Ancillary Data; Control Data; Audio Subsystem (Audio Source Coding & Compression); Video Subsystem (Video Source Coding & Compression); MC SW; Studio/Master Control; Digital Audio Interface; Audio; Video]
AES-3: 2 ch / PCM stream, uncompressed, 1.92 Mb/s, twisted pair or coax (may also carry 8-channel Dolby E); then in HANC: 16 channels total
Uncompressed Audio Interconnects
Sixteen is enough
• For a pair of 5.1 channel services, each with an associated stereo descriptive video mix.
• For a 5.1 service and 5 stereo services
• So audio in several languages with VI and HI could be supported – but only one could be 5.1
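The channel counts above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a 5.1 service occupies six discrete channels and a stereo service two, and the names `CHANNELS`, `channels_needed`, and `fits` are hypothetical helpers, not part of any standard.

```python
# Discrete channels per audio format (assumption: 5.1 = 5 full-range + 1 LFE).
CHANNELS = {"5.1": 6, "2.0": 2}

DOLBY_E_CAPACITY = 8   # channels on one Dolby E path, per the slides
HANC_CAPACITY = 16     # embedded audio channels in HANC, per the slides


def channels_needed(services):
    """Sum the discrete channels required by a list of format names."""
    return sum(CHANNELS[fmt] for fmt in services)


def fits(services, capacity):
    """True when the services fit within the given channel capacity."""
    return channels_needed(services) <= capacity


# One 5.1 service plus one stereo descriptive mix fills a Dolby E path.
one_language = ["5.1", "2.0"]

# Two languages, each 5.1 with a stereo descriptive mix.
two_languages = ["5.1", "2.0", "5.1", "2.0"]

# One 5.1 service plus five stereo services.
surround_plus_five_stereo = ["5.1"] + ["2.0"] * 5

for label, plan in [
    ("one language", one_language),
    ("two languages", two_languages),
    ("5.1 + 5 stereo", surround_plus_five_stereo),
]:
    print(label, channels_needed(plan),
          "fits Dolby E:", fits(plan, DOLBY_E_CAPACITY),
          "fits HANC:", fits(plan, HANC_CAPACITY))
```

Under these assumptions the two-language plan needs 16 channels, so it exceeds a single 8-channel path but fits the 16 channels available in HANC.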
Distribution
[Diagram labels: Remote Production & Post; Venues; Contribution; HD or SD; Network Center; Local Station; ATSC Digital; DTV]
For MPEG links, audio channels can be carried as program elements with PMT-based signaling
Transmission
• Digital Transmission
– A large number of audio services for a single video (MPEG-2 Transport) can be signaled and sent (depending on the number of descriptors associated with each audio)
• Analog Transmission
– Second audio or video description <choose>
ATSC Transport
Transport Component Organization
[Diagram labels: Virtual Channel 1; Virtual Channel 2; Video; Audio; PSI(P); MDTV]
Multi-Program Multiplex
[Diagram labels: PES Streams (Video+PCR; Audio 1: CM eng; Audio 2: VI eng; Audio 3: CM spa; Audio 4: VI spa); Mux; Program 1; Program 2; Program 3; Mux; Multi-Program Transport Stream]
4 each AC-3 and ISO-639 descriptors
SI Tables (PMT)
PSIP Tables: each event, different descriptors
[Diagram labels: Midplane; Management Port; CPC Control Card; ASI Output Card (2 ASI); StatMux Engine; Encoders, each with Video Encoding and Audio Encoding (3 x 2.0); Optional Audio Cards (5.1 + 2.0 or 3 x 2.0); Video Inputs: SDI / HD-SDI (embedded audio); PSIP input; Encoder Outputs: Dual ASI Output]
Optional Audio Cards: each can encode, or transcode from Dolby E, one 5.1 + one 2.0 Dolby Digital
Note this configuration supports more than my example case.
The maximum audio + video shown is:
• Four video programs (HD, SD or mixed)
• Four 5.1 surround channels
• Sixteen 2.0 stereo channels
Based on slide from
One configuration would be to provide one 5.1 in English with the Descriptive Video on a Dolby E path, and the
Spanish 5.1 with Descriptive Video in Spanish on another Dolby E path.
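The maximum figures quoted for this encoder can be reproduced with a short calculation. This is a sketch under stated assumptions, not vendor data: it assumes four encoders, each with built-in 3 x 2.0 audio encoding plus one optional audio card, and the names `AudioLoad`, `BUILT_IN`, `CARD_MODES`, and `chassis_capacity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioLoad:
    surround_51: int  # number of 5.1 services
    stereo_20: int    # number of 2.0 services


# Assumption read from the slide labels: each encoder has 3 x 2.0 built in.
BUILT_IN = AudioLoad(surround_51=0, stereo_20=3)

# The two optional audio card modes named on the slide.
CARD_MODES = {
    "5.1+2.0": AudioLoad(surround_51=1, stereo_20=1),
    "3x2.0": AudioLoad(surround_51=0, stereo_20=3),
}


def chassis_capacity(encoders, card_mode):
    """Total audio services when every encoder carries one card in card_mode."""
    card = CARD_MODES[card_mode]
    return AudioLoad(
        surround_51=encoders * (BUILT_IN.surround_51 + card.surround_51),
        stereo_20=encoders * (BUILT_IN.stereo_20 + card.stereo_20),
    )


def fits(load, capacity):
    """True when the requested services fit within the capacity."""
    return (load.surround_51 <= capacity.surround_51
            and load.stereo_20 <= capacity.stereo_20)


capacity = chassis_capacity(encoders=4, card_mode="5.1+2.0")

# Example case: English 5.1 + English description, Spanish 5.1 + Spanish description.
example_case = AudioLoad(surround_51=2, stereo_20=2)

print(capacity)
print("example case fits:", fits(example_case, capacity))
```

With every card in 5.1 + 2.0 mode this gives four 5.1 and sixteen 2.0 services, matching the totals on the slide, and the two-language example case uses only part of that capacity.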
Announcement Paradigms
• This program has English, Spanish, with Video Description in both languages
• Separate virtual channels
– English
– English Description for the Blind
– English for the Hearing Impaired
– Spanish
– Spanish Description for the Blind
– Spanish for the Hearing Impaired
But cable may have to do something like this if delivery to NTSC sets is required
Terrestrial Emission Overview (signaling and announcement)
[Diagram labels: ATSC Transport; PSIP & PSI; Events & tracks]
Event 1: CM (5.1, eng)
Event 2: CM (5.1, eng) + VI (2, eng); CM (5.1, spa) + VI (2, spa)
EIT 0 (partial): Event 1 – AC-3 descriptor with one audio service; Event 2 – AC-3 descriptor with four audio services
PMT – one ISO-639 descriptor
EIT 0 (partial): Event 2 – AC-3 descriptor with four audio services
PMT – four ISO-639 descriptors (one per program element)
DTV Receiver
[Diagram labels: RF Tuner & VSB De-Modulator; Transport De-Multiplex; Audio Decoding; Video Decoding; PSIP Data; Program Guide Database; Display Processor; Audio; Video; Program select from user; RF Channel Select; Audio Select]
CEA-CEB-21
Recommended Practice for Selection and Presentation of DTV Audio
In progress since July 2008, but almost done
Key Issues
• User set up and control
• Explicit language selection
• Explicit VI and HI selection
• Differences between stream construction (off-air and cable)
Key Recommendations
• Receivers should gather user preferences and allow them to be changed later
• Receivers should read the tables and descriptors and use the contents
• Receivers should automatically select best fit to preferences when more than one stream is present
Key Recommendations
• Receivers should consider the following items when providing for user selection of their preferred audio stream:
– Stream type (CM, VI, or HI) as signaled by the bsmod field in the AC-3_audio_stream_descriptor().
– The language field encoded in the AC-3_audio_stream_descriptor().
– The component_name_descriptor() to provide supplemental audio stream information to users, if needed.
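A receiver's "best fit" selection could look something like the sketch below. It is not taken from CEA-CEB-21: the priority order (language first, then service type, then falling back to a complete main) is an assumption for illustration, the PIDs are made up, and `AudioStream`, `Preferences`, and `select_stream` are hypothetical names. The bsmod values used (0 = CM, 2 = VI, 3 = HI) follow the AC-3 bit stream mode definitions.

```python
from dataclasses import dataclass

# AC-3 bsmod values for the service types discussed here.
BSMOD_TO_TYPE = {0: "CM", 2: "VI", 3: "HI"}


@dataclass(frozen=True)
class AudioStream:
    pid: int        # hypothetical elementary stream PID
    bsmod: int      # from the AC-3 audio stream descriptor
    language: str   # ISO 639 code from the descriptor, e.g. "eng"
    name: str = ""  # optional text, as a component name descriptor might carry


@dataclass(frozen=True)
class Preferences:
    language: str
    service_type: str = "CM"  # "CM", "VI", or "HI"


def select_stream(streams, prefs):
    """Pick the stream that best fits the stored user preferences.

    Assumed ranking: language match, then service type match, then
    preference for a complete main (CM). Ties keep the first stream listed.
    """
    def score(stream):
        stream_type = BSMOD_TO_TYPE.get(stream.bsmod)
        return (
            stream.language == prefs.language,
            stream_type == prefs.service_type,
            stream_type == "CM",
        )

    return max(streams, key=score) if streams else None


# The four audio services from the multiplex example.
STREAMS = [
    AudioStream(pid=0x31, bsmod=0, language="eng", name="English"),
    AudioStream(pid=0x32, bsmod=2, language="eng", name="English description"),
    AudioStream(pid=0x33, bsmod=0, language="spa", name="Spanish"),
    AudioStream(pid=0x34, bsmod=2, language="spa", name="Spanish description"),
]

choice = select_stream(STREAMS, Preferences(language="spa", service_type="VI"))
print(choice.name)
```

A viewer who asks for Spanish with HI, which this multiplex does not carry, would fall back to the Spanish complete main under this ranking. A real receiver might weigh these factors differently.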
Conclusion
• Multiple language, multiple community service audio tracks are part of your future (unless English is declared to be the <ONE> Official Language for the United States of America)
• Force fitting to the 2-audio mold is problematic
• When breaking the mold, plan ahead
Credits
ATSC
CEA
Mike Dolan
Graham Jones