Surround sound audio is getting more elaborate as it moves beyond six-channel 5.1 mixes. 7.1 mixes add left and right rear channels, and larger mixes, such as 9.1 or 11.1, add height information to the surround sound mix.
The addition of height information into the mix is commonly referred to as 3D audio. The focus of this discussion will be moving to a 5.1 surround sound mix from a current stereo audio system.
The move from a workflow that handles stereo audio to one that handles surround sound presents challenges. There are of course more audio channels to contend with, and a need to make sure that these additional channels can pass through a system.
Beyond channel count, there are further considerations: surround sound monitoring; audio bit rate reduction to fit the additional channels; the audio metadata that accompanies bit rate reduction technologies; maintaining audio quality and the video-to-audio timing relationship (lip sync); and any additional advanced audio processing that may be required.
Upmixing is the process of converting stereo to a surround sound mix. This can be compared to the upconversion of Standard Definition (SD) video to High Definition (HD). In fact, the transition from SD to HD video typically drives the transition from stereo to surround sound audio mixes.
Downmixing converts surround sound to a stereo mix. This can be compared to the downconversion of HD video to SD. Audio downmixing and video downconversion are important for supporting current and legacy SD and stereo audio systems as HD and surround sound roll out.
The upmix and downmix processes are also referred to as matrix decoding and matrix encoding, respectively. Matrix techniques use phase and intensity differences to carry surround information within a stereo signal.
For example, Dolby Pro Logic II and DTS Neural Surround use this technique (and are compatible with each other). Stereo audio that carries this “matrixed” information is known as LtRt (Left Total, Right Total). When a stereo audio signal does not contain matrixed information, it is known as LoRo (Left Only, Right Only). Stereo audio is also referred to as 2.0.
This audio matrix technique should not be confused with bit rate reduction (or compression) using Dolby E, Dolby Digital, Dolby Digital Plus or DTS technologies.
Bit rate-reduced audio carried in the space of a stereo audio channel cannot be monitored until it is decoded (uncompressed), whereas LtRt can be monitored directly as a stereo audio signal. Likewise, the “compression and limiting” processing used to control the dynamic range of an audio signal should not be confused with bit rate reduction.
A six-channel surround sound mix is referred to as 5.1, and its channels as L/R/C/LFE/Ls/Rs (Left, Right, Centre, Low Frequency Effects, Left Surround, Right Surround).
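To make the matrix idea concrete, here is a minimal Python sketch of a passive sum/difference matrix. It is an illustration only: the function names and sign convention are assumptions, the 90-degree phase shift used by real encoders is omitted, and it does not represent the adaptive steering used by Dolby Pro Logic II or DTS Neural Surround.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB gain applied to centre and surround

def matrix_encode(left, right, centre, surround):
    """Fold four sample values into an Lt/Rt pair.

    Centre is added in phase to both channels; surround is added
    out of phase (the real-world 90-degree shift is omitted here).
    """
    lt = left + G * centre - G * surround
    rt = right + G * centre + G * surround
    return lt, rt

def passive_decode(lt, rt):
    """Estimate centre from the sum and surround from the difference."""
    centre = G * (lt + rt)
    surround = G * (rt - lt)
    return lt, rt, centre, surround

# A centre-only source lands in the sum and cancels in the difference.
lt, rt = matrix_encode(0.0, 0.0, 1.0, 0.0)
print(passive_decode(lt, rt))
```

Because Lt and Rt are still ordinary audio, the pair can be auditioned on stereo speakers without any decoding, which is the property that separates it from bit rate-reduced audio.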
Up and Downmixing
When creating a surround sound mix, a stereo mix can be created at the same time. The surround sound downmix can be done by the audio mixer or by a downmix processor.
If a stereo mix is not provided, it may be needed further downstream in the audio workflow for monitoring or for a legacy stereo emission. Dolby decoders provide a LtRt downmix, and DTS Neural Surround DownMix can be used instead of Dolby.
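The simplest downmix to illustrate is LoRo. The sketch below assumes the commonly used -3 dB coefficients for the centre and surround channels and discards the LFE channel; real downmix processors may take these gains from audio metadata, and the function name is hypothetical.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)

def downmix_loro(l, r, c, lfe, ls, rs,
                 centre_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Fold one 5.1 sample frame down to a Lo/Ro stereo pair.

    The LFE sample is accepted for completeness but not mixed in,
    which is an assumption of this sketch.
    """
    lo = l + centre_gain * c + surround_gain * ls
    ro = r + centre_gain * c + surround_gain * rs
    return lo, ro

# Dialogue in the centre channel appears equally in both outputs.
print(downmix_loro(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```

Note that the summed outputs can exceed full scale when several channels are loud at once, so a practical downmix also needs headroom or limiting.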
For legacy stereo content, audio upmixing can provide a natural sounding 5.1 output. If both 5.1 and stereo elements are presented at the input of an upmixer, the device must be capable of passing the 5.1 content transparently to the output without modification.
Transitions between stereo and 5.1 must be seamless to create a consistent 5.1 output. Just as a consistent 5.1 output is required in many cases, a consistent 2.0 output generated from both 2.0 and 5.1 content is also necessary in many legacy applications such as SD channels.
The DTS Neural Surround MultiMerge accommodates 2.0 and 5.1 signals, and provides both 2.0 and 5.1 outputs so that legacy content and new 5.1 content can be processed into both 5.1 and 2.0 by upmixing or downmixing as necessary.
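The routing decision behind this kind of "always output both formats" processing can be sketched in a few lines of Python. This is a hypothetical illustration of the logic described above, not the DTS implementation; the function name and the action labels are assumptions.

```python
def plan_outputs(input_channels: int) -> dict:
    """Decide how each output is derived from the incoming format.

    2 channels -> stereo input: upmix for 5.1, pass through for 2.0.
    6 channels -> 5.1 input: pass through for 5.1, downmix for 2.0.
    """
    if input_channels == 2:
        return {"5.1": "upmix", "2.0": "pass-through"}
    if input_channels == 6:
        return {"5.1": "pass-through", "2.0": "downmix"}
    raise ValueError(f"unsupported channel count: {input_channels}")

# The same two outputs are always produced, whatever arrives at the input.
for channels in (2, 6):
    print(channels, plan_outputs(channels))
```

The key property is that 5.1 content is never re-processed on its way to the 5.1 output, which matches the transparency requirement stated earlier.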
For quality control purposes, a monitoring position with both stereo and surround sound speakers should be created to verify channel placement and audio levels. Typical VU and peak metering should be provided, as well as loudness monitoring and logging in line with the latest recommendation, ITU-R BS.1770.
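As a rough idea of what a BS.1770-style loudness meter computes for a 5.1 signal, the Python sketch below implements only the channel-weighted power sum. It assumes the samples have already passed through the K-weighting pre-filter and it omits gating entirely, so it is not a compliant meter; the function and variable names are hypothetical.

```python
import math

# Per-channel weights: surrounds are weighted up, LFE is excluded.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def ungated_loudness(k_weighted: dict) -> float:
    """Return ungated loudness for pre-filtered channel sample lists.

    `k_weighted` maps channel names to lists of K-weighted samples.
    Channels not listed in CHANNEL_WEIGHTS (such as LFE) are ignored.
    """
    total = 0.0
    for name, weight in CHANNEL_WEIGHTS.items():
        samples = k_weighted.get(name, [])
        if samples:
            mean_square = sum(s * s for s in samples) / len(samples)
            total += weight * mean_square
    if total == 0.0:
        return float("-inf")
    return -0.691 + 10 * math.log10(total)

print(ungated_loudness({"L": [0.1, -0.1], "R": [0.1, -0.1]}))
```

A production meter would add the pre-filter and the gating stages, and would log the result over time for compliance reporting.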
Audio Bit Rate Reduction
Dolby technologies for audio bit rate reduction can carry multiple channels of audio within a signal transport between facilities, into and out of tape transports and video/audio content servers, and into the home environment.
It should be noted that as Dolby technology is introduced into a system, another layer of audio information called audio metadata is available. Dolby audio metadata allows you to monitor and control essential signal parameters, including the critical three Ds: dialogue normalization, dynamic range, and downmixing.
All Dolby Digital Plus and Dolby Digital transmissions include metadata. However, there are several approaches to setting metadata, and it must be managed with care because missing or incorrect metadata can cause audio quality issues.
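Dialogue normalization is a convenient example of why incorrect metadata is audible. The Python sketch below assumes the commonly described behaviour in which a dialnorm value of 1 to 31 represents dialogue at -1 to -31 dBFS and the decoder attenuates the programme so dialogue lands at -31 dBFS. The function names are hypothetical and the sketch is not taken from any Dolby specification text.

```python
def dialnorm_attenuation_db(dialnorm: int) -> int:
    """Attenuation in dB applied for a given dialnorm value (1..31)."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    return 31 - dialnorm

def apply_dialnorm(sample: float, dialnorm: int) -> float:
    """Scale one linear sample by the dialnorm attenuation."""
    attenuation_db = dialnorm_attenuation_db(dialnorm)
    return sample * 10 ** (-attenuation_db / 20)

# Programme measured at -24 but signalled as -31: no attenuation is
# applied, so it plays 7 dB louder than correctly signalled content.
error_db = dialnorm_attenuation_db(24) - dialnorm_attenuation_db(31)
print(error_db)
```

A mismatch like this between measured loudness and signalled dialnorm is one way a channel ends up noticeably louder or quieter than its neighbours.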
Metadata can be carried alongside audio using a number of methods—for example, as a separate data file or serial data stream, or packaged together with the audio as Dolby E.
In a typical scenario, as a 5.1 mix is created it is encoded into Dolby E with the audio metadata. Dolby E can be encoded and decoded multiple times through the broadcast chain with no audible degradation and is used upstream of the final emission point.
All elements of a Dolby E bit stream are locked together so that the metadata and audio always maintain their timing relationship. The Dolby E signal doesn’t reach viewers at home; it is decoded back to baseband audio just prior to the final DTV transmission, and then re-encoded.
At the end of the signal chain before emission, the final audio content is decoded and re-encoded as Dolby Digital or Dolby Digital Plus along with the audio metadata.
Dolby Digital and Dolby Digital Plus are single-pass encode/decode processes and are meant to be decoded once, in the home environment. If Dolby Digital is repeatedly decoded and re-encoded, audible artifacts will build up over the passes.
Dolby Digital and Dolby Digital Plus are not designed to be used upstream of the final emission point and will cause quality issues and other challenges when using embedded audio in digital video frame synchronizers and video conversion processing. Dolby Digital is not locked to video when encoded.
If there is a frame synchronizer in the path, Dolby Digital must be passed through the frame synchronizer as ancillary data, and may cause disturbances downstream when the frame synchronizer drops or repeats a frame.
If the video is being converted from one format to another, the Dolby Digital signal must be de-embedded, decoded, synchronized, re-encoded and re-embedded because of the change in ancillary data spaces and the change in clock domains.
Otherwise, if the Dolby Digital is simply de-embedded and re-embedded, it will “drift” relative to video over time. This additional processing adds complexity and cost. It is recommended that Dolby Digital encoding be performed at the end of the signal chain.
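One way to see why a Dolby Digital stream does not sit neatly alongside video frames is to compare frame durations. The Python sketch below assumes 48 kHz audio and the 1536-sample AC-3 frame; the function name is hypothetical, and this illustrates only the frame-boundary mismatch, not the clock-domain drift described above.

```python
from fractions import Fraction

AC3_SAMPLES_PER_FRAME = 1536  # six blocks of 256 samples
SAMPLE_RATE_HZ = 48_000       # assumed broadcast sample rate

def ac3_frames_per_video_frame(video_fps) -> Fraction:
    """Return how many AC-3 frames span one video frame, exactly."""
    ac3_frame_seconds = Fraction(AC3_SAMPLES_PER_FRAME, SAMPLE_RATE_HZ)
    video_frame_seconds = Fraction(1) / Fraction(video_fps)
    return video_frame_seconds / ac3_frame_seconds

# Neither common frame rate yields a whole number of audio frames.
for fps in (Fraction(30000, 1001), Fraction(25)):
    ratio = ac3_frames_per_video_frame(fps)
    print(fps, ratio, ratio.denominator == 1)
```

Because the ratio is not an integer, dropping or repeating one video frame cannot remove or repeat a whole number of compressed audio frames, which is consistent with the downstream disturbances noted above.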
More and More Audio Channels
Users will encounter analog, digital and embedded audio interfaces in today’s audio systems. Analog audio is typically used on legacy equipment, and analog-to-digital conversion will be required to interface it with digital systems.
Modern equipment uses digital audio, which can be carried over an AES/EBU interface either unbalanced over coaxial cable or balanced over twisted pair cable. Digital audio can also be carried embedded in the SDI (Serial Digital Interface) signal along with the video, data and metadata.
Each AES signal carries two audio channels, so 2.0 requires two analog channels or one AES signal, and 5.1 requires six analog channels or three AES signals. Within the HD-SDI interface, up to 16 channels (eight AES signals) can be carried.
Even with a single language, the audio carried throughout the workflow could have as many as nine channels: a 2.0 mix, a 5.1 mix and possibly a video description audio channel (for the vision-impaired).
It is possible in today’s systems to not have to use bit rate reduction until emission at the end of the audio signal chain, as most modern equipment will handle 16 channels of audio.
However, when there is more than one language, bit rate reduction techniques are required in the production and contribution workflow.
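The channel arithmetic above can be checked with a short Python sketch. It assumes that every language carries its own 2.0 mix, 5.1 mix and video description channel, which is an assumption for illustration; the names are hypothetical.

```python
import math

EMBEDDED_CHANNELS = 16  # capacity of one HD-SDI link, as stated above

def channels_per_language(stereo=True, surround=True, description=True) -> int:
    """Count discrete channels for one language's set of mixes."""
    return 2 * stereo + 6 * surround + 1 * description

def channel_budget(languages: int) -> dict:
    """Total channels, AES signals needed, and whether they fit embedded."""
    needed = languages * channels_per_language()
    return {
        "channels": needed,
        "aes_signals": math.ceil(needed / 2),
        "fits_embedded": needed <= EMBEDDED_CHANNELS,
    }

for languages in (1, 2):
    print(languages, channel_budget(languages))
```

Under these assumptions one language fits uncompressed within 16 embedded channels, while two languages already exceed that capacity, which is where bit rate reduction becomes necessary.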
There are areas for live production, post production and a typical broadcast facility in the workflow diagram below.
The workflow can be updated for surround sound by adding the appropriate use of surround sound monitoring, audio metadata, up and downmixing and bit rate reduction, as well as processing tasks such as de-embedding, embedding and loudness control.
Moving to a surround sound workflow presents challenges, but solutions from Harris, Dolby and DTS exist today for an easy transition.
DTS is dedicated to making audio sound better. By addressing the most common problems broadcasters encounter with loudness and surround sound workflows, DTS technologies make the creation of engaging audio effortless.
DTS Neural Loudness Control manages audio loudness so that it conforms in real time to many loudness standards worldwide, including ITU-R BS.1770.
Neural Loudness Control has been optimized to correct content to a desired loudness level with minimal signal modification. This approach balances adhering to loudness regulations, reducing viewer complaints and preserving artistic intent.
DTS Neural MultiMerge is a process that always outputs both a 5.1 and a stereo signal, whether the input audio is stereo or 5.1 surround sound. This tool simplifies broadcast audio workflows by ensuring that stereo audio is upmixed, 5.1 audio is downmixed and all outputs carry consistent, good-sounding audio.
Dolby offers technologies for all areas of the broadcast ecosystem. Broadcasters worldwide use Dolby solutions to create, distribute and transmit high-quality audio to consumers.
The Dolby E encoding system enables the production and broadcast of live events. It facilitates multichannel audio contribution from an outside broadcast location to the broadcast center.
Dolby E enables distribution of up to eight channels of audio, plus metadata and timecode, via the existing stereo channels (AES/EBU) available on conventional digital videotapes, servers, communication links, switchers and routers.
Dolby Digital, also known as AC-3, is an audio encoding/decoding technology that delivers up to 5.1 discrete channels of surround sound.
- Provides compatibility with existing playback units that use Dolby technology
- Delivers mono to 5.1 audio
- Employs Dolby metadata that can be transcoded into Dolby Digital Plus
Applied to the broadcast transmission signal just prior to multiplexing with video, Dolby Digital is used in digital satellite (DBS and DVB), cable, and terrestrial high-definition TV (ATSC).
Dolby Digital Plus is the ideal choice for TV and HDTV services offered by cable, satellite, broadcast television, and IPTV.
The Dolby Digital Plus codec improves encoding efficiency to permit future channel expansion beyond 5.1 channels to 7.1-channel surround sound or more. Its advanced compression capabilities enable secondary audio mixing, including advanced services for the visually impaired.