AiL: Audio in Layers Proposal

This is something that’s been bothering me off and on for a while. In the interest of being able to sleep at night without my mind circling back to it, I’ve decided to write it up as a format guide. Maybe someone else will be interested in partnering up to make it a reality.

Existing audio formats, like MP3 and AAC, store flat, already-mixed audio, which makes it difficult to make downstream modifications to the track. Once you have authored your track, that’s it. It’s written in the stone of audio history unless you decide to go back, tweak it and re-release it.

The same goes for multi-channel output. You need to specifically create 8.1, 6.1, 4.1 and 2.1 versions of your track, which can end up being excessively time-consuming and restrictive.

Audio in Layers is a proposal for giving audio another dimension: the ability to be dynamic and to adapt to whatever environment it is placed into. AiL takes advantage of the fact that most musical audio is now recorded using multitrack recording and down-mixed later.

AiL provides an alternative to down-mixing by examining each track and looking for repetition. Many tracks will have long periods of silence, as well as passages that are completely or nearly identical to an earlier point in the same track. AiL will remove the periods of silence and store only one copy of each repeated segment, applying lossless or near-lossless filters in cases where the repetitions differ slightly.
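To make the idea concrete, here is a rough Python sketch of silence removal and repetition detection on a single track. Everything in it is an assumption for illustration: the names `encode_track` and `decode_track`, the fixed block size, and the simple per-sample tolerance test. A real encoder would need to find repeats at arbitrary offsets and lengths, and would store the small differences rather than discard them as this sketch does.

```python
def encode_track(samples, block_size=4, silence_threshold=0.01, match_tolerance=0.0):
    """Split a track into fixed-size blocks, drop silence, keep each distinct block once.

    Returns (segments, timeline). `timeline` holds one entry per block: an
    index into `segments`, or None where the block was silent.
    """
    segments = []
    timeline = []
    for start in range(0, len(samples), block_size):
        block = tuple(samples[start:start + block_size])
        # A block whose peak is at or below the threshold is treated as silence.
        if max(abs(s) for s in block) <= silence_threshold:
            timeline.append(None)
            continue
        # Reuse an earlier block if every sample is within the tolerance.
        for index, known in enumerate(segments):
            if len(known) == len(block) and all(
                abs(a - b) <= match_tolerance for a, b in zip(known, block)
            ):
                timeline.append(index)
                break
        else:
            segments.append(block)
            timeline.append(len(segments) - 1)
    return segments, timeline


def decode_track(segments, timeline, block_size, length):
    """Rebuild the track; silent blocks come back as exact zeros."""
    out = []
    for ref in timeline:
        out.extend([0.0] * block_size if ref is None else segments[ref])
    return out[:length]
```

With `match_tolerance=0.0` only exact repeats are merged. Raising it merges near-identical blocks as well, at the cost of replaying the first copy in place of the later ones, so the result is no longer lossless.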

AiL can also selectively down-mix these tracks, so if an event in Track A only happens in conjunction with an event in Track B, both of those events are down-mixed into a single audio segment.
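One way to picture selective down-mixing is sketched below in Python. It assumes each track has already been reduced to a timeline of segment ids per time block, with `None` marking silence; that representation, and the helper names `find_mergeable_pairs` and `mix`, are hypothetical and not part of any existing tool.

```python
def find_mergeable_pairs(timeline_a, timeline_b):
    """Find segments of track A that only ever play alongside one segment of track B.

    Each timeline is a list with one entry per time block: a segment id, or
    None for silence. Returns a sorted list of (a_id, b_id) pairs.
    """
    partners = {}
    for a_ref, b_ref in zip(timeline_a, timeline_b):
        if a_ref is not None:
            partners.setdefault(a_ref, set()).add(b_ref)
    pairs = []
    for a_ref, seen in partners.items():
        # Mergeable only if A's segment always met the same, non-silent B segment.
        if len(seen) == 1 and None not in seen:
            pairs.append((a_ref, next(iter(seen))))
    return sorted(pairs)


def mix(segment_a, segment_b):
    """Down-mix two equal-length segments by summing them sample by sample."""
    return [a + b for a, b in zip(segment_a, segment_b)]
```

Note that the test is one-directional, as in the description above: if the Track B segment also plays on its own elsewhere, it would still need to be stored separately for those occurrences.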

All of the processed audio segments can then be saved in nearly any existing audio codec. Ogg Vorbis, FLAC, AAC+ and MP3 would all be worthy contenders for the underlying compression of these audio segments. Along with the segments, a definition (ideally XML) would be included to tell the decoder how to reassemble the pieces for a specific output.
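As an illustration of what such a definition might look like, here is a small Python sketch that reads one with the standard library's `xml.etree.ElementTree`. The schema is entirely made up for this example: the element names `ail`, `segment`, `output` and `place`, their attributes, and the file names are assumptions, not a proposed standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition: two stored segments and two output layouts.
EXAMPLE_DEFINITION = """
<ail>
  <segment id="kick" file="kick.flac"/>
  <segment id="pad" file="pad.ogg"/>
  <output layout="stereo">
    <place segment="pad" at="2.0" channel="left"/>
    <place segment="kick" at="0.0" channel="left"/>
    <place segment="kick" at="0.0" channel="right"/>
  </output>
  <output layout="mono">
    <place segment="kick" at="0.0" channel="center"/>
  </output>
</ail>
"""


def load_plan(definition_xml, layout):
    """Return the reassembly plan for one layout as sorted (seconds, channel, file) tuples."""
    root = ET.fromstring(definition_xml.strip())
    # Map each segment id to the file that holds its compressed audio.
    files = {seg.get("id"): seg.get("file") for seg in root.findall("segment")}
    for output in root.findall("output"):
        if output.get("layout") == layout:
            plan = [
                (float(p.get("at")), p.get("channel"), files[p.get("segment")])
                for p in output.findall("place")
            ]
            return sorted(plan)
    raise ValueError(f"no output definition for layout {layout!r}")
```

A decoder would then walk the plan in time order, decode each referenced file with whatever codec it uses, and place the audio on the named channel. Because each segment is stored once and referenced by id, the same `kick.flac` serves every layout.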

Effects could also be applied to these segments, and those effects can be defined directly in the decoder. Definitions could be included for a number of output media, from stereo to any format of surround sound imaginable. If down-mixing were not used, it would also become possible for the end-consumer to rearrange the composition of the track to their own requirements. Because of the layered composition, downstream EQ adjustments and volume changes to specific tracks would be possible without damaging the quality of the track.
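A minimal Python sketch of this decode-time flexibility follows. It assumes the layers are already decoded to lists of samples, and that each output format is described by a routing table of per-channel gains; the function name `render`, the table shape, and the channel names are all illustrative assumptions.

```python
def render(layers, routing, layer_gain=None):
    """Mix decoded layers into output channels.

    layers:     {layer name: list of samples}
    routing:    {layer name: {channel name: gain}}, one table per output format
    layer_gain: optional {layer name: gain} chosen by the listener
    """
    layer_gain = layer_gain or {}
    length = max(len(samples) for samples in layers.values())
    channels = sorted({ch for table in routing.values() for ch in table})
    out = {ch: [0.0] * length for ch in channels}
    for name, samples in layers.items():
        user = layer_gain.get(name, 1.0)  # unlisted layers play at unity gain
        for ch, gain in routing.get(name, {}).items():
            for i, sample in enumerate(samples):
                out[ch][i] += sample * gain * user
    return out


# Hypothetical stereo routing: vocals centred, drums on the left only.
STEREO = {
    "vocals": {"L": 1.0, "R": 1.0},
    "drums": {"L": 1.0, "R": 0.0},
}
```

Swapping in a different routing table retargets the same layers to another speaker layout, and passing something like `layer_gain={"vocals": 0.5}` turns one layer down without touching the stored audio.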

It would also be wise for encoders (and for any modification tools) to include a small clip of the track in a neutral format that a player could begin to play back instantly while waiting for the AiL decoder to reassemble the component tracks, allowing the reassembly to become transparent to the end-user.

(c) 2/14/2011 Robin Monks.  All Rights Reserved.
