Replay Gain - A Proposed Standard

Peak Amplitude Data Format

Why store this?

Scanning a file for its peak amplitude can be time-consuming, so it is helpful to store this single value in the file header. A player can then use it to check whether applying the required replay gain adjustment will cause the file to clip.
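The check itself is simple arithmetic. A gain of G dB scales every sample by 10^(G/20), so the file clips if peak × 10^(G/20) > 1. A minimal Python sketch follows; the function names are hypothetical and not part of this proposal.

```python
import math


def gain_to_linear(gain_db: float) -> float:
    """Convert a replay gain adjustment in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def will_clip(peak: float, gain_db: float) -> bool:
    """True if scaling a file whose stored peak is `peak` (1.0 = digital
    full scale) by `gain_db` pushes any sample beyond full scale."""
    return peak * gain_to_linear(gain_db) > 1.0


def max_safe_gain_db(peak: float) -> float:
    """Largest gain in dB that keeps the scaled peak at or below full scale."""
    if peak <= 0.0:
        return math.inf  # silent file: no gain can make it clip
    return -20.0 * math.log10(peak)
```

For example, a file peaking at 0.9 will clip with +3 dB of gain, while a decoded file peaking at 1.2 will not clip once -6 dB is applied. A player could use `max_safe_gain_db` to limit the adjustment instead of simply detecting the problem.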

Data Format

The maximum peak amplitude (a single value) should be stored as a 32-bit floating-point number, where 1 represents digital full scale.
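As an illustration, the field can be written as a 4-byte IEEE 754 single-precision value. The proposal does not yet fix the byte order, so the little-endian choice below is an assumption made only for this sketch, and the function names are hypothetical.

```python
import struct

# Assumption: little-endian IEEE 754 single precision ("<f"). The byte order
# is not yet specified by the proposal; this choice is illustrative only.
PEAK_FORMAT = "<f"


def pack_peak(peak: float) -> bytes:
    """Encode the peak amplitude (1.0 = digital full scale) as 4 bytes."""
    return struct.pack(PEAK_FORMAT, peak)


def unpack_peak(data: bytes) -> float:
    """Decode a 4-byte peak field back into a Python float."""
    (peak,) = struct.unpack(PEAK_FORMAT, data)
    return peak
```

With this layout, a peak of exactly full scale (1.0) is stored as the bytes `00 00 80 3F`. Values that are not exactly representable in single precision are rounded when packed.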

Uncompressed Files

Simply store the maximum absolute sample value held in the file, across all channels. This single sample value should be converted to a 32-bit float such that digital full scale is equivalent to a value of 1.
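For integer PCM, the scan and conversion might look like the sketch below. The proposal does not say which integer value counts as full scale; this sketch assumes 2^(bits-1) (32768 for 16-bit audio), so the most negative sample maps to exactly 1.0. `pcm_peak` is a hypothetical name.

```python
from typing import Iterable, Sequence


def pcm_peak(channels: Sequence[Iterable[int]], bits_per_sample: int) -> float:
    """Return the maximum absolute sample value across all channels,
    scaled so that digital full scale is 1.0.

    Assumption: full scale is 2**(bits_per_sample - 1), so -32768 in 16-bit
    audio maps to 1.0 and +32767 maps to just below 1.0.
    """
    full_scale = float(1 << (bits_per_sample - 1))
    peak = 0
    for channel in channels:
        for sample in channel:
            peak = max(peak, abs(sample))
    return peak / full_scale
```

For 16- and 24-bit PCM, the resulting ratio is exactly representable as a 32-bit float, so storing it loses no precision.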

Compressed Files

Compressed audio does not exist as a waveform until it is decoded. Unfortunately, psychoacoustic coding of a heavily limited file can produce sample values larger than digital full scale upon decoding. Such values are likely to be brought back within range once the replay gain adjustment is applied, because heavily limited (loud) files typically receive a negative gain. Even so, the peak value of a compressed file must be stored as a 32-bit floating-point number, where +/-1 represents digital full scale and values outside this range would usually clip.

Implementation

For uncompressed files, the maximum absolute sample value must be found and stored. For compressed files, the file must be decoded using a fully compliant decoder that allows peak overflows (i.e. one that has headroom rather than clipping its output), and the maximum absolute decoded value stored.
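For compressed files, the peak can be accumulated block by block as the decoder produces output. The sketch below assumes a hypothetical decoder that delivers floating-point samples (1.0 = digital full scale, interleaved or per channel) and passes overshoots through unclipped. `PeakTracker` is an illustrative name, not part of any real decoder API.

```python
from typing import Iterable


class PeakTracker:
    """Accumulates the maximum absolute sample value over decoded blocks.

    Assumption: the (hypothetical) decoder has headroom, i.e. it emits
    floating-point samples and values beyond +/-1.0 are not clipped.
    """

    def __init__(self) -> None:
        self.peak = 0.0

    def update(self, block: Iterable[float]) -> None:
        # Deliberately no clamping: an overshoot such as 1.08 must be kept
        # as 1.08, otherwise the stored peak understates the clipping risk.
        for sample in block:
            magnitude = abs(sample)
            if magnitude > self.peak:
                self.peak = magnitude
```

A caller would create one tracker per file, call `update` on every decoded block, and store `tracker.peak` once decoding finishes. A decoder that clipped its output to +/-1.0 would make every overshooting file report a peak of exactly 1.0.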

Suggestions and further work

Some clarification of byte order and of the conversion to floating point may be necessary. As always, suggestions are welcome!