Why Does Bluetooth Audio Change Pitch Sometimes?

I have written this article to clear up some of the misconceptions that I have read in various blog and forum posts online.

First, a basic summary: when listening to music on BT headphones or a BT speaker, the audio data is most often transcoded from whatever format (CODEC) it is stored in on the media player (usually MP3 or AAC) to SBC (Sub-Band CODEC), which is defined as the guaranteed, lowest-common-denominator CODEC for A2DP audio. Some BT equipment (such as an iPhone paired with Apple AirPods) is able to agree on a CODEC that is better than SBC and stream the audio in that format instead. While this may bring an improvement in audio quality, it has nothing to do with the pitch-bending phenomenon that I am about to talk about.
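
As a concrete (if simplified) illustration, here is a little Python sketch of that negotiation. The codec names and preference order are my own assumptions, and the real exchange happens through A2DP capability messages rather than anything this tidy; SBC being mandatory is the one thing you can count on.

    # Hypothetical sketch of A2DP codec selection: both sides advertise what
    # they support and the best mutually supported codec wins. SBC is
    # mandatory, so the intersection is never empty.
    CODEC_PREFERENCE = ["LDAC", "aptX", "AAC", "SBC"]   # assumed ordering, best first

    def choose_codec(sender_codecs, receiver_codecs):
        common = set(sender_codecs) & set(receiver_codecs)
        for codec in CODEC_PREFERENCE:
            if codec in common:
                return codec
        return "SBC"  # guaranteed lowest common denominator

    print(choose_codec({"AAC", "SBC"}, {"AAC", "SBC"}))    # -> AAC (e.g. iPhone + AirPods)
    print(choose_codec({"LDAC", "SBC"}, {"aptX", "SBC"}))  # -> SBC (no better match)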

As a signal-processing engineer, my suspicion was that the pitch bending in the BT headphones was caused by a rate-adaptation mechanism in the player, needed to regulate the level of data in the receiver's buffer. I took a look at the Bluetooth A2DP (Advanced Audio Distribution Profile) specification to see if the timing mechanisms were defined, but it really didn't say much about timing at all. It did, however, make clear that audio data is pushed to the receiving device at a rate determined by the sending device. Given that BT is a two-way radio communication system, I would have hoped that the receiver could pull data from the sender at a rate that suits the receiver (like playing a sound file on a PC), or at least use some form of rate-control signalling back to the sender. Alas, this is not the case. So, given that the data is pushed, there is nothing the receiving device can do except buffer the data and play it back at a rate that it must determine for itself. (Note that this is the same for all digital broadcast systems, such as DVB-T, DVB-S, DVB-C, DAB etc.)
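
Conceptually, the receive side is then just a FIFO that the sender fills at a pace set by its own clock, while the player drains it at a rate of its own choosing. A rough Python sketch of what I imagine the receiver keeps (capacity and frame handling invented for illustration):

    from collections import deque

    class JitterBuffer:
        """Receive-side buffer: the sender pushes, the player pulls."""
        def __init__(self, capacity_frames):
            self.frames = deque()
            self.capacity = capacity_frames

        def push(self, frame):               # called when a BT packet arrives
            if len(self.frames) < self.capacity:
                self.frames.append(frame)    # else: overflow, frame dropped

        def pull(self):                      # called by the playback clock
            return self.frames.popleft() if self.frames else None  # None = underrun

        def fill_ratio(self):
            return len(self.frames) / self.capacity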

In order to play back the audio at the same rate that it is being supplied by the sender, the receiving device will need to adaptively learn the playback rate. This can be achieved in many ways but will essentially be a Frequency Locked Loop. The goal of this tracking-loop will be to keep the level of data in the receiver's buffer nominally constant.
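
As a sketch of what such a loop might look like (a first-order loop, with gain and target numbers picked purely for illustration), the playback sample rate can be nudged in proportion to the buffer-fill error:

    NOMINAL_RATE = 44100.0      # Hz, assumed source sample rate
    TARGET_FILL  = 0.8          # desired buffer fill ratio (assumed)
    LOOP_GAIN    = 0.001        # small gain -> slow, gentle corrections

    def update_playback_rate(buffer_fill_ratio):
        """One iteration of a first-order frequency-locked loop.

        If the buffer is fuller than the target, play slightly fast to drain it;
        if it is emptier, play slightly slow. The gain sets how audible this is.
        """
        error = buffer_fill_ratio - TARGET_FILL
        return NOMINAL_RATE * (1.0 + LOOP_GAIN * error)

    # e.g. a buffer at 70% fill with these numbers gives 44100 * (1 - 0.0001)
    print(update_playback_rate(0.7))   # ~44095.6 Hz, a 0.01% slowdown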

The difference in reference frequency between the BT sender and receiver is likely to be quite small. Even with cheap XTALs, this could be of the order of 100 ppm (parts per million), so the amount of rate adaptation needed to accommodate this level of clock difference should be inaudible. However, this is not the only source of timing error between the sender and receiver; lost BT data packets can have a much more drastic effect.
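
To put 100 ppm in perspective, here is the back-of-the-envelope arithmetic, assuming a 44.1 kHz stream and a 100 ms receive buffer (typical but invented numbers):

    SAMPLE_RATE = 44100        # Hz (assumed)
    CLOCK_ERROR = 100e-6       # 100 ppm mismatch between sender and receiver
    BUFFER_MS   = 100          # assumed receive buffer depth

    drift_per_second = SAMPLE_RATE * CLOCK_ERROR          # ~4.4 samples per second
    buffer_samples   = SAMPLE_RATE * BUFFER_MS / 1000     # 4410 samples
    seconds_to_drift_through_buffer = buffer_samples / drift_per_second

    print(drift_per_second)                    # 4.41 samples/s
    print(seconds_to_drift_through_buffer)     # 1000 s, i.e. roughly 17 minutes

In other words, clock mismatch alone would take many minutes to move the buffer appreciably, and the 0.01% rate tweak needed to track it amounts to a small fraction of a cent of pitch.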

I have assumed that when playback is started, the receiver will allow its data buffer to fill to some desired level; this could be 50% full or, given that data is more likely to be lost than to arrive too fast, more likely 80% or higher. Once this data is buffered, the playback device will begin playing at a nominal sample rate. Unless data is lost or the two devices' clocks are wildly different, this situation should be good and require only the finest of rate adjustments to keep the buffer happy indefinitely.
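
A sketch of that start-up behaviour (the 80% threshold is my assumption, not something the A2DP spec mandates):

    START_THRESHOLD = 0.8   # assumed pre-buffer target; biased high because data
                            # is more likely to go missing than to arrive too fast

    def ready_to_play(fill_ratio):
        """Hold playback off until the buffer has reached the start threshold."""
        return fill_ratio >= START_THRESHOLD

    # Receiver start-up, roughly: keep buffering incoming frames until
    # ready_to_play() returns True, then start the DAC at the nominal sample
    # rate and hand control to the rate-tracking loop sketched earlier.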

Now consider that one or more frames of BT data go missing due to a poor BT signal, interference from WiFi, etc. The receiver's data buffer will no longer be at its 'ideal' fill level. The player can do one of two things at this point: 1) keep going until the data buffer eventually hits empty, at which point the player will have to allow the buffer to re-fill and subject the listener to the inevitable time discontinuity, or 2) start playing catch-up (or 'catch-down'?) by adjusting the effective playback sample rate. The latter is what results in the change of pitch, which sounds awful when listening to music (though it is almost imperceptible when listening to speech).
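
To get a feel for how audible the catch-up can be, suppose a dropout costs the buffer 30 ms of audio and the receiver tries to claw it back over two seconds by playing slightly slower (all numbers invented for illustration):

    import math

    SAMPLE_RATE  = 44100             # Hz (assumed)
    LOST_MS      = 30                # audio lost to dropped packets (assumed)
    RECOVER_SECS = 2.0               # time allowed to rebuild the deficit (assumed)

    deficit_samples  = SAMPLE_RATE * LOST_MS / 1000
    rate_change      = deficit_samples / (SAMPLE_RATE * RECOVER_SECS)  # fractional slowdown
    pitch_shift_cent = 1200 * math.log2(1.0 - rate_change)

    print(f"rate change: {rate_change:.2%}")              # -> 1.50% slower playback
    print(f"pitch shift: {pitch_shift_cent:.1f} cents")   # -> about -26 cents

A quarter of a semitone of flatness for a couple of seconds is exactly the kind of pitch bend that is glaring on music but easy to miss on speech.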

So, we can see why the pitch shifts occur, but I am still of the opinion that this is a poorly engineered solution to the problem. Personally, I suspect I would be happy with a plain and simple time discontinuity when the receiver decides it needs to realign its buffer. If any form of adaptive sample-rate control is to be used, then some reasonable limit on the maximum rate change should be set (e.g. 0.1%) so that it is not noticeable. (For reference, 1 semitone is a rate change of about 5.95%.)
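
For comparison, a 0.1% cap keeps the pitch error under two cents, whereas a semitone is a hundred cents. A quick check, plus the sort of limiter I have in mind (the cap value is just my suggestion from above):

    import math

    def rate_to_cents(rate_ratio):
        """Pitch shift in cents for a given playback-rate ratio."""
        return 1200 * math.log2(rate_ratio)

    print(rate_to_cents(1.001))     # ~ +1.7 cents: a 0.1% speed-up, inaudible
    print(rate_to_cents(1.0595))    # ~ +100 cents: one semitone (a 5.95% change)

    # A simple limiter for the rate-tracking loop:
    MAX_RATE_ERROR = 0.001          # proposed 0.1% cap

    def clamp_rate(requested_ratio):
        return min(max(requested_ratio, 1.0 - MAX_RATE_ERROR), 1.0 + MAX_RATE_ERROR)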


If you think any part of this article is wrong, please email me!


Bluetooth A2DP Specification v1.2


Last updated: 2.3.2021