Implementation of the mixing process
Tempo Extraction
The tempo of each song is found by applying three algorithms in sequence.
The 'initial guess' method quickly finds a rough estimate of the tempo by locating the beats in the song and using the most common time difference between consecutive beats to generate a tempo guess.
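The idea of taking the most common gap between beats can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `initial_tempo_guess`, the `resolution` parameter, and the assumption that beat times are already available as a sorted list of seconds are all hypothetical.

```python
from collections import Counter


def initial_tempo_guess(beat_times, resolution=0.01):
    """Rough tempo (BPM) from the most common gap between consecutive beats.

    beat_times: sorted beat positions in seconds (assumed to be detected already).
    resolution: bin width in seconds used to group near-identical gaps (assumed value).
    """
    # Quantise each inter-beat gap to a whole number of `resolution` steps
    gaps = [round((b - a) / resolution) for a, b in zip(beat_times, beat_times[1:])]
    gaps = [g for g in gaps if g > 0]
    if not gaps:
        raise ValueError("need at least two distinct beat times")
    # The mode of the gaps is taken as the beat period
    most_common_gap, _ = Counter(gaps).most_common(1)[0]
    return 60.0 / (most_common_gap * resolution)


# One beat arrives late (2.1 s), but the most common gap is still 0.5 s
bpm = initial_tempo_guess([0.0, 0.5, 1.0, 1.5, 2.1, 2.6, 3.1])
```

Using the most common gap rather than the mean keeps a few missed or late beats from skewing the estimate, which may be why a rough guess of this kind is usually adequate as a starting point.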
The 'convolution' method then refines this guess. Convolution is a signal processing technique which, as used here, emphasises peaks in amplitude (beats). This effect is shown below: the original song time series (left) is convolved to give a new time series (right).
![]()
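As a rough illustration of how convolution can make beats stand out, the sketch below rectifies the signal and convolves it with a short decaying window, so that each burst of amplitude becomes a single smooth bump. The kernel shape (half-Hann), its length, and the name `emphasise_beats` are assumptions made for this example; the kernel actually used in the project is not specified here.

```python
import math


def emphasise_beats(samples, kernel_len=8):
    """Rectify the signal, then convolve it with a decaying half-Hann window."""
    # Assumed kernel: starts at 1.0 and decays smoothly towards 0
    kernel = [0.5 * (1.0 + math.cos(math.pi * k / kernel_len)) for k in range(kernel_len)]
    rectified = [abs(s) for s in samples]
    out = [0.0] * len(rectified)
    for n in range(len(rectified)):
        acc = 0.0
        # Causal convolution: only current and past samples contribute
        for k in range(min(kernel_len, n + 1)):
            acc += kernel[k] * rectified[n - k]
        out[n] = acc
    return out


# A short oscillating burst surrounded by silence
signal = [0.0] * 10 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 10
envelope = emphasise_beats(signal)
```

In the output, the silent lead-in stays at zero while the oscillating burst is turned into one positive peak, which is easier for a later tempo-matching stage to work with.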
Finally, the tempo value is refined further using a 'comb-filtering' method. This applies a set of filters to samples of the song, each filter associated with one tempo from a small range of candidate tempos. The method chooses the filter whose tempo is closest to that of the song, giving a precise tempo estimate.
The output tempo of the comb-filtering method is therefore used to mix the songs together.
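A minimal sketch of comb-filter tempo selection is given below. It assumes the input is a beat-emphasised envelope, that each filter is a simple feed-forward comb with a few equally spaced taps, and that the best filter is the one with the highest output energy (a common choice, though not confirmed as the one used in this project). The names `comb_energy` and `refine_tempo` and the `span`, `step`, and `teeth` parameters are hypothetical.

```python
def comb_energy(envelope, lag, teeth=3):
    """Energy of the envelope after a feed-forward comb filter with spacing `lag`."""
    energy = 0.0
    for n in range(len(envelope)):
        # Sum the current sample with earlier samples spaced `lag` apart
        y = sum(envelope[n - k * lag] for k in range(teeth) if n - k * lag >= 0)
        energy += y * y
    return energy


def refine_tempo(envelope, rate, guess_bpm, span=5.0, step=0.5):
    """Try candidate tempos around `guess_bpm`; return the one with most comb energy."""
    best_bpm, best_energy = guess_bpm, -1.0
    steps = int(round(2 * span / step))
    for i in range(steps + 1):
        bpm = guess_bpm - span + i * step
        lag = int(round(60.0 * rate / bpm))  # beat period in samples
        energy = comb_energy(envelope, lag)
        if energy > best_energy:
            best_bpm, best_energy = bpm, energy
    return best_bpm


# Synthetic envelope: one impulse every 500 samples at 1000 Hz, i.e. 120 BPM
rate = 1000
envelope = [1.0 if n % 500 == 0 else 0.0 for n in range(4000)]
refined = refine_tempo(envelope, rate, guess_bpm=118.0)
```

When the comb spacing matches the beat period, successive beats add together and the output energy rises sharply; for mismatched spacings they do not line up. The achievable precision is limited by the sample rate, since neighbouring candidate tempos can round to the same lag.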
Timescaling
Once the tempo of both songs has been determined, the songs are scaled in time by a simple resampling method so that both play at the same tempo.
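One simple way to resample is linear interpolation, sketched below. The function name `resample_to_tempo` and the choice of linear interpolation are assumptions for illustration; the project may use a different resampling scheme.

```python
def resample_to_tempo(samples, source_bpm, target_bpm):
    """Time-scale `samples` by resampling so that source_bpm plays back at target_bpm."""
    if not samples:
        return []
    ratio = target_bpm / source_bpm  # > 1 speeds the song up (fewer output samples)
    out_len = max(1, int(round(len(samples) / ratio)))
    out = []
    for i in range(out_len):
        pos = i * ratio  # fractional read position in the input
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interpolable point: hold the final sample
            out.append(samples[-1])
            continue
        frac = pos - lo
        # Linear interpolation between the two neighbouring input samples
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


# Doubling the tempo halves the number of samples
ramp = [float(n) for n in range(100)]
faster = resample_to_tempo(ramp, source_bpm=120.0, target_bpm=240.0)
```

Note that plain resampling changes pitch along with speed, so a song sped up this way also sounds higher. For the small tempo differences typical between two dance tracks this shift is slight.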
Alignment
The following diagrams show the structure of a typical dance song:
Most dance music is made up of phrases of 8 bars, where the music continually leads towards an emphasised downbeat at the beginning of each phrase. Songs must be aligned such that phrases start together.
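Given a tempo, the phrase boundaries follow from simple arithmetic. The sketch below assumes 4 beats per bar (the text only states 8 bars per phrase) and that the time of the first downbeat is known; the names `phrase_length_seconds` and `phrase_starts` are hypothetical.

```python
BARS_PER_PHRASE = 8  # from the text
BEATS_PER_BAR = 4    # assumption: 4/4 time


def phrase_length_seconds(bpm):
    """Duration of one 8-bar phrase at the given tempo."""
    return BARS_PER_PHRASE * BEATS_PER_BAR * 60.0 / bpm


def phrase_starts(first_downbeat, duration, bpm):
    """Start times (seconds) of every phrase that begins before `duration`."""
    length = phrase_length_seconds(bpm)
    starts = []
    k = 0
    while first_downbeat + k * length < duration:
        starts.append(first_downbeat + k * length)
        k += 1
    return starts


# At 128 BPM a phrase of 32 beats lasts 15 seconds
starts = phrase_starts(first_downbeat=1.0, duration=50.0, bpm=128.0)
```

Aligning two songs then amounts to choosing a start offset for the second song so that one of its phrase starts coincides with a phrase start of the first.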
A 'breakdown' is a passage in a song where the beat drops out for a period before being reintroduced. The program assumes that each song contains a breakdown, and tries to align the two songs such that the mixing passage occurs after the final breakdown of the first song. This is shown below:
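Choosing where the mixing passage begins could look like the sketch below. It assumes that breakdowns have already been detected as (start, end) pairs in seconds and that phrase start times are known; the function name `mix_start_time` and both input formats are hypothetical.

```python
def mix_start_time(phrase_start_times, breakdowns):
    """First phrase start at or after the end of the final breakdown.

    phrase_start_times: phrase boundaries of the first song, in seconds.
    breakdowns: list of (start, end) pairs in seconds (assumed to be detected already).
    Returns None if no phrase starts after the final breakdown.
    """
    if not breakdowns:
        raise ValueError("the method assumes at least one breakdown")
    final_end = max(end for _, end in breakdowns)
    for t in sorted(phrase_start_times):
        if t >= final_end:
            return t
    return None


# Two breakdowns; the final one ends at 44 s, so mixing starts at the 45 s phrase
start = mix_start_time([0.0, 15.0, 30.0, 45.0, 60.0], [(10.0, 20.0), (32.0, 44.0)])
```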
Fading
Now that the songs are at the same speed and aligned appropriately, all that remains is to fade from one to the other. In this project the fade takes place over one phrase, or 8 bars. The first song is simply turned down as the second is turned up. This is shown in the diagram above by the purple line, which represents the volume of the song.
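The fade itself can be sketched as a crossfade in which one gain ramps down while the other ramps up. The linear ramp, the name `crossfade`, and the assumption that the first song has been trimmed to end where the mixing passage ends are all choices made for this example.

```python
def crossfade(outgoing, incoming, fade_len):
    """Fade from `outgoing` to `incoming` over `fade_len` samples.

    Assumes `outgoing` ends where the mixing passage ends and `incoming`
    starts where it begins, so the last `fade_len` samples of one overlap
    the first `fade_len` samples of the other.
    """
    if fade_len <= 0 or fade_len > min(len(outgoing), len(incoming)):
        raise ValueError("fade_len must fit inside both songs")
    head = outgoing[:-fade_len]
    tail = outgoing[-fade_len:]
    mixed = []
    for i in range(fade_len):
        # Gain of the incoming song rises linearly from 0 to 1 (assumed ramp shape)
        gain_in = i / (fade_len - 1) if fade_len > 1 else 1.0
        mixed.append(tail[i] * (1.0 - gain_in) + incoming[i] * gain_in)
    return head + mixed + incoming[fade_len:]


# Constant signals make the gain ramp directly visible in the output
result = crossfade([1.0] * 10, [0.0] * 10, fade_len=5)
```

For a fade of one phrase, `fade_len` would be the phrase duration in seconds multiplied by the sample rate.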