How to shuffle a playlist correctly (and how not to do it)

by ssodelta


Edit: As /u/bart2019 pointed out, this is not a good algorithm for sorting playlists. It is meant as a solution to the problem of shuffling a list randomly, rather than creating an ideally shuffled playlist.

When I was younger, I made an application that went online and crawled the internet for .mp3 files and organized them in playlists (the source code is long gone, but I might revive the project eventually).

However, there was a problem with the playlists – quite often actually, the same song would be played within a small amount of songs, which was pretty annoying. I never got around to fixing it, but in this post I’ll show you how I did it, why it was wrong, and how to fix it.

My version (the wrong way)

Suppose we have a list S_1, S_2, S_3, ... S_n of songs in a playlist. In my naïve algorithm after a song had played,  I chose a random song of the playlist and played that as the next song. This worked well of mixing the songs, but there was a huge issue with it. There is no restriction on what the next song can be. There is nothing stopping the player from playing the same song over and over again. An easy fix around this issue that I employed is to make a restriction on what the immediately next song can be, such as to avoid repetition. But this is not a good solution, because that model still allows the following list:

S_{13}, S_{36}, S_{13}, S_{36}, S_{13} ... , S_{36}, S_{13}

Which is still annoying to listen to. Suppose we have a list of n songs, then the probability that the next song isn’t the same is \frac{n-1}{n}. Let’s just define that it’s annoying for a song to be repeated within the next five entries in the queue, this means that the probability of not getting the same song within 5 entries is (\frac{n-1}{n})^5. If we listen to k entries then the probability of hearing any song repeated within 5 entries is 1 - (\frac{n-1}{n})^{5k}.

Suppose we have a playlist of 100 songs and listen to 20 songs, then the probability of hearing any song repeated is 1 - (\frac{99}{100})^{5\cdot 20} = 0.63. Wow, we have a whopping 63% chance of hearing a song repeated using this way of shuffling the playlist (in fact, anything over 13 songs gives over 50% chance). This is not a good technique, so what can we do?

The correct way

To do this the right way, we need to employ some group theory. Instead of viewing our playlist as a list of songs S_1, S_2, S_3, ... S_n, we view the playlist as a finite cyclic group of order n. This means that the group has a generator g, such that the group G can be represented by repetiton of the operation on g without repeating the same element twice. That is G = {g^0, g^1, g^2... g^n}. Every cyclic group of order n is isomorphic to the group Z/nZ under addition. This means we can easily find a generator k, by making sure k and n are coprime (gcd(k,n) = 1). Although all of this might seem fancy, this essentially accomplishes the same as keeping track of which songs have already played, and avoid playing those until you’ve played all songs in the playlist – this method is just less memory-intensive than keeping track.

A possible algorithm on a playlist of n songs might be:

  1. Find random k such that gcd(k,n) = 1.
  2. Set i = k.
  3. Play song S_i.
  4. Set i = i + k~ (\textrm{mod}~ n).
  5. If i \neq k, go to step 3.
  6. Go to step 1.

Pseudo-code example:

int n = list.size();

while(playing){

   int k = random();
   while( gcd(n,k) > 1)k = random();
   int i = k;
   do{
      playSong(i);
      i = (i + k) % n;
   } while(i != k);

}

I am not aware of how Spotify, iTunes and the like shuffles their playlists, but I have not personally experienced hearing the same song over and over again when listening to my playlist on Spotify (and I listen to a lot of music).

About these ads