*Edit: As /u/bart2019 pointed out, this is not a good algorithm for sorting playlists. It is meant as a solution to the problem of shuffling a list randomly, rather than creating an ideally shuffled playlist.*

When I was younger, I made an application that went online and crawled the internet for .mp3 files and organized them in playlists (the source code is long gone, but I might revive the project eventually).

However, there was a problem with the playlists – quite often actually, the same song would be played within a small amount of songs, which was pretty annoying. I never got around to fixing it, but in this post I’ll show you how I did it, why it was wrong, and how to fix it.

**My version (the wrong way)**

Suppose we have a list of songs in a playlist. In my naïve algorithm after a song had played, I chose a random song of the playlist and played that as the next song. This worked well of mixing the songs, but there was a huge issue with it. There is no restriction on what the *next *song can be. There is nothing stopping the player from playing the same song over and over again. An easy fix around this issue that I employed is to make a restriction on what the immediately next song can be, such as to avoid repetition. But this is not a good solution, because that model still allows the following list:

Which is still annoying to listen to. Suppose we have a list of songs, then the probability that the next song isn’t the same is . Let’s just define that it’s annoying for a song to be repeated within the next five entries in the queue, this means that the probability of *not* getting the same song within 5 entries is . If we listen to entries then the probability of hearing any song repeated within 5 entries is .

Suppose we have a playlist of 100 songs and listen to 20 songs, then the probability of hearing any song repeated is . Wow, we have a whopping 63% chance of hearing a song repeated using this way of shuffling the playlist (in fact, anything over 13 songs gives over 50% chance). This is not a good technique, so what can we do?

**The correct way**

To do this the right way, we need to employ some group theory. Instead of viewing our playlist as a list of songs , we view the playlist as a finite cyclic group of order . This means that the group has a generator , such that the group can be represented by repetiton of the operation on without repeating the same element twice. That is . Every cyclic group of order is isomorphic to the group under addition. This means we can easily find a generator , by making sure and are coprime (). Although all of this might seem fancy, this essentially accomplishes the same as keeping track of which songs have already played, and avoid playing those until you’ve played all songs in the playlist – this method is just less memory-intensive than keeping track.

A possible algorithm on a playlist of songs might be:

- Find random such that .
- Set .
- Play song .
- Set .
- If , go to step 3.
- Go to step 1.

**Pseudo-code example:**

int n = list.size(); while(playing){ int k = random(); while( gcd(n,k) > 1)k = random(); int i = k; do{ playSong(i); i = (i + k) % n; } while(i != k); }

I am not aware of how Spotify, iTunes and the like shuffles their playlists, but I have not personally experienced hearing the same song over and over again when listening to my playlist on Spotify (and I listen to a lot of music).

Hi Nikolaj,

I am not convinced about the probability of your “wrong way”. This is how i see it:

if we start with any song, the probability that the next song isn’t the same as the first one is

(n-1)/n,

as you already mentioned. However, the third song should not only be different from the second but also the first song. Therefore, the probability that the third song isn’t the same as the first two is

(n-2)/n

because there are only (n-2) songs to choose from which are different than the two songs already played. The probability that the fourth and fifth song aren’t the same as the ones already played is

(n-3)/n and (n-4)/n,

respectively. This means that the total probability of playing 5 unique songs is

(n-1) * (n-2) * (n-3) * (n-4) / n^4 = n * (n-1) * (n-2) * (n-3) * (n-4) / n^5

where i have added an extra (n/n) for convenience.

Now, this where it gets tricky because the sixth song played cannot be any of the last 5 songs already played so there are (n-5) songs to choose from. The same however applies to the seventh song: this songs is allowed to be the same as the first song meaning that there are also (n-5) songs left to choose from. The eighth song is allowed to be the same as the first two songs meaning there are again (n-5) songs to choose from, etc etc.

The probability then to play k songs without any song being repeated within 5 consecutive entries is:

P = n * (n-1) * (n-2) * (n-3) * (n-4) * (n-5)^(k-5) / n^k, where k > n

If we listen to k entries then the probability of hearing any song repeated within 5 entries is

1 – P

This means that if have a playlist of 100 songs and listen to 20 songs, then the probability of hearing any song repeated within 5 entries is 58%, which is less than what you found but still high. Anything over 16 songs gives you more that 50% probability.

Let me hear what you think,

Ernst

Hello Ernst,

Thanks you for your reply!

You are correct in you logic – I calculated the probability wrong, and I’ll correct it as soon as possible.

Regards,

Nikolaj

There is a discussion of shuffle psuedo-code here:

http://programmers.stackexchange.com/questions/194480/id-like-to-write-an-ultimate-shuffle-algorithm-to-sort-my-mp3-collection

I know this post is slightly old, but I was thinking about your method I think there’s a potentially big flaw in it (and I’m sorry if I understood it wrong): For an example, if you have 36 songs you want to shuffle, there’s 36! ways to create a list of them. However, there’s only 11 numbers that are relatively prime to 36, and since each one of those creates one specific randomized list, you’re only ever gonna get 11 lists, which is such a small number when compared to the total of 36! possible ways to permutate. And as the number of songs get bigger, the number of relative primes still remains fairly small and the number of possible ways to permutate them gets much, much larger.

If you listen to these same songs for a while, you may not get repeated songs, but you’re eventually gonna start seeing patterns in the ways songs follow one another, which is not a good thing for a randomizer.

raffa,

Sorry, I realize this a bit of a late reply so you may not see it, but for other people who read this, I’ll make the comment. The generator method listed above is not ideal for shuffling a playlist but for playing through an already shuffled playlist. So in the code you may have some ordering randomly assigned such that all 36! possible shuffles are possible and then using the generator after the shuffle to ensure no repeats occur.

Why not the naive way?

For N songs, it suffices to generate a random number M which is uniform mod N-1!. An T~(N*log(N)-N) bit random number will satisfy this; you should save this number across sessions, together with a counter K (initially 1) and a list {P1,…,PK} (initially {P1}). After each song, take R=T mod (N-K) to choose from S-{P1,…,PK} and make T=T/(N-K). When K=N, restart.

This guarantees a full cycle across sessions, assuming N doesn’t change. You can generalize it to require cycles of different lengths, and accept dynamic N.