Faster Gradient Images from Regular Expressions

After releasing my last post, I got some amazing feedback. The user @nya tried replicating my algorithm in C++ and got some amazing results! Not only were his images just as beautiful as mine, he was able to generate them *much faster*. My code took about an hour to generate a 512px x 512px image, while @nya was able to generate a 1024px x 1024px image in roughly five seconds.

While I wasn’t able to replicate his results, I got some fantastic feedback from @j2kun, who made me realize that my approach was wasteful: generating every string just to find the edit distance between two strings produces a lot of overlap and wasted CPU cycles. Inspired by his post on word metrics, I decided to rewrite my algorithm.

The New Algorithm

For some depth $d$ (string length), first generate all $4^d$ strings and save them in a set $\mathcal{S}$. Then find all the strings that match the regular expression and save them in a set $\mathcal{M}$. Finally, compute the set difference between $\mathcal{S}$ and $\mathcal{M}$ to find the set of all strings that don’t match the regular expression: $\mathcal{D} = \mathcal{S} \backslash \mathcal{M}$.
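As a sketch of the three sets above (assuming a four-character alphabet {0, 1, 2, 3}; both the alphabet and the example regex here are illustrative, not taken from the post), the construction could look like this in Java:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class StringSets {
    // Build S: all 4^d strings of length d over the alphabet {0,1,2,3}.
    static List<String> allStrings(int d) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int i = 0; i < d; i++) {
            List<String> next = new ArrayList<>();
            for (String s : out)
                for (char c : "0123".toCharArray())
                    next.add(s + c);
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        int d = 3;                                   // small depth for the demo
        Set<String> S = new HashSet<>(allStrings(d));
        Pattern regex = Pattern.compile("1.*");      // illustrative regex
        Set<String> M = new HashSet<>();             // M: strings matching the regex
        for (String s : S)
            if (regex.matcher(s).matches())
                M.add(s);
        Set<String> D = new HashSet<>(S);            // D = S \ M
        D.removeAll(M);
        System.out.println(S.size() + " total, " + M.size() + " match, " + D.size() + " don't");
        // prints "64 total, 16 match, 48 don't"
    }
}
```

`Set.removeAll` gives the set difference directly, so $\mathcal{D}$ never has to be built by re-testing the regex.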

Then, for every string in $\mathcal{D}$, compute the minimum edit distance to any string in $\mathcal{M}$ (by iterating through that set) and store this value in a table. Since no string in $\mathcal{D}$ matches the regex, the minimum edit distance any of them can have is 1 – so as soon as we find a string at distance 1, we can break out early. We can write this in Java-like pseudocode as:

for(String s : D){

    int dist = depth;  // upper bound: no two strings of length d differ by more

    for(String t : M){
        int lev = levenshtein(s, t);
        if(lev < dist)
            dist = lev;

        if(dist == 1)
            break;  // can't do better than 1, stop searching
    }

    table.put(s, dist);
}
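The `levenshtein` helper above isn't spelled out in the post; a standard two-row Wagner–Fischer implementation (a sketch, not necessarily the code used for the images) looks like:

```java
class EditDistance {
    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++)
            prev[j] = j;                       // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                       // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```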

To find the actual source code used to generate the images, check out this project’s GitHub page.

For some depth $d$, note that this algorithm has exponential time complexity, since it generates $4^d$ strings (and then compares each string in $\mathcal{D}$ against strings in $\mathcal{M}$). This is extremely slow for large $d$, but fortunately we don’t need very large $d$: $d = 10$ gives $4^{10} = 1{,}048{,}576$ strings, one per pixel of a 1024px x 1024px image.

Although the apparent time complexity is very poor, this algorithm performs much, much better than the old one. While the old algorithm took an hour to produce a 512px x 512px image, the new one can generate a 1024px x 1024px image in less than a minute (varying with the size of $\mathcal{M}$ and the complexity of the regex). It still doesn’t match the performance of @nya’s algorithm, but it is an impressive speed-up nonetheless.

Just to show that the performance of this algorithm dwarfs that of the old one, here are some 1024px x 1024px images that took anywhere from 10 seconds to 5 minutes to produce:

4 thoughts on “Faster Gradient Images from Regular Expressions”

1. nya says:

great!
the next step would be to rewrite it with a non-allocating distance function.
since all the strings have the same length it would be

(pseudocode, i don’t know java)

distance(f: str, s: str) -> dist {
    dist = 0
    for(i = 0; i < f.size(); ++i) { if(f[i] != s[i]) ++dist }
    return dist
}
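(In Java, this same-length distance could be written as the sketch below; it assumes the two strings have equal length, as they do here since every string in $\mathcal{S}$ has length $d$.)

```java
class Hamming {
    // Hamming distance: count positions where two equal-length strings differ.
    static int distance(String f, String s) {
        int dist = 0;
        for (int i = 0; i < f.length(); i++)
            if (f.charAt(i) != s.charAt(i))
                dist++;
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(distance("0101", "0111")); // prints 1
    }
}
```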

meanwhile i managed to rewrite it to make the code easier to read and faster.
it can generate images up to 8k*8k (due to ram limitations of my pc) but i seem unable to upload them on imgur, oh well..
"low"-res versions https://imgur.com/a/H9y6b

1. Thank you for that algorithm! Can’t believe I didn’t think of that; fantastic to have an O(n) algorithm for edit distance. By the way, what are the regexps for the final two images in your album?

2. nya says:

No problem. Keep in mind that this is a Hamming distance rather than Levenshtein (not that you couldn’t write a non-allocating Levenshtein, since the performance hit is mostly due to allocation).
Also, i didn’t prove it, but i think if there is a string s such that the min levenshtein distance to L is x, then there’s also an element of L such that the min hamming distance is x.

depth: 11 rx: “11111.+1.*1”
depth: 11 rx: “1111.+1.*1”

1. nya says:

cancel what i just said about lev == ham. that’s true only for some L