After releasing my last post, I got some amazing feedback. The user @nya had tried replicating my algorithm in c++ and got some amazing results! Not only did he get just as beautiful images as I did, he was able to do it *much faster*. The code I wrote was able to generate a 512px x 512px image in just about one hour – and @nya was able to generate a 1024px x 1024px image in roughly five seconds.

While I wasn’t able to replicate his results, I got some fantastic feedback from @j2kun who made me realize that my approach was wasteful. Generating *all *strings to find the edit distance between two strings generates a lot of overlap and wasted CPU cycles. Instead, inspired by his post on word metrics, I decided to rewrite my algorithm.

## The New Algorithm

For some depth * *(string length), first generate all strings and save them in some set . Then find all the strings that match the regular expression and save them in some set . Then compute the set difference between and to find the set of all strings that *don’t *match the regular expression .

Then, for every string in compute the minimum edit distance to any string in (iterate through the array) and store this value in a table. Obviously, the minimum edit distance any string in can have is 1, so when we find a string with a distance of 1, we break and give the string a distance of 1. We can write this is Java-like pseudocode as:

for(String s : D){ int dist = depth for(String t : M){ if(levenshtein(s,t) < d) dist = levenshtein(s,t) if(dist == 1) break } table.put(s, dist) }

To find the actual source code used to generate the images, check out this project’s GitHub page.

For some depth , note that this algorithm has time-complexity as we have strings. This is extremely slow for large – but fortunately, we don’t need very large , as gives an image with a size of 1024px x 1024px.

Although the apparent time complexity is very slow, it has much, *much *better performance than the old algorithm. While the old algorithm took an hour to produce a 512px x 512px image, this new algorithm can generate a 1024px x 1024px in less than a minute (varies depending on the size of and the complexity of the regex). It still doesn’t match the performance of @nya’s algorithm, but it is still an impressive speed-up.

Just to show that the performance of this algorithm dwarfs that of the old one, here are some 1024px x 1024px images that took anywhere from 10 seconds to 5 minutes to produce:

great!

the next step would be to rewrite a non-allocating distance function.

since all the strings have the same lenght it would be

(pseudocode i don’t know java) distance(f: str, s: str) -> dist {

dist = 0

for(i=0, i<str.size(); ++i) {if f[i] != s[i] ++dist }

return dist

}

meanwhile i managed to rewrite it to make the code easier to read and faster.

it can generate up to 8k*8k size images (due to ram limitations of my pc) but i seem incapable to upload them on imgur, oh well..

"low"-res versions https://imgur.com/a/H9y6b

Thank you for that algorithm! Can’t believe I didn’t think of that – fantastic having a O(n) algorithm for edit distance. By the way, what are the regexps for the final two images in your album?

No problem. keep in mind that is a hamming distance rether than levenshtein (not that you couldn’t write a non allocating levenshtein since the performance hit is mostly due to that).

Also, i didn’t prove it, but i think if there is a string s such that min levenshtein on L is x then there’s also an element in L such that min hamming is x.

depth: 11 rx: “11111.+1.*1”

depth: 11 rx: “1111.+1.*1”

cancel what i just said about lev == ham. that’s true only for some L