I’ve decided to write yet another blog post on this blog post (first post, second post), because I’ve found another algorithm which is far superior to the other algorithms. I’ve almost completely rewritten the source code (which can be found at this project’s GitHub page). Using this approach I’ve been able to generate a 16.384px x 16.384px image (right click > save as, my browser crashes trying to render the image).

## The Problem

The greatest problem with the old algorithm was that we had to enumerate all strings, check if they’re part of the regular language and then compute their Hamming distances using @nya’s algorithm, which works much better than the modified algorithm for Levenshtein distance previously used. But since there are strings for some depth this whole procedure quickly becomes infeasible as especially testing for membership in the regular language is expensive. This makes rendering the images very slow.

Another fatal problem with the old algorithm was its memory usage. For instance, for (images of size 8192px x 8192px) we need to save 67 million strings and equally many pixels. To calculate how much memory that many strings will take up on the heap, we can refer to the article on Javamex (since I’ve written the program in Java, results may be different for C++, Pascal, etc.), which states that the minimum number bytes a single string of length takes up on the heap is:

Which for d=13 gives 5.0 GB of heap usage. Similarily, a BufferedImage uses 4 bytes of information per pixel and has pixels in total, resulting in a heap usage of 256 MB. This is all very inefficient and is a huge, unnecessary drag on the system. It also sets a strict upper bound on the size of the images as all this memory allocation quickly becomes too much for the size of the RAM in the computer. For my computer (which has 8 GB of RAM) this strict upper bound is , as it would require 21 GB of memory to store both the image and the strings.

Fortunately, there is a way around most of this, and that is what we’ll be discussing in this article. Essentially, what we do is enumerate the strings one by one and add them to the image immediately, rather than generate all of them. We also circumvent the problem of testing for membership using a clever construction.

## The Algorithm

To fully understand how this algorithm works, it’s essential to have some preliminary knowledge of finite automata and regular languages. A regular expression defines a regular language , and so do finite automata. We can therefore construct a finite automaton from a regular expression. To do so, we employ the help of the dk.brics.automaton package written by Anders Moeller from Aarhus University. The reason we use finite automata is that they have a lot of nice features – there are efficient algorithms to minimize automata and determine language membership of strings.

What we want, however, is not the language that is defined by the regular expression – we want to restrict the strings to have a length given some depth. Therefore we can compute the intersection of the automaton that accepts strings of length and the automaton which defines . Next up, we’ll enumerate all strings in and store them in an array .

Other than having an automaton which recognizes strings of length in , we also want an automaton that recognizes all strings of length which *aren’t *in – this is just . Because now that we have , we want to assign the remaining strings recognized by distances and save the associated pixel information in our raster.

To do this, we perform a depth-first search of , starting at some initial state . This can be written in Java-like pseudo-code as:

process(State s, String currentString, int depth){ //Stop at end nodes if(depth == max && s.isAccepted()){ int dist = getHammingDistance(s); paintPixel(currentString, dist); return; } //Otherwise, get all transitions and process them as well for(Transition t : s.getTransitions()){ process(t.getDestination(), currentString+t.getChar(), depth+1); } }

The most obvious speed-up with this approach is that s.isAccepted(); is a cheap operation, whereas testing for membership of a string in a regular language has a worst-case time complexity of . The other advantage is that there is no reason to save all the Strings in an array, making it possible to make much larger images. To view the algorithm in further detail I encourage you to go read the source code for this project (there is decent documentation, so the code shouldn’t be too difficult to read).

## Gallery

While there are plenty of these images in the previous posts, I thought it’d be fun to share a few more of them.