# Recursively Generating Gradient Images from Regular Expressions

I’ve decided to write yet another blog post on this topic (first post, second post), because I’ve found another algorithm which is far superior to the previous ones. I’ve almost completely rewritten the source code (which can be found at this project’s GitHub page). Using this approach I’ve been able to generate a 16,384px x 16,384px image (right click > save as; my browser crashes trying to render the image).

## The Problem

The greatest problem with the old algorithm was that we had to enumerate all strings, check whether each belongs to the regular language, and then compute their Hamming distances using @nya’s $O(n)$ algorithm, which performs much better than the modified $O(n^2)$ Levenshtein distance algorithm used previously. But since there are $4^d$ strings for a given depth $d$, this whole procedure quickly becomes infeasible, especially because testing for membership in the regular language is expensive. This makes rendering the images very slow.
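For equal-length strings, the $O(n)$ Hamming distance is simply the number of mismatched positions. A minimal sketch (this is my own illustration, not @nya’s exact code):

```java
public class Hamming {
    //Number of positions at which two equal-length strings differ
    public static int distance(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("strings must have equal length");
        int dist = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i))
                dist++;
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(distance("1233", "1230")); //prints 1
    }
}
```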

Another fatal problem with the old algorithm was its memory usage. For instance, for $d=13$ (images of size 8192px x 8192px) we need to save 67 million strings and equally many pixels. To calculate how much memory that many strings will take up on the heap, we can refer to the article on Javamex (since I’ve written the program in Java, results may be different for C++, Pascal, etc.), which states that the minimum number of bytes $B$ a single string of length $n$ takes up on the heap is:

$B = 8 \cdot \left \lceil \dfrac{n \cdot 2 + 45}{8} \right \rceil$

For $d=13$ this gives 5.0 GB of heap usage. Similarly, a BufferedImage uses 4 bytes of information per pixel and has $4^{13}$ pixels in total, resulting in a heap usage of 256 MB. This is all very inefficient and is a huge, unnecessary drag on the system. It also sets a strict upper bound on the size of the images, as all this memory allocation quickly exceeds the RAM in the computer. For my computer (which has 8 GB of RAM) this strict upper bound is $d=14$, as it would require 21 GB of memory to store both the image and the strings.
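For reference, a small sketch of these back-of-the-envelope estimates (the method names are mine; the formula is the one quoted from Javamex above):

```java
public class HeapEstimate {
    //Minimum heap bytes for one String of length n,
    //per the Javamex formula: 8 * ceil((2n + 45) / 8)
    public static long stringBytes(int n) {
        return 8L * ((2L * n + 45 + 7) / 8);
    }

    //Total bytes for all 4^d strings of length d
    public static long allStringsBytes(int d) {
        return stringBytes(d) * (1L << (2 * d)); //4^d = 2^(2d)
    }

    public static void main(String[] args) {
        System.out.println(stringBytes(13));     //72 bytes per string
        System.out.println(allStringsBytes(13)); //4831838208 bytes, roughly 5 GB
    }
}
```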

Fortunately, there is a way around most of this, and that is what we’ll be discussing in this article. Essentially, what we do is enumerate the strings one by one, adding each to the image immediately, rather than generating them all first. We also circumvent the problem of testing for membership using a clever construction.

## The Algorithm

To fully understand how this algorithm works, it’s essential to have some preliminary knowledge of finite automata and regular languages. A regular expression defines a regular language $L$, and so does a finite automaton. We can therefore construct a finite automaton from a regular expression. To do so, we employ the help of the dk.brics.automaton package written by Anders Moeller from Aarhus University. The reason we use finite automata is that they have many nice properties: there are efficient algorithms for minimizing automata and for deciding language membership of strings.

What we want, however, is not the whole language defined by the regular expression – we want to restrict it to strings of some fixed length $d$, the depth. Therefore we compute the intersection of the automaton that accepts all strings of length $d$ with the automaton that defines $L$. Next up, we’ll enumerate all strings in $L \cap \Sigma^d$ and store them in an array $M$.

Besides the automaton $A_1$, which recognizes the strings of length $d$ in $L$, we also want an automaton $A_2$ that recognizes all strings of length $d$ which aren’t in $L$ – this is just $L^c \cap \Sigma^d$. Now that we have $M$, we want to assign distances to the remaining strings, the ones recognized by $A_2$, and save the associated pixel information in our raster.

To do this, we perform a depth-first search of $A_2$, starting at some initial state $S$. This can be written in Java-like pseudo-code as:

```java
void process(State s, String currentString, int depth){
    //Stop at the maximum depth; paint only if the state is accepting
    if(depth == max){
        if(s.isAccepted()){
            int dist = getHammingDistance(currentString);
            paintPixel(currentString, dist);
        }
        return;
    }

    //Otherwise, get all transitions and process them as well
    for(Transition t : s.getTransitions()){
        process(t.getDestination(), currentString+t.getChar(), depth+1);
    }
}
```

The most obvious speed-up with this approach is that s.isAccepted() is a cheap constant-time operation, whereas testing a string for membership with a backtracking regex matcher can take exponential time in the worst case. The other advantage is that there is no reason to save all the strings in an array, making it possible to generate much larger images. To view the algorithm in further detail I encourage you to go read the source code for this project (there is decent documentation, so the code shouldn’t be too difficult to read).

## Gallery

While there are plenty of these images in the previous posts, I thought it’d be fun to share a few more of them.

# Faster Gradient Images from Regular Expressions

After releasing my last post, I got some amazing feedback. The user @nya had tried replicating my algorithm in C++ and got some amazing results! Not only did he get just as beautiful images as I did, he was able to do it *much faster*. The code I wrote was able to generate a 512px x 512px image in about an hour – and @nya was able to generate a 1024px x 1024px image in roughly five seconds.

While I wasn’t able to replicate his results, I got some fantastic feedback from @j2kun who made me realize that my approach was wasteful. Generating all variant strings to find the edit distance between two strings involves a lot of overlap and wasted CPU cycles. Instead, inspired by his post on word metrics, I decided to rewrite my algorithm.

## The New Algorithm

For some depth $d$ (string length), first generate all $4^d$ strings and save them in some set $\mathcal{S}$. Then find all the strings that match the regular expression and save them in some set $\mathcal{M}$. Then compute the set difference between $\mathcal{S}$ and $\mathcal{M}$ to find the set of all strings that don’t match the regular expression: $\mathcal{D} = \mathcal{S} \setminus \mathcal{M}$.
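As a sketch of this setup for the language 3* and $d=2$ (using java.util.regex for the membership test here; the actual project source is organized differently):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class StringSets {
    static final char[] ALPHABET = {'0', '1', '2', '3'};

    //All 4^d strings of length d over the four-character alphabet
    public static Set<String> allStrings(int d) {
        Set<String> s = new HashSet<>();
        s.add("");
        for (int i = 0; i < d; i++) {
            Set<String> next = new HashSet<>();
            for (String prefix : s)
                for (char c : ALPHABET)
                    next.add(prefix + c);
            s = next;
        }
        return s;
    }

    public static void main(String[] args) {
        Set<String> S = allStrings(2);          //all strings of length 2
        Pattern p = Pattern.compile("3*");      //the regular expression
        Set<String> M = new HashSet<>();        //matching strings
        for (String s : S)
            if (p.matcher(s).matches())
                M.add(s);
        Set<String> D = new HashSet<>(S);       //D = S \ M
        D.removeAll(M);
        System.out.println(S.size() + " " + M.size() + " " + D.size()); //16 1 15
    }
}
```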

Then, for every string in $\mathcal{D}$ compute the minimum edit distance to any string in $\mathcal{M}$ (by iterating through $\mathcal{M}$) and store this value in a table. Obviously, the minimum edit distance any string in $\mathcal{D}$ can have is 1, so when we find a string with a distance of 1, we break and give the string a distance of 1. We can write this in Java-like pseudocode as:

```java
for(String s : D){
    int dist = depth;

    for(String t : M){
        if(levenshtein(s,t) < dist)
            dist = levenshtein(s,t);

        //Distance 1 is the minimum possible, so stop searching
        if(dist == 1)
            break;
    }

    table.put(s, dist);
}
```

To find the actual source code used to generate the images, check out this project’s GitHub page.

For some depth $d$, note that this algorithm has exponential time complexity, as there are $4^d$ strings to consider (and each string in $\mathcal{D}$ is compared against $\mathcal{M}$). This is extremely slow for large $d$ – but fortunately, we don’t need very large $d$, as $d=10$ gives an image with a size of 1024px x 1024px.

Although the apparent time complexity is very slow, it has much, much better performance than the old algorithm. While the old algorithm took an hour to produce a 512px x 512px image, this new algorithm can generate a 1024px x 1024px in less than a minute (varies depending on the size of $\mathcal{M}$ and the complexity of the regex). It still doesn’t match the performance of @nya’s algorithm, but it is still an impressive speed-up.

Just to show that the performance of this algorithm dwarfs that of the old one, here are some 1024px x 1024px images that took anywhere from 10 seconds to 5 minutes to produce:

# Gradient Images from Regular Expressions

The source code for this project can be found at its GitHub page: https://github.com/SSODelta/GradientRegexImages


In one of my posts from last year, I detailed an algorithm for creating pretty fractal images based on regular expressions. While this idea wasn’t novel, it was still very interesting, and it was a challenge for me to code it. However, the other day I got a great idea – I had figured out a way to generalize this concept to create (I presumed) even prettier images.

The problem with the old method was that membership was binary: a given pixel either belonged to the regular language or it didn’t. This makes excellent sense from a theoretical computer science perspective, but when making pretty images it’s nice to have some variation. So instead of the binary approach, I’m going to allow pixels to have a certain distance to being a member of the regular language. We define that d=0 means that a pixel belongs to the regular language, and d=1 means that it has an edit distance of 1 to belonging to the language. We only use substitution as a transformation in this example, however, because we want all strings to have the same length.

As an example, let’s consider the regular language 3*, and consider strings of length 2. Obviously, the only string of length 2 that belongs to this language (and thus has d=0) is “33”. The strings of distance d=1 are the strings that can result in “33” after only 1 substitution. An example of this is the string “32”, which after changing the 2 to a 3 suddenly belongs to the regular language. Likewise, an example of a string with d=2 is “11”, where both characters need to be changed for it to belong to the language.
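This worked example is easy to check mechanically. Since we only allow substitutions, the distance between two equal-length strings is just the number of positions where they differ; a small sketch (the method name is mine):

```java
public class DistanceExample {
    //Substitution-only edit distance between equal-length strings:
    //the number of positions where they differ
    public static int subsDistance(String a, String b) {
        int d = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i))
                d++;
        return d;
    }

    public static void main(String[] args) {
        //"33" is the only length-2 string in the language 3*
        System.out.println(subsDistance("33", "33")); //d=0
        System.out.println(subsDistance("32", "33")); //d=1
        System.out.println(subsDistance("11", "33")); //d=2
    }
}
```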

## Determining the Distance of Every String

Like in the old post, we need to generate all strings of length n, check if they belong to the language, and then paint the image based on this information. In the old post, however, it was pretty straightforward to check whether the strings belonged to the language or not. But in this approach, we’re faced with a far more complicated problem – how do we determine the edit distance of any one string?

To do so, we first generate all strings of length n and then we check which of these strings belong to the language (we first find all d=0 strings) and remember them in a HashMap. Then we generate all variations of these strings. Obviously, any string which is a variation of a d=0 string has distance at most d=1 – so, using this set of variations, we mark all the strings not previously marked with d=0 with d=1. We continue doing so until we’ve exhausted the list of strings (which must happen eventually, as any two strings of length n are at most n edits apart).

At each step, we remove all strings whose edit distance we’ve already determined, as keeping them would only generate duplicates. Also, we stop after d=n-1 and mark the remaining strings as d=n (again, because the maximum edit distance is n).
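The level-by-level expansion described above is essentially a breadth-first search starting from the set of matching strings. A minimal sketch over the four-character alphabet used throughout these posts (the names are mine, not the project’s):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DistanceLayers {
    static final char[] ALPHABET = {'0', '1', '2', '3'};

    //Assign every reachable string its substitution distance to the set of
    //matches, expanding outwards one substitution at a time
    public static Map<String, Integer> distances(Set<String> matches) {
        Map<String, Integer> dist = new HashMap<>();
        Set<String> frontier = new HashSet<>(matches);
        int d = 0;
        while (!frontier.isEmpty()) {
            for (String s : frontier)
                dist.put(s, d);
            Set<String> next = new HashSet<>();
            for (String s : frontier) {
                //All single-substitution variations of s
                for (int i = 0; i < s.length(); i++) {
                    for (char c : ALPHABET) {
                        String v = s.substring(0, i) + c + s.substring(i + 1);
                        if (!dist.containsKey(v))
                            next.add(v);
                    }
                }
            }
            frontier = next;
            d++;
        }
        return dist;
    }

    public static void main(String[] args) {
        Set<String> matches = new HashSet<>();
        matches.add("33"); //the only length-2 member of 3*
        Map<String, Integer> dist = distances(matches);
        System.out.println(dist.get("33")); //0
        System.out.println(dist.get("32")); //1
        System.out.println(dist.get("11")); //2
    }
}
```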

### Coloring the Pixels

After finding the edit distance of every string, we have an integer $d \in [0; n]$. We map this onto the interval $[0; 1]$ by dividing each edit distance by n. Next up, we associate some color 0xRRGGBB with 0, and another color 0xRRGGBB with 1, and give each pixel the corresponding color in this gradient.
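A minimal sketch of this gradient mapping, interpolating each channel linearly between the two endpoint colors (the method names are mine):

```java
public class Gradient {
    //Linearly interpolate each channel between two 0xRRGGBB colors;
    //t = 0 gives 'from', t = 1 gives 'to'
    public static int lerp(int from, int to, double t) {
        int r = (int) Math.round(((from >> 16) & 0xFF) * (1 - t) + ((to >> 16) & 0xFF) * t);
        int g = (int) Math.round(((from >> 8) & 0xFF) * (1 - t) + ((to >> 8) & 0xFF) * t);
        int b = (int) Math.round((from & 0xFF) * (1 - t) + (to & 0xFF) * t);
        return (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        //A pixel with edit distance d out of n gets lerp(c0, c1, d / (double) n)
        int black = 0x000000, white = 0xFFFFFF;
        System.out.println(Integer.toHexString(lerp(black, white, 0.5))); //808080
    }
}
```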

### Performance

This algorithm is very, very slow though. I’ve not been able to generate images larger than 512×512 on my laptop (my patience lasts about 1 hour), and I fear a 1024×1024 image is impossible due to the time complexity of the algorithm. I’m sure there are many ways to improve the algorithm’s performance (dynamic programming, probably), but I’ve yet to find them. But if you find an improvement (or a mistake), feel free to leave a comment.

## Gallery

Here is a list of all the images I’ve rendered on my machine so far. It’s very difficult to generate massive amounts of images due to the complexity of the algorithm (as opposed to my post on Newton’s Fractals, where I have a gallery of 100+ images).