Recursively Generating Gradient Images from Regular Expressions

I’ve decided to write yet another blog post on this blog post (first post, second post), because I’ve found another algorithm which is far superior to the other algorithms. I’ve almost completely rewritten the source code (which can be found at this project’s GitHub page). Using this approach I’ve been able to generate a 16.384px x 16.384px image (right click > save as, my browser crashes trying to render the image).

The Problem

The greatest problem with the old algorithm was that we had to enumerate all strings, check if they’re part of the regular language and then compute their Hamming distances using @nya’s $O(n)$ algorithm, which works much better than the modified $O(n^2)$ algorithm for Levenshtein distance previously used. But since there are $4^d$ strings for some depth $d$ this whole procedure quickly becomes infeasible as especially testing for membership in the regular language is expensive. This makes rendering the images very slow.

Another fatal problem with the old algorithm was its memory usage. For instance, for $d=13$ (images of size 8192px x 8192px) we need to save 67 million strings and equally many pixels. To calculate how much memory that many strings will take up on the heap, we can refer to the article on Javamex (since I’ve written the program in Java, results may be different for C++, Pascal, etc.), which states that the minimum number bytes $B$ a single string of length $n$ takes up on the heap is:

$B = 8 \cdot \left \lceil \dfrac{n \cdot 2 + 45}{8} \right \rceil$

Which for d=13 gives 5.0 GB of heap usage. Similarily, a BufferedImage uses 4 bytes of information per pixel and has $4^{13}$ pixels in total, resulting in a heap usage of 256 MB. This is all very inefficient and is a huge, unnecessary drag on the system. It also sets a strict upper bound on the size of the images as all this memory allocation quickly becomes too much for the size of the RAM in the computer. For my computer (which has 8 GB of RAM) this strict upper bound is $d=14$, as it would require 21 GB of memory to store both the image and the strings.

Fortunately, there is a way around most of this, and that is what we’ll be discussing in this article. Essentially, what we do is enumerate the strings one by one and add them to the image immediately, rather than generate all of them. We also circumvent the problem of testing for membership using a clever construction.

The Algorithm

To fully understand how this algorithm works, it’s essential to have some preliminary knowledge of finite automata and regular languages. A regular expression defines a regular language $L$, and so do finite automata. We can therefore construct a finite automaton from a regular expression. To do so, we employ the help of the dk.brics.automaton package written by Anders Moeller from Aarhus University. The reason we use finite automata is that they have a lot of nice features – there are efficient algorithms to minimize automata and determine language membership of strings.

What we want, however, is not the language that is defined by the regular expression – we want to restrict the strings to have a length $d$ given some depth. Therefore we can compute the intersection of the automaton that accepts strings of length $d$ and the automaton which defines $L$. Next up, we’ll enumerate all strings in $L \cap \Sigma^d$ and store them in an array $M$.

Other than having an automaton $A_1$ which recognizes strings of length $d$ in $L$, we also want an automaton $A_2$ that recognizes all strings of length $d$ which aren’t in $L$ – this is just $L^c \cap \Sigma^d$. Because now that we have $M$, we want to assign the remaining strings recognized by $A_2$ distances and save the associated pixel information in our raster.

To do this, we perform a depth-first search of $A_2$, starting at some initial state $S$. This can be written in Java-like pseudo-code as:

process(State s, String currentString, int depth){
//Stop at end nodes
if(depth == max && s.isAccepted()){
int dist = getHammingDistance(s);
paintPixel(currentString, dist);
return;
}

//Otherwise, get all transitions and process them as well
for(Transition t : s.getTransitions()){
process(t.getDestination(), currentString+t.getChar(), depth+1);
}
}

The most obvious speed-up with this approach is that s.isAccepted(); is a cheap operation, whereas testing for membership of a string in a regular language has a worst-case time complexity of $O(e^n)$. The other advantage is that there is no reason to save all the Strings in an array, making it possible to make much larger images. To view the algorithm in further detail I encourage you to go read the source code for this project (there is decent documentation, so the code shouldn’t be too difficult to read).

Gallery

While there are plenty of these images in the previous posts, I thought it’d be fun to share a few more of them.

Faster Gradient Images from Regular Expressions

After releasing my last post, I got some amazing feedback. The user @nya had tried replicating my algorithm in c++ and got some amazing results! Not only did he get just as beautiful images as I did, he was able to do it *much faster*. The code I wrote was able to generate a 512px x 512px image in just about one hour – and @nya was able to generate a 1024px x 1024px image in roughly five seconds.

While I wasn’t able to replicate his results, I got some fantastic feedback from @j2kun who made me realize that my approach was wasteful. Generating all strings to find the edit distance between two strings generates a lot of overlap and wasted CPU cycles. Instead, inspired by his post on word metrics, I decided to rewrite my algorithm.

The New Algorithm

For some depth $d$ (string length), first generate all $4^d$ strings and save them in some set $\mathcal{S}$. Then find all the strings that match the regular expression and save them in some set $\mathcal{M}$. Then compute the set difference between $\mathcal{M}$ and $\mathcal{S}$ to find the set of all strings that don’t match the regular expression $\mathcal{D} = \mathcal{S} \backslash \mathcal{M}$.

Then, for every string in $\mathcal{D}$ compute the minimum edit distance to any string in $\mathcal{M}$ (iterate through the array) and store this value in a table. Obviously, the minimum edit distance any string in $\mathcal{D}$ can have is 1, so when we find a string with a distance of 1, we break and give the string a distance of 1. We can write this is Java-like pseudocode as:

for(String s : D){

int dist = depth

for(String t : M){
if(levenshtein(s,t) < d)
dist = levenshtein(s,t)

if(dist == 1)
break
}

table.put(s, dist)
}

To find the actual source code used to generate the images, check out this project’s GitHub page.

For some depth $d$, note that this algorithm has time-complexity $O(e^d)$ as we have $4^d$ strings. This is extremely slow for large $d$ – but fortunately, we don’t need very large $d$, as $d=10$ gives an image with a size of 1024px x 1024px.

Although the apparent time complexity is very slow, it has much, much better performance than the old algorithm. While the old algorithm took an hour to produce a 512px x 512px image, this new algorithm can generate a 1024px x 1024px in less than a minute (varies depending on the size of $\mathcal{M}$ and the complexity of the regex). It still doesn’t match the performance of @nya’s algorithm, but it is still an impressive speed-up.

Just to show that the performance of this algorithm dwarfs that of the old one, here are some 1024px x 1024px images that took anywhere from 10 seconds to 5 minutes to produce:

The source code for this project can be found at its GitHub-page: https://github.com/SSODelta/GradientRegexImages

In one of my posts from last year, I detailed an algorithm for creating pretty fractal images based on regular expressions. While this idea wasn’t novel, it was still very interesting, and it was a challenge for me to code it. However, the other day I got a great idea – I had figured out a way to generalize this concept to (I presumed) create even more pretty images.

The problem with the old method was that it was binary whether or not a given pixel belonged to the regular language. This makes excellent sense from a computer science theoretical perspective, but when making pretty images it’s nice to have some variation. So instead of the binary approach, I’m going to allow pixels to have a certain distance to being a member of the regular language. We define that d=0 means that a pixel belongs to the regular language, and d=1 means that it has an edit distance of 1 to belonging to the language. We only use substitution as a transformation in this example, however, because we want all strings to have the same length

As an example, let’s consider the regular language 3*, and consider of strings of length 2. Obviously, the only string of length 2 that belongs to this language (and thus has d=0) is “33”. The strings of distance d=1 are the strings that can result in “33” after only 1 substitution. An example of this is the string “32”, which after changing the 2 to a 3 suddenly belongs to the regular expression. Likewise, an example of a string with d=2 is “11”, where both characters need to be changed for it to belong to the language.

Determining the Distance of Every String

Like in the old post, we need to generate all strings of length n, check if they belong to the language, and then paint the image based on this information. In the old post, however, it was pretty straight forward checking if the strings belonged to the language or not. But in this approach, we’re faced with a far more complicated problem – how do we determine the edit distance of any one string?

To do so, we first generate all strings of length and then we check which of these strings belong to the language (we first find all d=0 strings) and remember them in a HashMap. Then we generate all variations of these strings. Obviously, any string which is a variation of a d=0 string must be at most a d=1 – so, using this set of variations we mark all the strings not previously marked with d=0 with d=1. We continue doing so until we’ve exhausted the list of strings (which must happen eventually, as any two strings of length are at most edits apart).

At each step, we remove all strings for which we’ve already determined its edit distance as keeping it would only generate copies. Also, we stop after d=n-1 and mark the remaining strings as d=n (again, because the maximum edit distance is n).

Coloring the Pixels

After finding the edit distance of every string, we have an integer $d \in [0; n]$. We map this unto the interval $[0; 1]$ by dividing each edit distance with n. Next up, we associate some color 0xRRGGBB with 0, and another color 0xRRGGBB with 1 and give each pixel a color in this gradient.

Performance

This algorithm is very, very slow though. I’ve not been able to generate any larger than 512×512 images on my laptop (my patience lasts about 1 hour), and I fear a 1024×1024 is impossible due to the time complexity of the algorithm. I’m sure there are many ways to improve the algorithm’s performance (dynamic programming, probably), but I’ve yet to find them. But if you find an improvement (or a mistake), feel free to leave a comment.

Gallery

Here is a list of all images I’ve yet to render on my machine. It’s very difficult to generate massive amounts of images due to the complexity of the algorithm (as opposed to my post on Newton’s Fractals, where I have a gallery of 100+ images)

Creating Colored Images from Regular Expressions

Note: This post creates some cool colors, but to see something truly stunning, check out dln385’s images, inspired by these posts.

In my last post, I detailed an algorithm for generating images using regular expressions. However, I found them a tad boring and mundane, so in this post I’m going to add colors to the images.

We follow exactly the same algorithm for choosing where to color the pixels, but we need a method to decide what color to paint. To accomplish this, we introduce three colors channels C, M, Y which are basically a long string, composed of the same alphabet as the areas (0, 1, 2, 3).

For each string S (area) do the following:

• If S matches the regular expression, mark it as painted.
• Define a blank color CMY(0, 0, 0).
• For each color channel, xor the string accompanying the channel, and if it matches the regular expression, change the color channel from 0.0 to 1.0.

Let’s do an example. Let R be the regular expression (1+2+3)*0(1+3)*0(0+1+2+3)*.

Let the following be the color channels C,M,Y:

• C – 032102
• M – 033122
• Y – 132212

If we test the string S = 330100, we find it matches, so we must paint it.

Calculate S XOR C = 330100 XOR 032102 = 302002, which matches the regular expression, so the cyan color channel is 1.0.

Calculate S XOR M = 330100 XOR 033122 = 303022, which also matches the regular expression, so the magenta color channel is also 1.0.

Calculate S XOR Y = 330100 XOR 132212 = 202321, which does not match the regular expression, so the yellow color channel is 0.0.

This means we paint the area, marked by the string 330100 with the CMY-color (1, 1, 0).

Here is a visual example to showcase this new method:

You can also do AND instead of XOR, it produces images like the following:

It’s obviously less visually stunning, but still interesting. Here is OR instead:

It’s also less visually diverse, but still cool to look at.

Generating Images from Regular Expressions

There is an update to this post, which includes colors: https://ssodelta.wordpress.com/2014/03/25/creating-colored-images-from-regular-expressions/

Source code complete with Javadoc can be found at: http://pastebin.com/ccRK75aq

This post will detail a method for using regular expressions to generate images.

A regular expression is an array of characters, which represents a series of instructions, which for any input either matches or doesn’t match. For a simple example, the regular expression

ssodelta

Only matches the string ssodelta. This makes sense, but we can do pretty nice things with regular expressions. If we for example write

ssodelta|delta

Then the regular expression either matches ssodelta or the string delta. This is because the vertical bar | represents the symbol or. We also have so called quantifiers, the most important of which is the kleene star *.

sso(delta)*

The parentheses group strings together, like regular parentheses in mathematical expressions (2+4)*3, for example. The kleene star * means zero or more, meaning that there can be zero or more delta and it will still match the regular expression. A few matches include ssodelta, ssodeltadelta, ssodeltadeltadelta and so on…

I won’t completely detail regular expression, but if want to read up on it, I suggest close-reading the wikipedia article on the matter, but the bottom line is that regular expressions are really cool. But how can they be used to generate images?

Consider a completely square image, which is divided into 4 quadrants:

We can refer to each of these four quadrants by their numbers 0,1,2,3, and we can further divide these quadrants into four numbers, and label them accordingly.

But what does this have to do with regular expresssions? Well, we can have a regular expression, which only uses the characters 0,1,2,3 and every string adresses an area in this area. For example, the string 022 represents the area marked with red:

We can decide on a resolution n, and generate all strings of length n, parse them through a regular expression, and if it matches, paint it black. For example, the regular expression (0|2)(0+1+2+3)* generates the following image (for any resolution >= 2):

This is because the regular expression only matches if the cell begins with a 0 or a 2, which is why the 0th and the 2nd quadrants are painted black, and the rest aren’t.

We aren’t limited to boring checkers though, if we feed the generator the regular expression (1+2+3)*0(1+3)*0(0+1+2+3)*:

Notice the self-similarity in the image, that’s because this image is actually a fractal.

Here is a gallery of other images generated using this technique:

Notice how often triangles appear in these images.

Although the .java file has an included Javadoc, here’s how you use it to generate images:

Generator g = new Generator(RESOLUTION, IMAGE_SIZE);
g.generateAndExport("REGULAR_EXPRESSION", "EXPORTED_FILE_PATH.png"); //Must be a .png file

If you prefer Python code, check out Jack Morris’ post on the same subject, where he implements it in Python: http://jackm.co.uk/posts/2014/03/25/Using-Regular-Expressions-To-Generate-Images/