Embedding hidden messages in plain text

The source code for this project can be found at http://pastebin.com/X08bmjWh

This post will showcase a method for disguising secret messages in plain text using the Unicode character set.

Some Unicode characters are white space characters: a collection of characters reserved for “space” between other characters, be it horizontal or vertical. Among these space-like characters is one called ZERO_WIDTH_SPACE (U+200B), which is literally a character without width, so it is invisible. You can embed hundreds of these characters in a string of text without anyone being the wiser. How can we use this to transmit secret information to our allies?
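A quick way to convince yourself of this is to build such a string. The post’s own source on Pastebin may use a different language; Python is used here purely for illustration. The sketch pads a sentence with 100 zero-width spaces. How it displays depends on your font and terminal, but in most environments it looks unchanged while its length grows.

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"  # U+200B

print(unicodedata.name(ZERO_WIDTH_SPACE))  # ZERO WIDTH SPACE

visible = "HEY YOU THERE"
# Hide 100 zero-width spaces right after the first word.
padded = visible[:3] + ZERO_WIDTH_SPACE * 100 + visible[3:]

print(padded)                      # usually looks identical to `visible`
print(len(visible), len(padded))   # 13 113
print(padded.replace(ZERO_WIDTH_SPACE, "") == visible)  # True
```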

A message is composed of bits, which can be either 0 or 1. This usually means off/on (at least for engineers), but in our case, 1 denotes the presence of a ZERO_WIDTH_SPACE character and 0 its absence. We’ll let the vertical bar represent a bit in the following example, to illustrate how the program works:

|H|E|Y| |Y|O|U| |T|H|E|R|E|

Each vertical bar represents a single bit: it is either a ZERO_WIDTH_SPACE character or nothing at all. This way we can transmit hidden messages without our enemies reading them (I’m looking at you, NSA). A string of n characters has n+1 gaps (counting the positions before the first and after the last character), so we have n+1 bits at our disposal. We’ll use the same encoding as our steganography post, with 5 bits per character*. A 140-character Twitter post therefore gives us 141 bits, enough for a 28-character message (141 / 5, rounded down), if we suppose Twitter doesn’t count the invisible Unicode characters toward its limit (which it does).
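The one-bit-per-gap scheme can be sketched as follows. This is a minimal illustration, not the author’s Pastebin code: the names `embed_bits`, `extract_bits` and `capacity_bits` are hypothetical, and the sketch assumes the cover text contains no zero-width spaces of its own.

```python
ZERO_WIDTH_SPACE = "\u200b"
BITS_PER_CHAR = 5  # as in the post


def capacity_bits(cover: str) -> int:
    """A cover text of n characters has n + 1 gaps, hence n + 1 bits."""
    return len(cover) + 1


def embed_bits(cover: str, bits: list[int]) -> str:
    """Put a zero-width space in gap i when bits[i] is 1, nothing when it is 0.

    Gap 0 sits before the first character, gap n after the last one.
    """
    if len(bits) > capacity_bits(cover):
        raise ValueError("cover text too short for this many bits")
    # Pad with zeros so every gap has a defined bit.
    bits = bits + [0] * (capacity_bits(cover) - len(bits))
    out = []
    for i, ch in enumerate(cover):
        if bits[i]:
            out.append(ZERO_WIDTH_SPACE)
        out.append(ch)
    if bits[len(cover)]:  # the gap after the last character
        out.append(ZERO_WIDTH_SPACE)
    return "".join(out)


def extract_bits(stego: str) -> list[int]:
    """Read one bit per gap: 1 if a zero-width space sits there, else 0."""
    bits = []
    pending = 0  # does the current gap contain a zero-width space?
    for ch in stego:
        if ch == ZERO_WIDTH_SPACE:
            pending = 1
        else:
            bits.append(pending)  # close the gap in front of this visible character
            pending = 0
    bits.append(pending)  # the gap after the last visible character
    return bits
```

For a 140-character cover, `capacity_bits(cover) // BITS_PER_CHAR` is 28, matching the Twitter estimate in the text.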

*We have shifted every character’s code up by one (i.e. A is moved from 0 to 1, B from 1 to 2, etc.) to free up the code 0 for an exit character. This way the program knows when the message is done. Otherwise, the unused gaps at the end of the post would be read as runs of zeroes, giving us a lot of A’s at the end of every cipher.
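The shifted 5-bit encoding with an exit code might look like this. The steganography post’s exact character table isn’t reproduced here, so `ALPHABET` is a hypothetical 31-symbol table (A to Z plus space and four punctuation marks) that fills codes 1 to 31 and leaves 0 for the exit character. The function names are likewise made up for illustration.

```python
BITS_PER_CHAR = 5
# Hypothetical 31-symbol table; the steganography post's own table may differ.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?"
EXIT = 0  # code 0 is reserved as the end-of-message marker


def encode_message(message: str) -> list[int]:
    """Map each character to 1..31 (shifted up by one), then append EXIT.

    Characters outside ALPHABET raise ValueError via str.index.
    """
    codes = [ALPHABET.index(ch) + 1 for ch in message.upper()] + [EXIT]
    bits = []
    for code in codes:
        # Most significant bit first, BITS_PER_CHAR bits per code.
        bits.extend((code >> shift) & 1 for shift in range(BITS_PER_CHAR - 1, -1, -1))
    return bits


def decode_message(bits: list[int]) -> str:
    """Read 5-bit codes until EXIT; any padding after it is ignored."""
    chars = []
    for start in range(0, len(bits) - BITS_PER_CHAR + 1, BITS_PER_CHAR):
        code = 0
        for bit in bits[start:start + BITS_PER_CHAR]:
            code = (code << 1) | bit
        if code == EXIT:
            break
        chars.append(ALPHABET[code - 1])  # undo the shift by one
    return "".join(chars)
```

Because decoding stops at the first all-zero code, `decode_message(encode_message("HEY") + [0] * 12)` returns `"HEY"` rather than `"HEY"` followed by a string of A’s.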

Here’s a little practical example:

(1) NSA is a lawful and just organisation. They just want to help us.
(2) NSA is a lawful and just organisation. They just want to help us.

Can you tell which message has a cipher text encoded into it? No, me neither. (Side note: WordPress apparently doesn’t preserve these Unicode characters, so the two sentences above are in fact exactly the same.)
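Since the blog can’t show the real thing, here is a self-contained round trip on the example sentence. The `hide`/`reveal` names are hypothetical, and the 31-symbol `ALPHABET` is a made-up stand-in for the steganography post’s table, with code 0 reserved for the exit character. The 65-character sentence has 66 gaps, room for 13 five-bit codes, i.e. a secret of up to 12 characters plus the exit code.

```python
ZERO_WIDTH_SPACE = "\u200b"
BITS_PER_CHAR = 5
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?"  # hypothetical table; code 0 = exit


def hide(cover: str, secret: str) -> str:
    """Encode `secret` as shifted 5-bit codes plus exit, one bit per gap."""
    codes = [ALPHABET.index(ch) + 1 for ch in secret.upper()] + [0]
    bits = [(c >> s) & 1 for c in codes for s in range(BITS_PER_CHAR - 1, -1, -1)]
    if len(bits) > len(cover) + 1:
        raise ValueError("cover text too short")
    bits += [0] * (len(cover) + 1 - len(bits))
    # Gap i precedes cover[i]; the final gap follows the last character.
    return "".join(ZERO_WIDTH_SPACE * b + ch for b, ch in zip(bits, cover)) + ZERO_WIDTH_SPACE * bits[-1]


def reveal(text: str) -> str:
    """Read one bit per gap, then decode 5-bit codes until the exit code."""
    bits, pending = [], 0
    for ch in text:
        if ch == ZERO_WIDTH_SPACE:
            pending = 1
        else:
            bits.append(pending)
            pending = 0
    bits.append(pending)
    out = []
    for i in range(0, len(bits) - BITS_PER_CHAR + 1, BITS_PER_CHAR):
        code = int("".join(map(str, bits[i:i + BITS_PER_CHAR])), 2)
        if code == 0:
            break
        out.append(ALPHABET[code - 1])
    return "".join(out)


cover = "NSA is a lawful and just organisation. They just want to help us."
stego = hide(cover, "HI THERE")
print(stego == cover)  # False, although both usually look the same
print(reveal(stego))   # HI THERE
```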

Future projects: It would be beneficial to encode runs of consecutive 1 bits into the same gap between two characters. This would allow more data to be transmitted over fewer characters.
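One possible reading of this idea, sketched purely as a hypothetical (the post doesn’t specify the scheme): let a gap holding k zero-width spaces stand for k ones followed by a single zero. Each 0 bit then costs one gap while 1 bits cost none, so bit strings rich in ones need fewer visible characters. Unused gaps decode as extra trailing zeros, which the exit code of the 5-bit scheme makes harmless. The function names are made up for illustration.

```python
ZERO_WIDTH_SPACE = "\u200b"


def embed_runs(cover: str, bits: list[int]) -> str:
    """Hypothetical variant: a gap with k zero-width spaces means k ones, then a zero."""
    if bits and bits[-1] == 1:
        bits = bits + [0]  # close the final run of ones
    runs = []  # number of ones in front of each zero
    count = 0
    for bit in bits:
        if bit:
            count += 1
        else:
            runs.append(count)
            count = 0
    if len(runs) > len(cover) + 1:
        raise ValueError("cover text too short")
    runs += [0] * (len(cover) + 1 - len(runs))  # empty gaps decode as extra zeros
    return "".join(ZERO_WIDTH_SPACE * k + ch for k, ch in zip(runs, cover)) + ZERO_WIDTH_SPACE * runs[-1]


def extract_runs(stego: str) -> list[int]:
    """Expand every gap back into its run of ones followed by a zero."""
    bits, count = [], 0
    for ch in stego:
        if ch == ZERO_WIDTH_SPACE:
            count += 1
        else:
            bits += [1] * count + [0]
            count = 0
    bits += [1] * count + [0]  # the gap after the last visible character
    return bits
```

With the bits `[1, 1, 1, 0, 1, 0, 0, 1, 1, 0]`, this variant fills only four gaps, where the one-bit-per-gap scheme needs ten.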

