Embedding hidden messages in plain text

The source code for this project can be found at http://pastebin.com/X08bmjWh

This post will showcase a method for disguising secret messages in plain text using Unicode.

Unicode defines a number of white space characters, which are reserved for horizontal or vertical “space” between other characters. A closely related character, ZERO WIDTH SPACE (U+200B, referred to in this post as ZERO_WIDTH_SPACE), is literally a space without width: it is invisible. A string can contain hundreds of these characters without anyone being the wiser. How can we use this to transmit secret information to our allies?
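A quick way to convince yourself that the character is really there, just invisible, is to inspect a string in Python. Python is our choice for these sketches and the linked source code may use a different language; the variable names below are illustrative only:

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"

visible = "HEY YOU THERE"
# The same visible text with three zero-width spaces slipped in.
hidden = "HEY" + ZERO_WIDTH_SPACE + " YOU" + ZERO_WIDTH_SPACE * 2 + " THERE"

print(unicodedata.name(ZERO_WIDTH_SPACE))               # ZERO WIDTH SPACE
print(len(visible), len(hidden))                        # 13 16
print(visible == hidden)                                # False
print(hidden.replace(ZERO_WIDTH_SPACE, "") == visible)  # True
```

Both strings render identically, but the second one is three characters longer.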

A message is composed of bits, which can be either 0 or 1. This usually means on/off (at least for engineers), but in our case it will denote the presence or absence of a ZERO_WIDTH_SPACE character between two visible characters. In the following example, each vertical bar marks a position that holds one bit, to illustrate how the program works:

|H|E|Y| |Y|O|U| |T|H|E|R|E|

Each vertical bar represents a single bit: it is either a ZERO_WIDTH_SPACE character or nothing at all. This way we can transmit hidden messages without our enemies reading them (I’m looking at you, NSA). A string of n characters has n+1 gaps (one before each character and one after the last), so it gives us n+1 bits to work with; the 13-character example above has 14. We’ll use the same encoding as in our steganography post, with 5 bits per character*. A 140-character tweet therefore offers 141 bits, enough for a 28-character message, assuming Twitter doesn’t count the invisible characters toward its limit (it does).
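The gap scheme can be sketched in Python as follows. This is a minimal illustration, not the code from the pastebin link: the helper names `embed_bits` and `extract_bits` are hypothetical, a ZERO_WIDTH_SPACE stands for a 1-bit and an empty gap for a 0-bit (an assumed convention), and the cover text is assumed not to contain any zero-width spaces of its own.

```python
ZERO_WIDTH_SPACE = "\u200b"

def embed_bits(cover: str, bits: str) -> str:
    """Hide a bit string in the n+1 gaps of an n-character cover text.

    Assumed convention: a gap holding a ZERO_WIDTH_SPACE is a 1-bit,
    an empty gap is a 0-bit. Gap i sits before cover[i]; the last gap
    sits after the final character.
    """
    if len(bits) > len(cover) + 1:
        raise ValueError(f"cover text only has {len(cover) + 1} gaps")
    bits = bits.ljust(len(cover) + 1, "0")  # unused gaps read as 0
    out = []
    for i, bit in enumerate(bits):
        if bit == "1":
            out.append(ZERO_WIDTH_SPACE)
        if i < len(cover):
            out.append(cover[i])
    return "".join(out)

def extract_bits(stego: str) -> str:
    """Read one bit per gap: 1 if the gap holds a ZERO_WIDTH_SPACE, else 0."""
    bits = []
    pending = "0"
    for ch in stego:
        if ch == ZERO_WIDTH_SPACE:
            pending = "1"
        else:
            bits.append(pending)  # the gap just before this visible character
            pending = "0"
    bits.append(pending)  # the gap after the last visible character
    return "".join(bits)

stego = embed_bits("HEY YOU THERE", "10110")
print(len(stego))           # 16 (13 visible characters + 3 zero-width spaces)
print(extract_bits(stego))  # 10110000000000
```

Stripping the zero-width spaces returns the untouched cover text, and the extracted bit string always has exactly n+1 bits, with unused gaps reading as 0.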

*We have shifted every character’s code up by one (A moves from 0 to 1, B from 1 to 2, and so on) to free code 0 for an end-of-message character, so the program knows where the message stops. Otherwise, the unused gaps at the end of the post would decode as a run of zeros, leaving a trail of A’s at the end of every hidden message.
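A sketch of a 5-bit character code with a reserved end marker might look like this. The alphabet (A to Z mapped to 1 to 26, a space mapped to 27) is an assumption for illustration and may differ from the one used in the steganography post; `encode_message` and `decode_message` are hypothetical names.

```python
END = 0  # reserved end-of-message code
# Hypothetical alphabet: code = index + 1, so "A" -> 1, ..., "Z" -> 26, " " -> 27.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def encode_message(msg: str) -> str:
    """Turn a message into 5-bit codes followed by the END marker."""
    # SYMBOLS.index raises ValueError for characters outside the alphabet.
    codes = [SYMBOLS.index(c) + 1 for c in msg.upper()] + [END]
    return "".join(format(code, "05b") for code in codes)

def decode_message(bits: str) -> str:
    """Read 5-bit codes until the END marker or until the bits run out."""
    chars = []
    for i in range(0, len(bits) - 4, 5):
        code = int(bits[i:i + 5], 2)
        if code == END:
            break  # stop here instead of decoding the padding zeros
        if code > len(SYMBOLS):
            raise ValueError(f"invalid code {code}")
        chars.append(SYMBOLS[code - 1])
    return "".join(chars)

print(encode_message("HI"))                             # 010000100100000
print(decode_message(encode_message("HI") + "0" * 12))  # HI
```

Because code 0 means “stop”, any zero padding after the marker is ignored instead of being decoded as extra letters.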

Here’s a little practical example:

(1) NSA is a lawful and just organisation. They just want to help us.
(2) NSA is a lawful and just organisation. They just want to help us.

Can you tell which sentence has a hidden message encoded into it? No, me neither. (Side note: WordPress apparently strips the zero-width characters, so the two sentences above are in fact exactly the same.)

Future projects: it would be beneficial to encode a run of consecutive 1-bits as several zero-width spaces in the same gap between two characters. This would allow more data to be transmitted over fewer visible characters.
