The source code for this project can be found at http://pastebin.com/X08bmjWh
This post will showcase a method for disguising secret messages in plain text using invisible Unicode characters.
Some Unicode characters are called white space characters: a collection of characters reserved for “space” between other characters, be it horizontal space or vertical space. One closely related character, ZERO WIDTH SPACE (U+200B, written ZERO_WIDTH_SPACE in this post), is a character without width: it is invisible when rendered. You can have hundreds of these characters embedded in a string of text without anyone being the wiser. How can we use this to transmit secret information to our allies?
A message is composed of bits, which can be either 0 or 1. This usually means on/off (at least for engineers), but in our case, it’ll denote the presence or absence of a ZERO_WIDTH_SPACE-character. We’ll let the vertical bar | represent a bit in the following example, to illustrate how the program would work:
|H|E|Y| |Y|O|U| |T|H|E|R|E|
Each vertical bar represents a single bit: it can either be a ZERO_WIDTH_SPACE-character, or it can be nothing. This way we can transmit hidden messages without our enemies reading them (I’m looking at you, NSA). Given a string of n characters, we have n+1 bits at our disposal, one for each gap, including the gap before the first character and the gap after the last. We’ll use the same encoding as our steganography post and use 5 bits per character*. A 140-character Twitter post therefore offers 141 bits, enough for a 28-character message (if we suppose Twitter doesn’t count the invisible Unicode characters toward its limit, which it does).
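To make the gap idea concrete, here is a minimal Python sketch. It is not the code from the pastebin link; the function names `embed_bits` and `extract_bits` are made up for this illustration, and it assumes the cover text contains no zero-width spaces of its own.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def embed_bits(cover, bits):
    """Put a ZWSP into gap i of the cover text whenever bits[i] is 1."""
    if ZWSP in cover:
        raise ValueError("cover text already contains zero-width spaces")
    if len(bits) > len(cover) + 1:
        raise ValueError("cover text has too few gaps for this many bits")
    # Pad with zeros so that every one of the n+1 gaps has a bit.
    padded = list(bits) + [0] * (len(cover) + 1 - len(bits))
    out = []
    for i, ch in enumerate(cover):
        if padded[i]:
            out.append(ZWSP)
        out.append(ch)
    if padded[len(cover)]:
        out.append(ZWSP)  # the final gap, after the last character
    return "".join(out)


def extract_bits(stego):
    """Read one bit per gap: 1 if a ZWSP sits in the gap, otherwise 0."""
    bits = []
    pending = 0
    for ch in stego:
        if ch == ZWSP:
            pending = 1
        else:
            bits.append(pending)
            pending = 0
    bits.append(pending)  # the gap after the last character
    return bits
```

For example, `embed_bits("HEY YOU THERE", [1, 0, 1, 1])` returns a string that renders the same as the cover text but is three characters longer, and `extract_bits` on that string gives back the four bits followed by ten zeros.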
*We have moved all the characters up by one (i.e. A is moved from 0 to 1, B is moved from 1 to 2 etc.) to make room for an exit character with the value 0. This way the program knows when the message is done. Otherwise, it’d read the run of zeroes at the end of the post as a lot of A’s at the end of every cipher.
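The shifted 5-bit alphabet could look roughly like the Python sketch below. The names `text_to_bits` and `bits_to_text` are hypothetical, the most-significant-bit-first order is an assumption, and only the letters A to Z are handled, since this post doesn’t say how the original program treats spaces or punctuation.

```python
def text_to_bits(message):
    """Encode A..Z as the values 1..26, five bits each, then append the exit value 0."""
    bits = []
    for ch in message.upper():
        if not ("A" <= ch <= "Z"):
            raise ValueError("this sketch only handles the letters A-Z")
        value = ord(ch) - ord("A") + 1  # A -> 1, B -> 2, ...
        # Most significant bit first (an assumption, see above).
        bits.extend((value >> shift) & 1 for shift in range(4, -1, -1))
    bits.extend([0] * 5)  # exit character
    return bits


def bits_to_text(bits):
    """Decode groups of five bits until the exit value 0 or the end of the bits."""
    chars = []
    for i in range(0, len(bits) - 4, 5):
        value = 0
        for b in bits[i:i + 5]:
            value = (value << 1) | b
        if value == 0:
            break  # exit character: the message is done
        if value > 26:
            raise ValueError("value outside the A-Z range")
        chars.append(chr(ord("A") + value - 1))
    return "".join(chars)
```

Because the exit value is all zeroes, any unused gaps after the message decode as “stop” rather than as extra letters.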
Here’s a little practical example:
(1) NSA is a lawful and just organisation. They just want to help us.
(2) NSA is a lawful and just organisation. They just want to help us.
Can you tell which message has a cipher text encoded into it? No, me neither. (Side note: WordPress apparently doesn’t handle these Unicode characters, so the two sentences above are exactly the same.)
Future projects: It’d be beneficial to encode consecutive 1 bits into the same gap between two characters. This would allow more data to be transmitted over fewer characters.
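One possible reading of that idea, sketched in Python: each gap holds a run of ZERO_WIDTH_SPACE-characters, and k of them in one gap stand for k ones followed by a single zero. This is only an interpretation, not something the current program does, and `embed_runs` and `extract_runs` are hypothetical names.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def embed_runs(cover, bits):
    """Store each run of 1s (terminated by a 0) as that many ZWSPs in one gap."""
    if ZWSP in cover:
        raise ValueError("cover text already contains zero-width spaces")
    runs = []
    count = 0
    for b in bits:
        if b:
            count += 1
        else:
            runs.append(count)  # a 0 closes the current run
            count = 0
    if count:
        runs.append(count)  # trailing 1s with no closing 0
    if len(runs) > len(cover) + 1:
        raise ValueError("cover text has too few gaps for this many runs")
    runs += [0] * (len(cover) + 1 - len(runs))  # unused gaps stay empty
    out = []
    for i, ch in enumerate(cover):
        out.append(ZWSP * runs[i])
        out.append(ch)
    out.append(ZWSP * runs[len(cover)])  # the gap after the last character
    return "".join(out)


def extract_runs(stego):
    """Turn every gap back into its run of 1s followed by a single 0."""
    bits = []
    count = 0
    for ch in stego:
        if ch == ZWSP:
            count += 1
        else:
            bits.extend([1] * count + [0])
            count = 0
    bits.extend([1] * count + [0])
    return bits
```

With this scheme the nine bits 1,1,1,0,1,0,0,1,1 fit into the four gaps of “HEY”, where the one-bit-per-gap approach would need eight characters of cover text. The decoder always emits a closing zero per gap, so the output is the original bits plus trailing zeroes, which the exit character already tolerates.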