Image Secrecy and XOR


Note: If you don’t what an XOR gate is, you might consider reading up on it on wikipedia. It’s not necessary to read the article, but it’s recommended.

The source code for this project can be found at: http://pastebin.com/SLKNLacD

In my previous posts on steganography, we’ve dealt with concealing information inside image, but what if we want to be able to send an image without worrying that people might be able to decipher it?

For this, we can use what is known as a one time pad. We can generate random noise and XOR an image with it and no one will be the wiser. However, this is not very fun. We can employ the same technique to generate a mask to convert an image into another image. But how do we XOR two images?

As you might recall from our posts on encoding bits within images, each pixel consists of 24 bits (8 for red, 8 for green, 8 for blue). So to XOR two images, we simply XOR each pixel in the first image with its corresponding pixel in the second image.

Given an image A and another image B, we’re looking for another image x such that A \oplus x = B. How do we calculate this x?

We notice that A \oplus x = B \leftrightarrow (A \oplus x) \oplus A = B \oplus A.

Because the XOR-operation is associative: (A \oplus x) \oplus A = B\oplus A \leftrightarrow (A \oplus A) \oplus x = B\oplus A.

Anything XOR itself is 0, so: (A \oplus A) \oplus x = B\oplus A \leftrightarrow 0 \oplus x = B\oplus A.

And anything XOR 0 is itself: 0 \oplus x = B\oplus A \leftrightarrow x = B \oplus A.

This means we’ve found a formula for the image x! We’ll now present an example of XOR’ing two image together:

xor_example

It’s quite clear that the last image looks like a weird, trippy morph of the two original images (at least when we know the images we XOR together). And it’s also clear that the regions with a lot of contrast (i.e. the transition from the cats tongue to its fur and the grass) look more random than the regions with low contrast (the region where there’s only grass). We will now show the transition from cat to forest:

xor_example_2

Hey, this is pretty neat! We can use this mask to transform the cat into the forest. What about the other way?

xor_example_3

Yes, we can. This is really cool. We now have an image that we can use to convert the image of the forest to the image of the cat. But is this still steganography? If the image C = A \oplus B was child pornography (or some other illegal image), could you get arrested for possession of A and B? You probably would, as it’s pretty easy to see if you’ve gotten the right image, but presumably, you could XOR any image into any image, as we’ve already shown. Let’s assume that the image of the cat is an illegal one. Under possession of the XOR’ed cat and forest, and the forest, one could claim you were already in possession of the cat. But what stops you from stating that this is what you intended?

xor_example_4

The image to the right would be the one you had originally intended to hide, and the image of the river would then be the key.

Worth a thought.

Embedding hidden messages in plain text MK-II


Note: This is a follow-up post to the other post I made on embedding hidden messages in plain text.

Source code: http://pastebin.com/B1bc9CCt

After I made the original post on embedding hidden messages in plain text, one of my friends (called Avegaard) pointed out to me that there’s a far more efficient way of storing the bits within plain text. If you’ll recall the last way of storing bits, we used the following procedure:

|H|E|L|L|O| |T|H|E|R|E|

Where we either stored or didn’t store a ZERO_SPACE_WIDTH character in between the letters to denote the bits. Given a string of length n, this gave us n+1 bits to store data, however, this is far from the most efficient way of doing things. If we let the letters of the string denote a 0 and the ZERO_SPACE_WIDTH character denote a 1, we have many more bits at our disposal. This means that adjacent 1’s will be crammed together, giving more space to play with. If we let | denote a ZERO_SPACE_WIDTH character, this means that encoding the bit-string 01010111 in the above ‘hello there’-message would look like

H|E|L|||LO THERE

Notice how we have multiple 1’s in a row. But because the ZERO_SPACE_WIDTH character has literally no width, we can cram together as many as we want without anyone being the wiser. In reality, however, the above message would look like this:

HELLO THE|R|E|||

This is because it doesn’t matter how many 0’s we prepend to a bitstring – it’ll always be the same bitstring. But appending 0’s completely changes the data, then we’d need a special character reserved for STOP telling the program to stop counting bits anymore – otherwise it’d append a bunch of 0’s. But by placing the bits at the end, we don’t need a special STOPcharacter, because the program can prepend as many 0’s as it wants. 00000001 is the same as 01, that is.

Now, let’s analyse how well our new algorithm is – storage wise. The worst case scenario is embedding a text string consisting solely of 0’s, as it doesn’t add any characters. For a text string of size n, this means the worst case storage capacity is n bits, which actually is worse than our old algorithm – but the only bitstrings consisting only of 0’s represents a long list of AA, so unless we’re storing only A’s, this new algorithm will always either be able to store as much as the old algorithm or more.

Note: Regardless of the method used to store a hidden message within text, it’s not an efficient compression algorithm, as we’re using two bytes to represent each ZERO_SPACE_WIDTH character, and one (or more) byte to represent each original character. So when I say space efficient I mean efficient in the sense of how many bits can be stored and subsequently read from a string of regular text.

It should be obvious by now that the number of bits this algorithm can store within a string of n characters is n + the number of 1’s, as each one turns into a ZERO_SPACE_WIDTH character and is invisible. Thus, the more 1’s we have in our secret message, the more space we’ll be able to store. Let’s make the naïve assumption that our secret message is completely random. This means there are an equal number of 0’s and 1’s, and we’ll therefore be able to store an additional bit of data for each regular character, giving us a storage capacity of 2n – given that an image file is 200 kB, we’d need 100 kB of text to hide the image, which is 100.000 ASCII characters = 350 wikipedia pages.

Although this seems completely horrendous, it’s definitely better than the old algorithm (it needed nearly 700 pages), but there’s just one problem – we’re not storing random data, we’re storing text, and text is far from random. For example, the letter e is much more likely to show up than the letter q for instance. Here’s a list of the few most frequent characters in the English language (source: http://millikeys.sourceforge.net/freqanalysis.html):

Character Frequency
SPACE BAR 18.74 %
E 9.60 %
T 7.02 %
A 6.21 %
O 5.84 %
I 5.22 %
N 5.21 %
H 4.87 %
S 4.77 %
R 4.43 %

How can we use this information to our advantage? Recall that 1’s are basically free, as they don’t take any space away from the characters – the algorithm is limited by the number of 0’s in a string of text, not the number of 1’s. This means we can assign the bitstring that has the most 1’s to the character that shows up most often, which is the space bar. Here’s a new encoding, that assigns the most frequent characters the bitstrings that contain the most 1’s (in descending order):

Character Binary Code Decimal Code
SPACE BAR 11111 31
E 01111 15
T 10111 23
A 11011 27
O 11101 29
I 11110 30
N 11100 28
H 11010 26
S 10110 22
R 01110 14
D 01101 13
L 01011 1
U 00111 7
M 10011 19
C 10101 21
W 11001 25
G 00011 3
F 00101 5
Y 01001 9
P 10001 17
COMMA 10010 18
DOT 10100 20
B 11000 24
K 01100 12
V 01010 10
APOSTROPHE 00110 6
? 10000 16
X 01000 8
J 00100 4
! 00010 2
Q 00001 1
Z 00000 0

So how many bits can we expect to store within a string of text n characters long? In probability theory, the expected value is sum of all possible values times the probability of each value. That is E = \Sigma p_i \cdot c_i, where c denotes an event, and p is the probability of that event happening. This means we can find the expected number of additional bits of information, where each event is the number of 1’s in a character, and the probability is the probability of that character appearing.I constructed an excel file for doing just this, taking in account that the probabilities in the above link doesn’t add up to 100 (it adds up to 98.64 %, which means I divide all the probabilities by 0.9864 to account for this). I find that the expected number of additional bits is…

Screen Shot 2013-12-05 at 10.25.02

… 3.6. This means I gain approximately 3.6 bits for every character in the text string, which means the expected number of bits for a hidden message in a text string of length n is 4.6n. This means we’re able to encode 0.575 character into a single character of text. Using the data found at this website, this means we can encode around 5 news articles into a wikipedia article.

Here’s a brief explanation of how you use the program:

//Encode a hidden message 
String encoded = Encoder.encodeString(plain_text, hidden_message);

//Retrieve the hidden message
String hidden_message = Encoder.getMessage(encoded);

Source code: http://pastebin.com/B1bc9CCt

Secret message: http://pastebin.com/S35w7W8D

Steganography and PNG’s MK-III


This is yet another follow-up post to my posts about encoding hidden messages in PNG’s: Article I, Article II.

Source code: http://pastebin.com/xWzMcayL

So far, we’ve only dealt with concealing text within PNG’s, but what about other things but text? In this post we’ll account for a method for storing arbitrary data within the least significant bits of a .PNG file.

Arbitrary data is, just like the text encoding we’ve used in the past two posts, composed of bits, so there’s no obvious reason why we shouldn’t be able to store other things than text within the least significant bits of the pixel of a .PNG file. However, data stored on the computer is composed of bytes (groups of 8 bits), and because we can store 5 bits per pixel, we can’t encode a byte into a single pixel. Instead, we choose to encode 5 bytes (40 bits) into 8 consecutive pixels (40 bits), which fits neatly. So, to store a file within a .PNG file, we’ll read it as bytes, and split these into groups of 5 bytes. Then we’ll “jump” (as explained in Article II) accordingly, and store these 5 bytes within the next 8 pixels. We’ll add 5 blank bytes (0x00) at the end of our data as an “exit code” – otherwise the program will read random bytes indefinitely.

We’ve used this to encode an image onto one of theses images of Barack Obama:

barack_osama barackobama

Hint: The password is ‘realobama’

The program is not limited, however, by images. You can encode literally any byte-string unto the least significant bits of an image. This includes .txt-files, .zip-files and even executables (!). So if you want to send a secret file within an image, this is the program for you!

The source code is found at the top of this post. Note: You need the Group-class found in Article II for the program to work properly.

Future plans: This program is easily susceptible to deciphering – even without knowing the passphrase. I might construct a decoder eventually.

Here’s how to use it with the new features:

//Encoding
Encoder e = new Encoder("path-of-image-file.png");
e.encodeFile("path-of-file-to-encode", "password");
e.saveFile("new-embedded-image.png"); //Again, we encourage .PNG

//Decoding
Encoder e = new Encoder("path-of-image-file.png");
e.saveContentFile("path-of-extracted-file", "password");

Steganography and PNG’s MK-II


Note, this is a follow-up post to the post on embedding secret information inside of  PNG-files: Steganography and PNG’s

Source code: http://pastebin.com/nTS9Dp0M

In my last post, I introduced the concept of concealing information within the least significant bits of an image file. However, there was a major flaw in the design – it was far too easy to read the hidden information, as it embedded it into the bits systematically.

In this post we’ll introduce keys and develop a method for embedding information inside of an image file, that requires a password to extract the original bits. If we look at how the encoder embedded bits in the original post, we’ll see that it “jumps” one pixel at the time on the x-axis and warps on the y-axis when needed. If we change these “jumps”, we can change which bits get selected in a reproducible manner. Great!

However, there’s a fatal pitfall that we must avoid to implement this idea – the jumps must not repeat the same pixels over and over again. It is absolutely vital that it selects a unique pixel every time. We can actually look at these jumps as multiples of an element in a group. An element m of the group Z_n / Z is a generator of the group, if it’s coprime to n (i.e. gcd(n, m) = 1). To accomplish this, we can just select the jump interval to be a prime (note, this assumption is wrong), then we’ll make sure never to repeat the same pixel over and over again. Fantastic!

For the key, we’ll take as input any string. The key (which needs to be an integer) will be the hashcode of the string modulus the size of the group (for an image of size w \cdot h the size of the group is w \cdot h). Then we’ll find the closest prime using a simple primality test, and we’re good to go.

barack barack_2

Hint: The password is ‘obama’

The only weakness as far as I see it, is that the program starts encoding at (0, 0) every time, but I’ll get around to changing that eventually.

Embedding hidden messages in plain text


The source code for this project can be found at http://pastebin.com/X08bmjWh

This post will showcase a method for disguising secret messages in plain text using the unicode alphabet.

Some unicode characters are called white space characters, and are a collection of characters reserved for “space” between other characters, be it horizontal space or vertical space. However, there is one white space character, which is called ZERO_WIDTH_SPACE, and is literally a character without width – it is invisible. You can literally have hundreds of these characters embedded in a string of text, without anyone being the wiser. How can we use this to transmit secret informations to our allies?

A message is composed of bits, which can be either 0 or 1. This usually means on/off (at least for engineers), but in our case, it’ll denote the presence or absence of a ZERO_WIDTH_SPACE-character. We’ll let the vertical bar represent a bit in the following example, to illustrate how to program would work:

|H|E|Y| |Y|O|U| |T|H|E|R|E|

Each vertical bar represents a single bit – it can either be a ZERO_WIDTH_SPACE-character, or it can be nothing. This way we can transmit hidden messages without our enemies reading it (I’m looking at you, NSA). Given a string of n characters, this means we have n+1 bits available at our disposal. We’ll use the same encoding as our steganography post and use 5 bits per character*. This means we’d be able to encode a 28-character message into the length of a Twitter post (if we suppose Twitter doesn’t count unicode characters – which it does).

*We have moved all the characters up one bit (i.e. A is moved from 0 to 1, B is moved from 1 to 2 etc.) to give space to an exit character. This way the program knows when the message is done. Otherwise, it’d count a bunch of zeroes at the end of the post, giving us a lot of A’s at the end of every cipher.

Here’s a little practical example:

(1) NSA is a lawful and just organisation. They just want to help us.
(2) NSA is a lawful and just organisation. They just want to help us.

Can you tell which message has a cipher text encoded into it? No, me neither. (sidenote: WordPress doesn’t handle Unicode apparently, so the above the sentences are exactly the same)

Future projects: It’d be beneficial to encode consecutive bits of 1 into the same gap between two characters. This’d allow more data transmitted over fewer characters.

Steganography and PNG’s


Source code for this project can be found at: http://pastebin.com/GiSavbfz

Steganography is the act of hiding a message in such a way, that only the intended recipient is able to retrieve it, often in plain sight. Note, this is not  the same as cryptography – cryptography is a completely different discipline. Sidenote, if you actually intend to send secret messages to someone else, please, please do not use steganography. Steganography is fun and all, but it’s not a reliable secure way to transmit secret information.

An image is composed of a bunch of RGB codes for every pixel. Each color code has a corresponding integer value associated with it (between 0 and 255), and integers are composed of bits. This means that every pixel has three 8-bit binary codes associated with it, giving the pixel its color. However, not all bits in the binary code are of equal importance. Altering the final two bits of say the red color code will hardly be noticeable for us humans. How can we use this to transmit secret information?

A message is, just like the color codes, composed of bits. And because altering the least significant bits in the color codes won’t make a noticeable difference, we can encode a message into the least significant bits of the color codes. Clever, eh?

In our implementation, we choose a 5-bit alphabet for encoding our messages. We won’t need anymore if we’re sending top-secret information to our pals – styling and unicode is not of importance. We’re choosing the letters A-Z (0-25), space, and basic punctuation (dots, commas, exclamation mark, question mark and apostrophe), for a total of 32 characters, which fits neatly inside 5 bits.

The human eye is best able to detect changes in contrast in the green channel, so we’ll take two bits from the red channel, two bits from the blue channel, and one bit from the green channel for each pixel. This means we can encode one character into one pixel on the image, which allows us to transmit pretty huge messages in just a single image.

00000000
00000000
00000000

The black bits our ours, the rest are reserved for the pixel. We went ahead and encoded a message onto on of these images of Barack Obama:

obama_yes obama_terror

Can you tell which is the fake Barack Obama?

You can find a link to the source code at the top of this post. Here’s how you use it:

//To encode a message
Encoder e = new Encoder("path-to-image.png");
e.encodeMessage("your-message-here");
e.saveImage("path-of-new-image.png"); //We advise against using lossy image formats, such as JPEG, as they'll ruin the hidden message.

//To retrieve a message
Encoder e = new Encoder("path-to-image.png");
String hiddenMessage = e.getMessage();