In my new book, The Enigma Terrorists, the protagonist, Dr. Ralph Nagy, published a book on SAS functions and, to demonstrate the binary exclusive OR function (BXOR), he wrote programs to encode and decode text. In this blog post, I’ll show you how this works. (The code was actually included in the chapter on Binary functions in my book SAS Functions by Example, 2nd edition.)
Almost everyone is familiar with the Boolean OR operator. It returns TRUE if either argument is TRUE, including when both arguments are TRUE. An exclusive OR is like the OR operator, except that it returns FALSE when both arguments are TRUE. To summarize:
TRUE XOR TRUE = FALSE
TRUE XOR FALSE = TRUE
FALSE XOR TRUE = TRUE
FALSE XOR FALSE = FALSE
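BXOR works bit by bit on integers, so applying it to the single bits 0 (FALSE) and 1 (TRUE) reproduces the exclusive OR truth table. Here is a minimal sketch. The data set name xor_table and the variable names are my own illustrative choices, not code from the book.

```sas
/* Sketch: the exclusive OR truth table computed with BXOR on single bits */
/* (0 = FALSE, 1 = TRUE). Data set and variable names are illustrative.   */
data xor_table;
   do a = 0 to 1;
      do b = 0 to 1;
         a_xor_b = bxor(a, b);   /* 1 only when exactly one of a, b is 1 */
         output;
      end;
   end;
run;

title "Exclusive OR Truth Table";
proc print data=xor_table noobs;
run;
```

The listing has four rows, and a_xor_b is 1 only in the two rows where a and b differ.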
Why is this useful for encoding and decoding text? If you encode text by performing an exclusive OR with a key, and then perform another exclusive OR on the coded text with the same key, you get back the original text. The following SAS program demonstrates this:
*Program to demonstrate how the exclusive OR can be used to encode and decode text;
data Cipher;
   Text = rank('A');
   Key = rank('B');
   Code = bxor(Text, Key);
   Decode = bxor(Code, Key);
run;

title "Listing of Data Set Cipher";
proc print data=Cipher noobs;
   format Text Key Code Decode Binary8.;
run;
The RANK function returns the position of a character in the collating sequence, which on an ASCII system is its ASCII value. In this example, it returns 65 (01000001 in binary) when the argument is an 'A' and 66 (01000010 in binary) when the argument is a 'B'. The BXOR function takes two arguments and returns their bitwise exclusive OR.
Here is the output:
Listing of Data Set Cipher
Text      Key       Code      Decode
01000001  01000010  00000011  01000001
Notice that the value of Decode is the same as Text.
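A short derivation shows why Decode always matches Text. Write ⊕ for the bitwise exclusive OR that BXOR computes, t for a character code, and k for the key (symbols introduced here for illustration). The exclusive OR is associative, any value XORed with itself is 0, and XOR with 0 leaves a value unchanged, so:

$$
(t \oplus k) \oplus k = t \oplus (k \oplus k) = t \oplus 0 = t
$$

With the values from the program above (t = 65 for 'A', k = 66 for 'B'):

$$
\begin{aligned}
t \oplus k &= 01000001_2 \oplus 01000010_2 = 00000011_2 = 3 \\
(t \oplus k) \oplus k &= 00000011_2 \oplus 01000010_2 = 01000001_2 = 65
\end{aligned}
$$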
If you are interested, here are two SAS macros that you can use to encode and decode data:
%macro encode(Dsn=,       /* Name of the SAS data set to hold the encrypted message */
              File_name=, /* The name of the raw data file that holds the plain text */
              Key=        /* A number of your choice which will be the seed for the
                             random number generator. A large number is preferable */
              );
   %let len = 150;
   data &dsn;
      array l[&len] $ 1 _temporary_; /* each element holds a character of plain text */
      array num[&len] _temporary_;    /* a numerical equivalent for each letter */
      array xor[&len];                /* the coded value of each letter */
      retain key &key;
      infile "&file_name" pad;
      input string $char&len..;
      do i = 1 to dim(l);
         l[i] = substr(string,i,1);
         num[i] = rank(l[i]);
         xor[i] = bxor(num[i],ranuni(key));
      end;
      keep xor1-xor&len;
   run;
%mend encode;

%macro decode(Dsn=, /* Name of the SAS data set that holds the encrypted message */
              Key=  /* A number that must match the key of the enciphered message */
              );
   %let Len = 150;
   data decode;
      array l[&Len] $ 1 _temporary_;
      array num[&Len] _temporary_;
      array xor[&Len];
      retain Key &Key;
      length String $ &Len;
      set &Dsn;
      do i = 1 to dim(l);
         num[i] = bxor(xor[i],ranuni(Key));
         l[i] = byte(num[i]);
         substr(String,i,1) = l[i];
      end;
      drop i;
   run;

   title "Decoding Output";
   proc print data=decode noobs;
      var String;
   run;
%mend decode;
Here is an example of how to call these two macros:
%encode(Dsn=code, File_name=c:\books\functions\plaintext.txt, Key=17614353)
%decode(Dsn=code, Key=17614353)
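The macros read the plain text from an external file. If you want to try the XOR round trip without creating a file, here is a minimal sketch under my own assumptions. Instead of the RANUNI stream that the macros use, it XORs each character with a character from a repeating key word (a classic repeating-key XOR, which is easy to break and is shown only as a demonstration). The message, the key word, and all variable names are hypothetical.

```sas
/* Sketch: repeating-key XOR round trip, no external file needed.        */
/* Message and KeyWord are hypothetical values chosen for illustration. */
data xor_demo;
   length Message $ 40 KeyWord $ 8 Decoded $ 40;
   array coded[40] _temporary_;    /* coded (numeric) value of each character */
   Message = 'MEET AT NOON';
   KeyWord = 'SECRET';
   n = lengthn(Message);           /* number of characters to encode          */
   k = lengthn(KeyWord);           /* key characters are reused cyclically    */

   /* Encode: XOR each character code with the matching key character */
   do i = 1 to n;
      KeyCode = rank(substr(KeyWord, mod(i - 1, k) + 1, 1));
      coded[i] = bxor(rank(substr(Message, i, 1)), KeyCode);
   end;

   /* Decode: a second XOR with the same key character restores the text */
   do i = 1 to n;
      KeyCode = rank(substr(KeyWord, mod(i - 1, k) + 1, 1));
      substr(Decoded, i, 1) = byte(bxor(coded[i], KeyCode));
   end;

   Match = (Decoded = Message);    /* 1 if the round trip succeeded */
   drop i KeyCode;
run;

title "Round Trip with a Repeating Key";
proc print data=xor_demo noobs;
   var Message Decoded Match;
run;
```

The coded values are kept as numbers rather than characters because a character XORed with an identical key character gives 0, which is not a printable character.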
Many of you have read one or more of my SAS and/or statistics books, but did you know I also write fiction? I recently published a novel, The Enigma Terrorists, on Amazon.
The story centers on a college professor who is asked to help the NSA break a code to stop terrorists from blowing up nuclear reactors in France. Of course, the coding programs are written in SAS!
4 Comments
Hi. Thanks for your comment. The purpose of the code was simply to demonstrate the binary operator XOR, not to create a universal encryption/decryption tool. I'm delighted that at least one person read and understood my blog.
Cute application. Thanks for sharing.
From a SAS coding perspective, neither l nor num needs to be an array. They can be scalar quantities.
Regarding the 'Key' value, your program contains the comment, "A large number is preferable." I'm not sure why that comment is there. A small seed such as Key=1 produces a stream of values that is just as useful as the stream for Key=17614353. And a large key is only slightly safer against a brute-force attack, in which someone who knows your encoding process iterates over possible keys.
BTW, if your spy has a bad memory and cannot remember Key values, you can use a simpler stream such as
xor[i] = bxor(num[i],i/&len);
for the encoding and eliminate the Key parameter altogether. (Be sure to also use bxor(xor[i],i/&len) in the DECODE macro.)
Hi Rick. Nice to hear from you. I recall a SAS manual where it is suggested that large keys are preferable. But, you are the expert, so I'll take your word for it!
Very clever idea with the key. In reality, I'd probably use AES. I thought of ways to make the cipher harder to break by using different distributions and one of several different methods of generating the series, such as you discuss in several of your papers. By the way, did you get a chance to look at my "The Enigma Terrorists" book?
Cody,
It seems that your code does not work with non-ASCII values.
For example, if I put Chinese characters such as "美国美国" in "c:\books\functions\plaintext.txt", I get garbled characters after decoding.
The simplest fix is to re-encode the text as hexadecimal first, like this:
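data x;
   x = 'America美国';
   y = put(x, $hex32.);    /* text -> hexadecimal digits (plain ASCII) */
   z = input(y, $hex32.);  /* hexadecimal digits -> original bytes */
run;
proc print; run;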