This blog serves two purposes: the main purpose is to show you some useful SAS coding techniques, and the second is to show you an interesting method of creating a Beale cipher.
TJ Beale is famous in Virginia for leaving behind three ciphers, supposedly describing the location of hidden gold and treasures. (Most cryptologists and historians believe the whole set of ciphers and treasure was a hoax.) In one of the ciphers, he used a method based on the Declaration of Independence. His coding method was as follows:
- Get a copy of the Declaration of Independence and number each word.
- Take the first letter of each word and form a list.
- Associate each word's number with the first letter of that word.
For example, consider this text:
“Four score and seven years ago, our fathers brought forth upon this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.”
To create a Beale cipher, you would proceed as follows:
Four(1) score(2) and(3) seven(4) years(5) ago(6), our(7) fathers(8) brought(9) forth(10) upon(11) this(12) continent(13) a(14) new(15) nation(16), conceived(17) in(18) liberty(19) and(20) dedicated(21) to(22) the(23) proposition(24) that(25) all(26) men(27) are(28) created(29) equal(30).
Next, you would make a table like this:
| Letter | Numbers |
|--------|---------|
| F | 1,8,10 (all the numbers of words that begin with 'F') |
| S | 2,4 |
| A | 3,6,14,20,26,28 |
| Y | 5 |
| …and so on | |
You would then want to put the list in alphabetical order like this:
| Letter | Numbers |
|--------|---------|
| A | 3,6,14,20,26,28 |
| B | 9 |
| C | 13,17,29 |
| D | 21 |
| E | 30 |
| F | 1,8,10 |
| …and so on | |
To create your cipher, pick at random any number from the list for the letter that you want to encode. The advantage over a simple substitution cipher is that a common letter is spread across many different numbers, so simple frequency analysis is far less useful for guessing which letter a particular number represents.
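As a rough sketch of that selection step (it is not part of the original program), the code below numbers the first letters of the 30-word example sentence and then picks one matching number at random for each letter of a short message. The data set names Example and Encoded, the message 'SEND ALL', and the seed are illustrative assumptions. The $UPCASE1. informat and the SUM statement used in the first step are explained later in this blog.

```sas
/* Number the first letter of each word in the example sentence. */
data Example;
   length Letter $ 1;
   input Letter : $upcase1. @@;
   N + 1;
datalines;
Four score and seven years ago, our fathers brought forth upon this
continent a new nation, conceived in liberty and dedicated to the
proposition that all men are created equal.
;

/* Encode a message: one randomly chosen number per letter. */
data Encoded;
   length Message $ 40 Target $ 1;
   Message = 'SEND ALL';          /* hypothetical message */
   call streaminit(12345);        /* arbitrary seed */
   do Position = 1 to length(Message);
      Target = char(Message, Position);
      if Target = ' ' then continue;   /* skip blanks between words */
      Code = .;
      Matches = 0;
      /* Scan every numbered letter; keep the k-th match with
         probability 1/k, so each match is equally likely overall. */
      do Obs = 1 to Total;
         set Example point=Obs nobs=Total;
         if Letter = Target then do;
            Matches = Matches + 1;
            if rand('uniform') < 1 / Matches then Code = N;
         end;
      end;
      output;
   end;
   stop;                          /* required when reading with POINT= */
   keep Target Code;
run;

title "Encoded message (numbers vary with the seed)";
proc print data=Encoded noobs;
run;
```

Every Code value should be the number of a word in the sentence that begins with the Target letter. A letter with no matching word (there is no 'Z' in this sentence, for example) would come out with a missing Code.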
This blog explains how to create a Beale cipher; my next blog will explain how to decipher a Beale cipher.
You need to start out with a book or document that is accessible to the sender and recipient of the cipher. To offer some additional security, you could decide to start from a specific page in a book. For a simple demonstration of how to create a Beale cipher, I have entered part of the Declaration of Independence in a text file called Declare.txt.
A funny aside: I was teaching my Functions course in the UK, in a small town north of London on the Thames. One of the programs demonstrating several SAS character functions was the program I'm using here to demonstrate how to create a Beale cipher. I had completely forgotten that the document was the Declaration of Independence. Whoops! I asked the class, "I hope you're not still angry with us about that." Apparently not, and we all had a good laugh.
Back to the problem. I will break down the program into small steps and provide a partial listing of data sets along the way, so that you can see exactly how the program works. The first step is to read the text file, extract the first letter from each word, change the letter to uppercase, and associate each letter with the count of words in the text.
Here is the first part of the program.
    data Beale;
       length Letter $ 1;
       infile 'c:\Books\Blogs\Declare.txt';
       input Letter : $upcase1. @@;   /* ❶ */
       N + 1;                         /* ❷ */
       output;
    run;

    title "First Step in the Beale Cipher (first 10 observations)";
    proc print data=Beale(obs=10) noobs;
    run;
❶ By using the $UPCASE1. informat, you are selecting the first letter of each word and converting it to uppercase. If you are unfamiliar with the $UPCASEn. informat, it is similar to the $n. informat with the additional task of converting the character(s) to uppercase.
❷ You use a SUM statement to associate each letter with the word count.
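If you would like to see the two informats side by side, here is a minimal sketch. The data set name Informats and the variables Word, Plain, and Upper are made up for this illustration; the INPUT function applies each informat to the same word.

```sas
data Informats;
   length Word $ 12 Plain Upper $ 1;
   input Word @@;                     /* read one whole word at a time */
   Plain = input(Word, $1.);          /* first character, case unchanged */
   Upper = input(Word, $upcase1.);    /* first character, uppercased */
   N + 1;                             /* SUM statement: running word count */
datalines;
four score and seven years
;

title "Comparing the $1. and $UPCASE1. informats";
proc print data=Informats noobs;
run;
```

Plain should hold f, s, a, s, y, while Upper holds F, S, A, S, Y, with N counting from 1 to 5. Because N is created by a SUM statement, it starts at 0 and is retained automatically, which is why no RETAIN statement is needed for it.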
Here is the listing from this first step:
Next, you need to sort the data set by Letter so that all the As, Bs, and so forth are grouped together.

    proc sort data=Beale;
       by Letter;
    run;

    title "The list in sorted order (partial listing)";
    proc print data=Beale(obs=10) noobs;
    run;
Below is a partial listing of the sorted file:
Any of the numbers 24, 25, 27, and so forth can be used to code an 'A'.
The final step is to list each letter from A to Z (Z is pronounced Zed in the UK and Canada) on its own line, followed by all the possible numbers associated with that letter.
    data Next;
       length List $ 40;                      /* ❸ */
       retain List;                           /* ❹ */
       set Beale;
       by Letter;                             /* ❺ */
       if first.Letter then List = ' ';       /* ❻ */
       List = catx(',',List,N);               /* ❼ */
       if last.Letter then output;            /* ❽ */
    run;

    title "List of Beale Substitutions";
    proc print data=next(obs=5) noobs;
       var Letter List;
    run;
❸ The variable List will hold all the possible numbers that can be used to code any of the letters. In a real program, with a longer source document, you would need to give this variable a length greater than 40.
❹ You need to RETAIN this variable; otherwise, it would be set back to a missing value for each iteration of the DATA step.
❺ Following the SET statement with a BY statement creates the two temporary variables, First.Letter and Last.Letter. First.Letter is true when you are reading the first observation for each letter; Last.Letter is true when you are reading the last observation for a letter.
❻ For the first A, B, C, and so on, initialize the variable List to a missing value.
❼ Use the CATX function to concatenate all the numbers, separated by commas.
❽ When you are done reading the last A, B, C, and so on, output the string.
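To watch the flags and the retained List change row by row, here is a small trace on a few rows taken from the example table earlier in this blog. The data set names Sample and Trace and the variables IsFirst and IsLast are illustrative. Unlike the program above, this sketch outputs every observation, not just the last one for each letter.

```sas
/* A few letter/number pairs, already in Letter order. */
data Sample;
   input Letter $ N;
datalines;
A 3
A 6
A 14
B 9
C 13
C 17
;

data Trace;
   length List $ 40;
   retain List;
   set Sample;
   by Letter;
   IsFirst = first.Letter;   /* copy the temporary flags so they print */
   IsLast  = last.Letter;
   if first.Letter then List = ' ';
   List = catx(',', List, N);
run;

title "Tracing First.Letter, Last.Letter, and List";
proc print data=Trace noobs;
run;
```

List should grow as 3, then 3,6, then 3,6,14 for the As, restart at 9 for B, and then go 13 and 13,17 for the Cs. The rows where IsLast is 1 are the ones the real program writes out. Because CATX ignores blank arguments, the first number in each list is not preceded by a comma.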
Below are a few lines generated by this program:
For more information about the CATX function and other SAS functions, please take a look at my book, SAS Functions by Example, Second Edition.
There are several versions of the Declaration of Independence online. If you would like to run the program described in this blog and want to obtain the same results as mine, I would be happy to send you a copy of the Declare.txt file.