Fun with Ciphers (Part 2)

In my previous blog, you saw how to create a Beale cipher. In this blog, you will see a program that can decode a Beale cipher. As a reminder, here is a list of numbers that you can use as a substitute for a letter when creating your cipher.

Now, suppose you want to send the following message: "Come to safe house at ten tonight." One possible cipher for this message is:

65 12 81 84 55 46 3 73 88 71 80 11 7
20 57 94 35 84 82 22 29 33 44 16 31 10
67 48 73 60

The first step to decode this cipher is the same as the first step in the program to create the cipher: Make a list of possible numbers to represent each letter. I'll repeat it here:

*Create the list of letters and numbers;
Data Decipher;
   length Letter $ 1; 
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Letter : $upcase1. @@; 
   N + 1; 
   output;
run;
 
title "Listing of Data Set Decipher";
title2 "First Five Observations";
proc print data=Decipher(obs=5) noobs;
run;

This is the program that created the list of numbers corresponding to each letter. The next step in the program to create a Beale cipher was to sort by Letter. This time you want it in number order. Because it is already in order by the variable N, you don't have to sort it. Here are the first five observations in data set Decipher:
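If you want to check the numbering rule outside SAS, here is a minimal Python sketch of the same idea: split the text on whitespace, keep the uppercase first character of each word, and number the words starting at 1. The sample text is an assumption (the opening words of the Declaration of Independence, which the file name Declare.txt suggests), and the names first_letters and key are made up for this illustration.

```python
def first_letters(text):
    """Return the uppercase first character of every whitespace-separated word."""
    return [word[0].upper() for word in text.split()]

# Assumed sample: the real program reads the whole file Declare.txt
sample = "When in the Course of human events"

# Pair each letter with its position, counting from 1 like the SAS variable N
key = list(enumerate(first_letters(sample), start=1))

for n, letter in key[:5]:
    print(n, letter)
```

This is only a sketch of the logic, not a replacement for the SAS program above.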

The next step is to read the message and make a SAS data set.

*Make a SAS data set from the Message text;
data Message;
   infile 'c:\books\Blogs\Cipher\Message.txt';
   input NN @@;
run;
 
title "First 5 Observations from Data Set Message";
proc print data=Message(obs=5) noobs;
run;

Here is the listing:
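As a cross-check outside SAS, this small Python sketch reads the same kind of free-format list, where the numbers can be spread over any number of lines. It assumes that Message.txt holds the cipher shown at the top of this post; the variable names are illustrative only.

```python
# Assumed contents of Message.txt (the cipher shown earlier in this post)
cipher_text = """\
65 12 81 84 55 46 3 73 88 71 80 11 7
20 57 94 35 84 82 22 29 33 44 16 31 10
67 48 73 60
"""

# split() ignores line breaks, much like the trailing @@ on the INPUT statement
numbers = [int(token) for token in cipher_text.split()]

print(numbers[:5])
```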

The final step is to create a temporary array (long enough to hold all the numbers). Each element in this array will contain the letter corresponding to the position in the array. The DATA step below first loads the temporary array elements with the appropriate letters and then reads each number from the file Message.txt (that contains the secret code). The temporary array is acting as a lookup table to find the letter corresponding to the number. I have annotated the program so that you can see exactly what is going on.

data Final;
   length Letter $ 1 String $ 200;
   array Letters[1000] $ _temporary_; ❶
   set Decipher (keep=Letter) end=Last_Obs; ❷
   N+1;
   Letters[N] = Letter; ❸
   if Last_Obs then do i = 1 to N_Message; ❹
      set Message Nobs=N_Message; ❺
      Letter = Letters[NN]; ❻
      String = catx(' ',String,Letter); ❼
      if i = N_Message then output; ❽
   end;
   keep String;
run;
 
title "Decoded message";
proc print data=Final noobs; ❾
run;

❶ Create a temporary array. Each element in the temporary array (Letters) is the letter corresponding to the element number. For example, Letters[1] is 'W', Letters[2] is 'I', and so forth.

❷ Bring in the observations in data set Decipher. Each observation in this data set contains the first letter of one word in the document. The END= option lets you know when you have read the last observation in the Decipher data set.

❸ Load up the temporary array based on the values of N and Letter.

❹ Once the temporary array is loaded, read in the observations in data set Message. Notice that the variable N_Message was set to the number of observations in data set Message at compile time by using the SET option NOBS=.

❺ Bring in the observations from data set Message.

❻ Decipher the number (NN) to determine the letter it represents.

❼ Use the CATX function to add all the letters to the variable String.

❽ After all the numbers from the file Message.txt have been processed, it is time to output an observation containing the variable String.

❾ Use PROC PRINT to print out the message.
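The lookup-table idea in the annotated steps can be sketched in a few lines of Python. This is an illustration under assumptions, not a translation of the DATA step: the key is built from the opening words of the Declaration of Independence, the three-number message is hypothetical, and decode is a made-up helper name.

```python
def decode(key_letters, numbers):
    """Decode numbers by using each one as a position in a lookup table."""
    # Slot 0 is padding so that position n holds the letter for number n,
    # matching the 1-based subscripts of the SAS array
    table = [None] + list(key_letters)
    # Join with blanks, as CATX(' ', ...) does in the SAS program
    return " ".join(table[n] for n in numbers)

# Assumed key: W=1, I=2, T=3, C=4, O=5, H=6, E=7
key_letters = [w[0].upper() for w in "When in the Course of human events".split()]

# Hypothetical short message, not the cipher from this post
print(decode(key_letters, [3, 6, 7]))
```

As in the SAS version, the whole key is loaded first and the message is decoded only afterward.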

Here is the output:

I showed this program to my friend Mark Jordan (aka SAS Jedi), and he came up with a solution that uses formats to do the table lookup. It is probably an easier and more elegant program than mine (his programs usually are), and I am including his program here.

The first step is once again to make a list of possible numbers to represent each letter. This time, though, we’ll create the Decipher data set so that it can be used to build a SAS format.

*Create the list of letters and numbers;
data Decipher;
   retain Fmtname 'Decipher' Type 'N'; ❶
   length LABEL $ 1;  ❷
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Label : $upcase1. @@; 
   N + 1; 
   Start=N; ❸
   output; ❹
   drop N;
run;
 
title "Listing of Data Set Decipher";
title2 "First 5 Observations";
proc print data=Decipher(obs=5) noobs; ❺
run;

❶ FMTNAME and TYPE must have the same value in every observation. We accomplish that with a RETAIN statement.

❷ LABEL and START are the other two required variables for a PROC FORMAT control data set.

❸ Set Start to N.

❹ Write one row for each value we want to decode.

❺ Print the first 5 observations of the Decipher data set.
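To picture the shape of the control data set, here is a minimal Python sketch that builds the same four columns as a list of rows. The five sample letters are an assumption (the first letters of the Declaration's opening words), and control_rows is a hypothetical helper, not part of SAS.

```python
def control_rows(key_letters, fmtname="Decipher"):
    """Build one row per number, with constant Fmtname and Type columns."""
    return [
        {"Fmtname": fmtname, "Type": "N", "Start": n, "Label": letter}
        for n, letter in enumerate(key_letters, start=1)
    ]

# Assumed sample key, standing in for the letters read from Declare.txt
rows = control_rows(["W", "I", "T", "C", "O"])

for row in rows:
    print(row)
```

Each row says "the number in Start is displayed as the letter in Label," which is the mapping the format will apply.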

Here is the listing:

The next step is to create a format from the Decipher data set:

* Make a format from the Decipher data set;
proc format cntlin=Decipher fmtlib;
run;

The FMTLIB option produces a report documenting the format. Here is a sample:

The final step is to use the format on each number in the message text to decode it. The first DATA step below reads each number from the file Message.txt (that contains the secret code) to create the Message data set. The second DATA step reads the Message data set and applies the format to each numeric value using the PUT function. This produces the letter corresponding to the number. I have annotated the program so that you can see exactly what is going on.

*Make a SAS data set from the Message text;
data Message;
   infile 'c:\Books\Blogs\Cipher\Message.txt';
   input NN @@;
run;
 
data Final;
   length String $200;
   retain string;
   keep String;
   set Message end=last; ❶
   String = catx(' ',String,put(NN,decipher.)); ❷
   if last then output;
run;
 
title "Decoded message";
proc print data=Final noobs; ❸
run;

❶ Bring in the observations from the Message data set.

❷ Use the PUT function to produce the correct letter, and the CATX function to combine the letters into the variable String.

❸ Use PROC PRINT to print out the message.
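A rough Python analogue of the format-based lookup is a dictionary that maps each number to its letter. This sketch rests on assumptions: the key comes from the opening words of the Declaration of Independence, the message is hypothetical, and decode_with_mapping is a made-up name. One difference to keep in mind is that this dictionary raises a KeyError for a number that is not in the key.

```python
def decode_with_mapping(mapping, numbers):
    """Translate each number through the mapping and join the letters with blanks."""
    return " ".join(mapping[n] for n in numbers)

# Assumed key: W=1, I=2, T=3, C=4, O=5, H=6, E=7
words = "When in the Course of human events".split()
decipher = {n: word[0].upper() for n, word in enumerate(words, start=1)}

# Hypothetical short message, not the cipher from this post
print(decode_with_mapping(decipher, [3, 6, 7]))
```

The mapping plays the role of the Decipher format: it is built once and then applied to every number in the message.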

I hope you enjoy both of these programs. Please add a comment to the blog with your preference. I think I'll vote for Mark's program!

About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio, was published this year. Ron has presented numerous papers at SAS Global Forums, regional conferences, and local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

2 Comments

  1. Dimitrios Vatakis on

    Dr. Cody
    I purchased your text Applied Statistics and the SAS Programming Language for my stats class. Excellent text thank you especially for one who is learning SAS. However, you have provided a link with the data sets for additional practice. That link no longer functions or it is outdated. I reached out to Pearson the parent company of Savvas that bought Prentice Hall to see how I can get this info. They basically told me to go to Amazon and get the dataset, because I did not get the book through the Pearson site. The text is brand new, not used. Is there another source for the datasets?
    Thank you!

  2. Indeed, the implementation with the format might be the slickest, regarding the efficiency and all. BTW, the variable N in the 1st Decipher dataset turns out to be unused. If it were put to some use, the sum statement "N+1;" could be spared, along with the re-declaration of "Letter $1". Thanks for sharing the program.
