What tiny looking operator packs a mighty punch with large datasets? Your queries are big. Your reports span multiple columns. To top it all you have to summarize data in multiple ways. Are you tempted to throw in the towel? Do you feel like saying a firm “Can’t be done” to your team?
Hint of Advice: DON’T give up! Read on for a handy tip which might well be the most useful tip of 2012.
The problem
While mentoring recently on SAS, a banking customer asked, “I source data from several detailed tables. I want this report:"
The setup
What if you also need insights on client behavior? You can easily sift through large data with the slick Boolean. Basically the digital computer breaks down everything into Binary, 0s and 1s. Boolean Logic is the heart and core of all programming. It works with simple YES and NO responses. A YES is a value of 1 and a NO is a value of 0. You’ll see how you can create any query combination effortlessly. Once you understand Boolean logic you’ll realize that the best part of it is how surprisingly simple it is!
Without further ado, here’s my code joining data from two tables and summarizing several possible combinations of employee demographics to group report by country.
The Code
proc sql; select country, sum(salary < 60000 and employee_gender='M') 'Males and Salary < 60000 ', sum(salary >= 60000 and employee_gender='M') 'Males and Salary >=60000 ', sum(salary < 60000 and employee_gender='F') 'Females and Salary < 60000', sum(salary >= 60000 and employee_gender='F') 'Females and Salary >=60000' from sales as s, orion.employee_addresses as p where s.employee_id=p.employee_id group by country ; quit; |
The Extension
Extend Boolean Logic to any industry data. Below see a report I created using diabetes data where I’m trying to narrow in on high risk individuals, using a combination of bmi (body mass index), age and glucose levels as filters. If you’re curious about the code and data, pop me a note. I’m happy to send it to you. (FYI - this was a great study on diabetes incidence among the Pima Population in Arizona that I presented at a health user group.)
At its very basic, I would break up Boolean logic into two pieces:
1) Predicate or Key concepts- this is your filter. Examples: employee_gender=’M’, Salary > $60000.
2) Logical Operator or Glue – which joins predicates. These are the logical operators AND, OR, NOT.
It might help to remember that we use Boolean logic almost every day in our daily life. You answer questions with YES or NO responses. Weather check on smartphone apps, calendars help track client appointments. With the help of these YES, or NO type answers we make decisions on what to wear!
To sum up, don’t let the big word BOOLEAN daze you. All I did was take advantage of its capacity to combine multiple conditions for sifting through customer data to get as much intelligence as my heart desires. Wonder why I picked this as my #1 tip? I've used many SAS tips to my advantage. But this one stands out in my mind in its universality of use that I simply had to share with you. Have you used Boolean logic? Do you have a better solution for my customer problem? Write to me. I’d love to hear your experiences.
Shout out to Andrew Macdonald whose question prompted me to write this blog post. And to my colleague Dr. David Yeo with whom I’ve had seemingly endless discussions on Boolean logic.
To learn more, consider these SAS training courses: SAS Programming 1: Essentials, SAS SQL 1: Essentials, Text Analytics Using SAS Text Miner.
8 Comments
Nice example. Would you email me the code and data? Thank you.
I certainly can Jerry, can you send me your email id. thanks!
What if I want to do this calclation based on age. For example if a patient saw the Dr, on April 3, 2011 (date of service) for service and want to make sure he is 19 yrs old (birth_dt) t the time he recieve the service being provided....how would I calculate that using sas.
Certainly, data validation is a SAS strength. here is code that will test whether any of your patients accidentally received treatment before they turned 19..I tweaked the code above to use the INTCK function to find patient age as of serviced date. and used the WHERE clause to filter anyone younger than 19..
proc sql;
select service_date 'Date of Service', intck('year',birth_date,service_date) as age from
boolean.patientdata as p, boolean.servicedate as s
where p.patient_id=s.patient_id
and calculated age < 19
;
Thanks for this post Charu. This is perfect timing because I am just at the point where I need to group my data into a few buckets. So far I've been getting quite comfortable with SQL, but is there a similar data step equivalent?
Aha. when you start talking buckets there are a lot of way to tackle that in the datastep.. For example arrays. I'll teach you how to create & use arrays in our PRG2 class. Andrew, glad you're using all the learned concepts so quickly!
Hi,
I agree, boolean logic (and doing math with booleans) is very nifty.
You asked about alternative solutions.
My first thought would be to use PROC TABULATE, and use formats to create the groupings for gender and salary.
--Q.
Agreed! Boolean is amazing.. proc tabulate is a great tool Quentin..do you think we could do the same report with tabulate alone? I'm afraid I'm not terribly good at this proc.. are you able to write me some code that I can steal.. thanks for your comment.