Why statisticians need strong communication skills

PeopleWorkingThis year the American Statistical Association Conference on Statistical Practice (CSP) has some weighty themes including Big Data Prediction and Analysis and all of its exciting applications. But just as important is the theme Communication and Impact. Everyone knows that if you have a great idea or discovery but you can’t communicate it adequately, it is not going to have much impact!

Think of the doctor, Ignaz Philipp Semmelweis, who back in 1847 showed the connection between hand washing and patient survival rates. His observations conflicted with the established scientific and medical opinions of the time and his ideas were rejected by the medical community. He failed to communicate his discovery well and ended up in a mental institution for his troubles. It wasn’t until the 20th century that hygiene became standard practice in medicine.

SAS Press authors Roger Hoerl and Ronald Snee have put together an excellent course to enhance leadership skills and to help statisticians transition from being viewed as passive consultants to being viewed as proactive leaders within their organizations. SC2 From Statistical Consultant to Effective Leader explores how to communicate discovery and make an impact as a leader.

Come and visit the SAS Press booth to browse their recent title, Statistical Thinking: Improving Business Performance, and others in our collection of titles covering diverse topics from big data analytics, basic statistics, data reduction and management and text mining. Perhaps there is a topic you would like to see covered but we don’t have? Perhaps you have a topic you would love to write about?

While we do have some great titles, I know we haven’t covered everything. Please stop by and have a quick chat with me. I’d be happy to discuss what we do have available, what we would like to see and what you would like to see. After all, writing a book is a great way to start making an impact as a leader!

Post a Comment

Are you doing things SAS backwards?

SAS Programming Professionals,

Are you doing things SAS backwards?

Yea; we have all been there.  A manager, or a user, or a client, asks you to perform an analysis or create a data set and gives you a sketchy description of what they want. Maybe they say "It's like that report you did last November", or they send a few descriptive lines in an email, or they verbally describe it in a meeting, or they sketch out some ideas on a notepad.  So, you dutifully write a program that processes what you think is the relevant data and creates the result set you believe they want. You QC the results, present it to the requester and... you are told that it is not right.  So, you modify your program according to their feedback and create the next result set.  And, it is still not right.  You find yourself repeating the process again and again until your manager or user or client is finally satisfied.

What is wrong with this process is that you are starting to write a SAS program when you do not have solid specifications.  Not understanding exactly what the requester wants invariably leads to wasted programming time.  So, you need to get your user to be specific about what, exactly, s/he wants before you start programming.  At a minimum, the requester should define:

Input

  • The data sources you are to extract the data from
  • The selection criteria you are to use to subset the data
  • The variables they want in the final data set or report

Process

  • What types of analysis they want performed on the data
  • The formulas for any new variables that are to be created
  • Any transformations that are to be made to existing variables
  • What to do with missing values for important variables
  • Whether any variables should be suppressed from the final result set

Output

  • For data sets, the type of data set and the variable order
  • For reports, the layout of the report, including variable order, headings, titles, footnotes, and summary columns
  • For graphs, the type of graph, variables to be graphed, titles, legends, footnotes, and reference lines
  • The type of additional documentation needed, such as a PROC CONTENTS or frequencies of the delivery data set

The best thing you can do is to have your user send you the specifications in writing.  That performs two functions:  It makes the person stop and really think out what s/he wants, and it gives you a solid starting point.  From there, you can ask questions to fill in any of the details that were not covered in the original, written request.  After a few iterations, you should have a solid spec to work from.

If you only receive verbal directions or sketchy notes via email and are not likely to get anything more solid, you may have to write the specs yourself.  In that case, you would write down the details of the Input, Process, and Output steps that you believe your client wants and send them to the client.  Let your client review the specs and verify that they are what s/he wants, or suggest changes to your proposal that better define what is needed.  Once the email exchange culminates in fully ratified specs, you are good-to-go with starting the SAS programming.

Almost invariably, any programming effort--no matter how well spec-ed out--will require an iteration or two to satisfy a user's requirements.  However, you can drastically reduce the time you spend programming by nailing-down your user's needs as much as possible before you begin to write your SAS program.  Few things in life are more satisfying than creating a tight, elegant, efficient SAS program that conforms to specifications and that produces the requested result set.

Remember; without specifications to refer to, you cannot judge whether or not the program you wrote and the result sets you created conform to the requirements.  And, as Lewis Carol wrote:  "If you do not know where you are going, any road will lead you there."

Be sure to check out Michael Raithel's latest book How to Become a Top SAS Programmer.

Post a Comment

Five tips from Simulating Data with SAS

simulatingdataData simulation is a fundamental technique in statistical programming and research. My book Simulating Data with SAS is an accessible how-to book that describes the most useful algorithms and the best programming techniques for efficient data simulation in SAS. Here are five lessons you can learn by reading it:

  • Learn strategies for designing a simulation study. What is the best what to vary parameters in a simulation study? A poorly designed study can result in many wasted hours of computation. This book provides general advice and illustrates it with specific programs.
  • Learn to use elementary probability distributions to simulate from more complex distributions. SAS software provides an easy way to simulate data from about 30 standard probability distributions, but what if you need a more complicated model? The book shows how to use the standard distributions to simulate data from complex distributions.
  • Learn how to simulate data that follow a regression model. The book shows you how to simulate response variables for general linear regression models, generalized linear models, mixed models, longitudinal models, time series models, and more.
  • Learn efficient techniques that save time. Yes, simulation is a brute-force statistical method, but efficiency matters. The book describes tips and techniques that can decrease the computational time a hundredfold. If your simulation currently requires hours, you might be able to rewrite it to run in seconds.
  • Learn to use SAS/IML software for simulations that involve multivariate data. Using the right tool makes completing a task faster and simpler. Although it is possible to turn screws with a pair of pliers, that tool is not optimal for that task. Similarly, although the DATA step is a powerful tool for some simulation tasks, the SAS/IML matrix language is a better tool for complex simulations and for those that involve multivariate data. I introduce the SAS/IML language gradually and show you how to complete many tasks by using either tool.

Simulation is a powerful statistical technique. Whether you are a novice or an expert, I hope that you will find many tips in Simulating Data with SAS that enable you to write simulations that are both effective and efficient.

Post a Comment

SAS author's tip: Identifying FIRST.row and LAST.row in by-groups

A popular DATA step programming technique frequently used by SAS users is the ability to identify the beginning and ending observation in by-groups. The way it works is whenever a BY statement is specified with a SET statement, two temporary variables are created in the Program Data Vector (PDV), FIRST.variable and LAST.variable, for each variable specified in a BY statement. These temporary variables contain a value of either 0 or 1, as shown below:

  • variable = 0 indicates an observation is not the first observation in a BY group
  • variable = 1 indicates an observation is the first observation in a BY group
  • variable = 0 indicates an observation is not the last observation in a BY group
  • variable = 1 indicates an observation is the last observation in a BY group

Although FIRST.variable and LAST.variable temporary variables are not automatically written to a SAS data set, once identified they can be used for special processing, reporting purposes, the creation of new variables, modification of existing variables, the structural change of a SAS data set, and more.

To illustrate this stalwart DATA step technique using the SQL procedure, a single table will be used, as shown below. The Movies table consists of six columns: Title, Category, Studio, and Rating are defined as character while Length and Year are defined as numeric.

MOVIES Table

MOVIES_Table

As a longtime SAS user, I wanted a programming technique that could be used by PROC SQL users to emulate the DATA step’s FIRST., LAST., and By-group capabilities. Another important consideration was to have the technique operate similarly across operating systems and, if possible, with other vendor’s RDBMS SQL. The resulting technique uses an SQL subquery to identify the FIRST.row and LAST.row in each By-group.

Identifying FIRST.row in by-groups

The FIRST.row technique, shown below, is constructed using an SQL subquery to identify the first, or beginning, row in each by-group. The process begins by producing a new table that contains the desired by-group order, physically sorting the rows in the MOVIES table in ascending order by the primary variable, RATING, and then in ascending order by the secondary variable, TITLE. Note: The outer query serves as a control group to ensure that processing takes place in the desired by-group order. Next, the MOVIES table is processed to evaluate the results of the MIN function specified in the inner most query (or subquery). The MIN function’s purpose is to capture the smallest (or minimum) value (for numeric variables), or minimum letter (for character variables) and use this value in the WHERE clause comparison. The previous operations are then repeated for each unique by-group in the MOVIES table.

/*************************************************************************/
/** ROUTINE.....: FIRST-BY-GROUP-ROW                                    **/
/** PURPOSE.....: Identify the first (min) row in each by-group using a **/
/**               subquery.                                             **/
/*************************************************************************/
proc sql;
  create table first_bygroup_row as
    select rating,
           title,
           'FirstRow' as ByGroup
      from movies M1
        where title =
               (select MIN(title)
                  from movies M2
                    where M1.rating = M2.rating)
          order by rating, title;

FIRST.row Results

Table_firstrow

Identifying LAST.row in by-groups

The LAST.row technique, shown below, is constructed using an SQL subquery, with its purpose to identify the last, or ending, row in each by-group. The process begins by producing a new table that contains the desired by-groups, physically sorting the rows in the MOVIES table in ascending order by the primary variable, RATING, and then in ascending order by the secondary variable, TITLE. Note: The outer query serves as a control group to ensure that processing takes place in the desired by-group order. Next, the MOVIES table is processed to evaluate the results of the MAX function specified in the inner most query (or subquery). The MAX function’s purpose is to capture the maximum value (for numeric variables), or maximum letter (for character variables) and use this value in the WHERE clause comparison. The previous operations are then repeated for each unique by-group in the MOVIES table.

/************************************************************************/
/** ROUTINE.....: LAST-BY-GROUP-ROW                                    **/
/** PURPOSE.....: Identify the last (max) row in each by-group using a **/
/**               subquery.                                            **/
/************************************************************************/
  create table last_bygroup_row as
    select rating,
           title,
           'LastRow' as ByGroup
      from movies M1
        where title =
               (select MAX(title)
                  from movies M2
                    where M1.rating = M2.rating)
          order by rating, title;
quit;

LAST.row Results

Table_lastrow

Further Considerations

The collating sequence for character variables under ASCII (UNIX, Windows, and OpenVMS) and EBCDIC (z/OS) are different. The default ASCII collating sequence sorts special characters and digits first, followed by uppercase letters, and finally lowercase letters. In contrast, the default EBCDIC collating sequence sorts special characters first, followed by lowercase letters, then uppercase letters, and digits last. Due to the encoding differences, users are encouraged to consult additional references on the topic of sort order and collating sequences.

Users may also want to consider using multi-threaded sorting for reducing the real time to completion for sort operations. New with SAS System 9, the SAS System option, THREADS, takes advantage of multi-threaded sorting which may be able to reduce CPU resources and costs.

For more information about the SQL procedure, check out Kirk Paul Lafler’s book, PROC SQL: Beyond the Basics Using SAS, Second Edition.

Post a Comment

Resolve to improve your SAS programming skills

Happy New Year to you, your families, and your colleagues!

This is the time of year when people make all types of New Year's resolutions. According to the USA.Gov web site, the most popular resolutions are:

  1. Lose Weight
  2. Volunteer to Help Others
  3. Quit Smoking
  4. Get a Better Education
  5. Get a Better Job
  6. Save Money
  7. Get Fit
  8. Eat Healthy Food
  9. Manage Stress
  10. Manage Debt
  11. Take a Trip
  12. Reduce, Reuse, and Recycle
  13. Drink Less Alcohol

And, these resolutions all make sense because they promote personal growth. It is a new year, so why not start making the changes that you want to carry forward throughout the year?

Along with personal changes, perhaps it is also time to consider what changes you could make to improve your SAS programming skills. There is no "most popular list of SAS resolutions" for us to reference, so I will suggest some:

  1. Lose unnecessary variables and observations when inputting data sets
  2. Volunteer to help junior SAS programmers
  3. Quit writing programs without comments in them
  4. Get a better understanding of the intricacies of PROC SQL
  5. Get more sophisticated in your SAS Macro programming
  6. Save time by re-purposing your old programs
  7. Purchase some interesting SAS Press books... and read them
  8. Join a local SAS users group and attend meetings
  9. Get better at using the Output Delivery System
  10. Take a SAS class
  11. Write a technical paper for SAS Global Forum
  12. Reduce processing time by writing more efficient programs
  13. Subscribe to the SAS communities discussion groups on support.sas.com

Hardly a comprehensive list, but definitely a good start, no? And, I would bet that one or more of these resolutions resonate with you; that they are things you have been wanting to do, but have not quite found the time to work on in 2014. Well, there are 360 more days of 2015 just waiting for you to dig into your own SAS resolutions.

Of course, you could simply sit back and do the same old, same old. But, how could you continue to achieve personal growth as a SAS programming professional if you did that?

Meet the new year; same as the old year--Oh, we really think not!

Post a Comment

5 best-selling SAS Press books of 2014

SASbooks3

SAS Press warehouse

When it comes to exchanging gifts at Christmas, my brother and I have a tradition of buying each other books. Even when he lived away in Chicago, and everyone else in the world was buying eBooks, I could always count on a big, heavy box delivered to my house each year filled with new authors and titles he thought I’d enjoy.

If you’re buying for a book lover, who also uses SAS, our list of the best-selling books this year might be a helpful shopping guide.

Top 5 Books

  1. The Little SAS Book: A Primer, Fifth Edition
  2. Big Data Analytics: Turning Big Data into Big Money
  3. SAS Certification Prep Guide: Base Programming for SAS 9, Third Edition
  4. SAS For Dummies, Second Edition
  5. Learning SAS by Example: A Programmer’s Guide

We also have a holiday spectacular sale going on right now with 75 percent off select titles.

The best part about giving someone a book is hunting for that special one that you think they’ll enjoy. I stumbled upon the perfect book for my brother months ago and it’s something he’ll never guess. I can’t wait until he opens it!

Do you have a similar book tradition with family or friends? Please share in the comments section.

Post a Comment

The uncertain future

In the long run we’re unemployable. Not because we’re lazy or incompetent. It’s because we’re replaceable.

I’ve spent much of the eighteen months trying to impress on people how real this is. It’s not something that’ll happen “one day”. It’s happening now, everywhere we look. Once upon a time, I worked at General Motors as a futurist, trying to peel back the veil of a clouded future. Sadly, I probably ended being more of a Cassandra, lucky enough to see the future but cursed with never being believed.

Still, I had a great time. Increasingly though, I find myself being a “presentist”, trying to wake people to the start of the disruption we’re living through.

Every time we use an algorithm to predict something, every time we automate a low-value process, we take another step towards the future we’re creating whether we realise it or not. A future where the machines we’ve created are better, faster, and more effective than us at using data to make “good enough” decisions.

We’re still decades (if not centuries) from being able to model the human brain. The thing is though, does it even matter? Machines don’t have to be self-aware to be effective. Google’s cars don’t work because they’re smarter than us. We’re just really good at teaching them to analyse and act on data in real-time. And, that’s all it takes; the disruption we’re creating isn’t because some genius behind the scenes has a grand vision. It’s the organic result of lots of people trying to make lots of small things more efficient.

It creates an interesting question. What does a world look like where over half of of the workforce becomes structurally unemployed through no fault of their own and yet productivity continues to rise? What happens if [Okun’s law](en.wikipedia.org/wiki/Okun's_law) permanently breaks down?

About the only thing that’s certain is that we’ve never seen anything like this before. Much like the move from feudalism to capitalism, “where we’re going we don’t need roads”. From here on, it’s all uncharted territory. And for all we know, here be monsters.

It’s very possible that we’ll enter a cultural renaissance, one where the nature of social security and capitalism is fundamentally disrupted. Society may split in three, those with the skills necessary to design and automate, those whose skills are no longer needed, and those whose skills are so manual and bespoke that they cannot be automated.

Those whose skills are still in demand might have the freedom to ask and receive a margin for their knowledge. With no changes to productivity or output, those who are structurally unemployable might still receive a basic wage supported through higher taxes on those whose skills are in demand. In exchange, they receive the freedom to chase cultural or creative activities, supporting the economy through their ongoing consumption. And in doing so we might move to a new equilibrium, one characterised by greater employment freedom, more free time, and a Cambrian-like explosion of cultural creativity.

Or, it might be very different. We might equally well see increasing income inequality as those whose skills are no longer relevant are increasingly squeezed out of the market. Not everyone has the skill or experience to become a data scientist, and engineer, or a bio-technician and help build our future. And, while everyone still needs a toilet, we only need so many plumbers.

These unfortunate souls may see themselves become marginalised from society, structurally unemployable with no avenue for improvement. The relationship between youth unemployment civil unrest is well known. Even today, youth only makes up 17% of the population and yet comprise 40% of the unemployed. What happens if this becomes not just a blip in our employment data but the start of an inter-generational trend?

Our future is uncertain; much depends on whether you see the glass as being half-full or half-empty. It’s also going to be what we choose to make it.

Evan Stubbs is the author of several SAS books including his latest title, "Big Data, Big Innovation."

Post a Comment

Beyond traditional forecasting

David Corliss of Ford Motor Co. is currently writing a book about clustering methods in time series analysis.

I caught up with him at the Analytics 2014 conference in Las Vegas last month (where he was also presenting on this topic) to talk about these methods and why he decided to write about it.

We kicked off the interview with a question he gets a lot - how do you prepare your data?

Take a look…

 

His book is scheduled to be available late 2015.

Post a Comment

Fulfilling goals

We all have goals…personal, professional, stretch goals, goals that will move us forward quickly. I’ve talked to many people who have a goal of someday writing a book. They always say “someday” in a wistful tone, because they never feel like they’ll have enough time or knowledge to be an author. If you’re one of those folks, I’m here to tell you that the time is now. I can help you become an author now…not someday.

We have two months left in 2014. Two months to fulfill your dream…because it is possible. Check out our web site and read up on what it takes to submit a proposal. Then contact us with your idea. We’ll chat with you about your idea and the steps you can take toward your book proposal submission.

Become an author in 2014? With SAS Press’ help, you can check this one off the list!

Let SAS Press help you get your words in print.

Let SAS Press help you get your words in print.

Post a Comment

The innovation vampire

Innovation is easy. All you need to do is three things. 3100236400_33a70b80c7_m

First, have a great idea.

Go on, I’ll wait.

Next, make it happen. And, once that’s done, just make sure you work out how to commercialise it somehow.

Innovation’s never simple, even at the best of times. Moving from the big idea to making it happen is full of those horribly detailed problems that need solving, even if they’re not always obvious. Out of everything that doesn’t happen but should, there’s one thing that sucks the life out of ingenuity more than anything else.

Before you can innovate, you first need to work out what the heck it is you’re trying to do. Making things better only happens when everyone first agrees what “better” means.

It’s a tricky word. Sometimes, it’s managing the presence of something you need to fear. Other times, it’s the absence of something you should be terrified of. Monsters come from the most unlikely of places. Inside this space lurks your innovation vampire.

Seems goofy, right? Left alone though, it’ll eat the core out of your ability to innovate. Like their namesake, innovation vampires slowly drain the life out of your business. They absorb creativity, cloud your mind, and if left long enough, eventually kill you.

They’re also extremely good at hiding in plain sight. Even though they’re easy to see if you decide to look for them, you first need to recognise you might have an even bigger problem than you originally thought. It might not be your ability to be creative. It might just be that your problems are starting a little closer to home.

When people can’t agree on what’s important, you prevent innovation. When people can’t hold the course, you prevent commercialisation. Without both of these, you stagnate. The worst thing is that you’ll only find these problems if you actively look for them. We’re dangerously good at self-delusion. Think on this - what’s the gap between your organisation’s:

  • Stated values and actual behaviour? Everyone wants to be customer-centric. When there’s a problem, what do people do? Do they actually represent your stated values or do they still practice the behaviours you’re trying to discourage?
  • Intent and execution? The best-laid plans rarely survive first contact with the enemy. When your troops are faced with an unexpected roadblock, how do they behave? Do they mirror your mission statement or do they compromise?

Talk is cheap. Change is hard. Facing this beast head-on isn’t fun - it takes some serious self-reflection and often ends in disruption from the top down. It’s also not an easy fix; it starts and ends with culture.

But, I’m sure you already know all this. And, everything’s under control, right? So, for everyone who thinks that they don’t have any problems with an innovative vampire, here’s one last test. Seeing as you’re reading this, it’s a fair assumption that you’re probably fairly familiar with using data to make better decisions. In fact, you’re probably a big champion for data-driven decision-making.

Now, ask yourself this: every day, how many of your decisions are based on data rather than what’s in your gut? Do you walk the path you promote? Or, do you unconsciously show the behaviours that you’re trying to eliminate?

Culture starts with you. It’s a small thing, but it’s the small things that eventually have a big impact.

 Evan Stubbs is the author of several SAS books including his latest title, "Big Data, Big Innovation."

photo by Joriel “Joz” Jimenez //attribution by creative commons

Post a Comment