Anonymization for data managers

I’ve spent some time over the past couple of months learning more about anonymization.

This began with an interest in the technical methods used to protect sensitive personally identifiable information in a SAS data warehouse and analytics platform we delivered for a customer. But I learned that anonymization has two rather different meanings: one in the context of data management and another in the context of data governance for reporting, sharing or publishing information.

I think both sides of the topic are interesting enough to be worth writing about, and I hope you will find them interesting too: this is a subject that anyone who handles sensitive or personal data should know something about.

To data managers, anonymization often means the technical process of obscuring the values in sensitive fields by replacing them with equivalent but non-sensitive values that remain useful for, e.g., joining tables or representing individuals in time-series or transactional data. In some SAS products, e.g. SAS Federation Server, this is called ‘masking’.
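As a rough sketch of that idea (this is not the masking feature of SAS Federation Server, just a plain DATA step), a sensitive key can be replaced with a salted hash so that the same input always yields the same masked value and tables can still be joined on it. The data set CUSTOMERS, the variables CUSTOMER_ID, BALANCE and MASKED_ID, and the SALT macro variable are all hypothetical names made up for this illustration.

```sas
/* Hypothetical input: CUSTOMER_ID is the sensitive join key */
data customers;
  input customer_id :$10. balance;
  datalines;
C001 120.50
C002 87.25
;
run;

/* Hypothetical salt; in practice it would be stored outside the program */
%let salt = example-secret;

/* Replace the sensitive key with a repeatable, non-sensitive token */
data customers_masked(drop=customer_id);
  set customers;
  length masked_id $32;
  /* MD5 returns a 16-byte binary digest; $HEX32. renders it as 32 hex characters */
  masked_id = put(md5(cats("&salt", customer_id)), $hex32.);
run;
```

Applying the same salt and expression to every table that carries the key keeps the tables joinable on MASKED_ID. MD5 is used here only because it is widely available; a stronger digest would be preferable where your SAS release provides one.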


How sentimental of you! Enabling sentiment analysis in a SAS Visual Analytics word cloud

Word clouds have been available in SAS Visual Analytics for a while now, but recently, sentiment analysis was added to their functionality.

For those of you not familiar with word clouds, a word cloud, also known as a tag cloud, is a visual representation of text data. You are probably seeing one or more word clouds every day when you peruse the web as they are increasingly being added to web pages. If you look at the right side of this web page that contains this blog, you will see a word cloud (Tags) that shows the most frequent topics. You can see by their size that SAS Global Forum and SAS Administrators are some of the most frequently blogged-about topics.

In the age of social media, blogs, tweets, online reviews, ratings and recommendations, the ability to take unstructured data and analyze it for sentiment is key to a competitive advantage. Being able to analyze this data, understand customers’ opinions about various products and services, filter out the noise and find relevant content that can be acted upon are some of the advantages of using sentiment analysis.

To see how the new sentiment analysis works, let’s start by creating a new word cloud. I’ll use data from some of our SAS employee blogs at blogs.sas.com to illustrate.

Creating the word cloud


How registering your SAS Users Group this summer can heat up your connections this fall

The school year may be over, but it’s not too late to renew your SAS Users Group registration for 2015. Or, if you have always wanted to start a new Users Group in your area or company, you still have time to register a new group and receive support this year.

Getting your SAS Users Group Program started

Summer is the perfect time to reach out to the Users Group Programs team (that’s me!) for assistance with creating, planning or promoting your local, in-house or regional users group event.  Support is available to each of our recognized groups throughout the remainder of the year, whether you’re planning a live or virtual event.

What’s in it for you?

This year we’re expanding the resources available to leaders and attendees.  For the next six months, we will be releasing new tools, tips and tricks to help you along your SAS journey.

We will provide resources for the brand new users groups as well as the established users groups that have been around for years, but want to expand their reach and activity.  Look out for topics such as how to establish a marketing plan, nurturing membership, creating speaker programs, establishing online presence and much more.

We’re excited to work together with you to build an active community of SAS users in your community, company and around the world.  If you haven’t registered for 2015 yet, complete this form, and let’s get started!

Looking forward to helping you connect, educate and support as SAS users.


What’s new for SAS Global Forum 2016


Hello, I’m Jennifer Waller, your SAS Global Forum 2016 conference chair. This is the first of several blogs I’ll be writing to help keep you informed about the conference and ways you can get involved.

Because of my background in education, I’m especially excited about a new initiative at this year’s conference that showcases the next generation of analytics professionals.

The SAS® Student Symposium is a program designed to add to the great content that will be presented by users and SAS at the conference, held April 19-21, 2016 at The Venetian in Las Vegas, NV.

About the SAS® Student Symposium

The SAS® Student Symposium, an initiative of the SAS Global Users Group Executive Board (SGUGEB) and SAS, is a competition where teams of 2-4 students and a faculty mentor work together to submit a paper to SAS Global Forum 2016 that answers a question of their choice on one of the provided publicly available “big data” sets using SAS analytics.

The goal of the symposium is to provide an outreach program for university students to enhance their SAS skills, increase their understanding of SAS products and solutions to solve real world problems, and develop a peer network of SAS users from other universities that use SAS for mutual benefit. The symposium will enhance the Student Ambassador and Student Scholarship programs that provide other avenues for students to attend SAS Global Forum and learn from other users and SAS experts.

Working together with SAS, the SGUGEB has been able to create a cloud-based solution where no fewer than 12 “big” publicly available data sets will be housed, and each student team will have access to the data and a comprehensive set of SAS software products for the competition. Teams will define a problem, execute the appropriate analyses using SAS software and submit a paper that defines the problem, describes the analysis performed and presents the results in such a manner as to be of use in industry, government, or education.


Can you Lag and Lead at the same time? If using the SAS DATA step, yes you can

Within the SAS DATA step, the LAG function is provided to return a variable’s value from a previous data set observation. With certain data criteria, sometimes there is a need to look ahead at the next observation and you would expect to use a LEAD function, but this does not exist. Why is that?

All SAS data sets are read sequentially one observation at a time into the PDV (Program Data Vector) for each DATA step iteration.  When processing the current observation, you cannot read in the next observation without losing or replacing the current data in the PDV.  It is kind of like listening to music.  You cannot listen to one song track and jump ahead to the next song track to listen to both songs at the same time.  Therefore, with sequential processing it is not possible to design what would be called a LEAD function, but the good news is you can simulate this functionality by using the DATA step, the MERGE statement, and a few data set options.

The main trick to simulating a LEAD function is to merge a data set with itself and use the FIRSTOBS=2 data set option on the second read of the data set. This offsets the observations read by one observation, allowing you to have both the current observation value and the next observation value in the PDV at the same time. The next trick to making the simulated LEAD function work is to use the KEEP= and RENAME= data set options on the second read of the data set. When reading the same data set for the second time, you only want to keep the variables needed for processing, and you must rename those variables to have unique names. This allows all values needed from the current observation and from the next observation to be in the PDV during the same DATA step iteration.
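The self-merge described above can be sketched as follows. The data set READINGS and the variables DAY, VALUE, NEXT_VALUE and CHANGE are made up for illustration and are not taken from the full post, so treat this as a minimal sketch under those assumptions rather than the complete technique.

```sas
/* Hypothetical sample data: one reading per day */
data readings;
  input day value;
  datalines;
1 10
2 15
3 12
4 20
;
run;

/* Merge the data set with itself, with the second read offset by one observation */
data with_lead;
  merge readings
        readings(firstobs=2                      /* start the second read at observation 2 */
                 keep=value                      /* keep only the variable to look ahead at */
                 rename=(value=next_value));     /* give it a unique name in the PDV */
  /* No BY statement: observations are paired by position (one-to-one merge) */
  if not missing(next_value) then change = next_value - value;
run;

proc print data=with_lead;
run;
```

Because the second read starts at observation 2, it runs out one observation before the first read does, so NEXT_VALUE should be missing on the last row. The IF condition guards the calculation for that case.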


Recent improvements to SAS automated migration tools

Recent updates to SAS 9.4 have introduced some nice improvements in support of automated migration using the SAS Migration Utility and SAS Deployment Wizard. SAS has been working hard to make the migration experience more user friendly and less error prone. Changes have focused on making errors easier to identify and on preventing further difficulties that can be caused by ignoring issues identified by the SAS Migration Utility. If you’re preparing to migrate to SAS 9.4, you’ll want to take a look at some of these useful changes:

Addition of an error count to migration tool

When you run the SAS Migration Utility on your source system to analyze or package the deployment, there are now additional details written to the console about the results of your execution. Included in the new output is a count of any errors that the SAS Migration Utility encountered. This enhancement flags errors immediately instead of relying on the user viewing the analysis report to determine if there are errors.

The error count is listed separately for those encountered during analysis of the source system and those encountered during packaging.  In the example below, you can see that two errors were encountered during the analysis of the source environment. The console message refers the user to the analysis report for more details.

SAS Migration Utility output showing two errors encountered during analysis phase

 

Viewing the analysis report will show the details of the two errors that the analysis surfaced, in this case some issues with SAS BI Report Services Workspace Configuration. If you are unclear about the implications of an error or warning while reviewing an analysis report, you can check the notes on the SAS Migration Utility. A link to the website is provided at the top of the analysis report.

Detail table of SMU analysis errors

Reporting packaging errors separately

The output below is from a packaging run of the SAS Migration Utility on a SAS 9.2 machine which contains the SAS Middle Tier.  In this run, the SAS Migration Utility reported an error during the packaging execution phase.

SAS Migration Utility output showing an error during packaging phase

Reviewing the analysis report reveals another new feature. When errors are encountered during the packaging of the source environment, the utility now separates these errors in an execution phase section on the report.

SAS Migration Utility report showing detail of packaging error

The error message refers the user to a new log that is created.  A log file (errors.log) is created when errors occur in the packaging/execution phase. The log is written to the machine directory of the migration package for the tier that is being packaged. The error log is a subset of the full migration log (migrate.log) containing only error messages.


Preventing the use of invalid migration packages

This is all great new functionality that will help SAS administrators identify and address errors and make it easier to create valid migration packages. The recommendation has always been to have a clean migration package, meaning an analysis report with no errors, before running the SAS Deployment Wizard to complete your migration. With SAS 9.4, this is now enforced: if you select a migration package that has errors in its analysis report, the SAS Deployment Wizard detects them and prevents you from continuing with the deployment.

Error message showing cancellation of migration due to packaging error

These improvements to the SAS Migration Utility and the SAS Deployment Wizard have made it easier to identify issues during the planning and preparation phase of a migration, and less likely that issues will be ignored, resulting in failures in the deployment phase.


SAS administrators: What’s next after your Hadoop introduction?

Hadoop has been called a game-changing technology. Here’s why:

- DATA IS DIFFERENT: We now have to deal with both structured and unstructured data.

- NO LIMITS: We now deal with terabytes or petabytes of data, not just the megabytes of old.

- COMPLEXITY: We work with complex multi-server architectures and with a complete "zoo"!

- COSTS: We spend less money on hardware and storage.

As SAS technology becomes more integrated with Hadoop, I have noticed many situations where SAS administrators need to work closely with Hadoop administrators and, in some situations, stand in for them.

It's becoming vital for SAS administrators to be familiar with the Hadoop ecosystem, as I suggested in my previous article.

Ideally the administrator should be able to perform Hadoop admin tasks, run basic commands on the clusters, and check some of the main indicators related to the health of the environment.


Evaluating tradeoffs when designing storage for SAS applications

Recently my wife and I took our annual anniversary trip – this time we went to the Grand Canyon, staying in Las Vegas. In researching our options to fly from Raleigh-Durham (RDU) to Las Vegas (LAS), we had several different selection criteria:

  • what time we wanted to leave
  • what time we wanted to arrive
  • price
  • number of stops
  • layover time
  • airline – loyalty program
  • type of aircraft – seating, amenities, food, wifi

All of the flights we looked at would get us from RDU to LAS and back. So the destination wasn’t the issue – it was how much value we placed on each of the attributes: arriving in the afternoon (hotel check-in is 3:00pm) versus spending more money for a nonstop flight, for example. We made our decisions based on our specific needs at that time. We also had different opinions about what was important (I’m basically cheap, and my wife refuses to take the red-eye flight).

The evaluation of storage for a SAS solution can be viewed in a similar fashion. There are tradeoffs to be made, or certainly criteria which will be evaluated and prioritized. This blog posting will briefly examine three such attributes and how they may impact storage in a SAS environment.

Who says you can’t have it all?

This diagram highlights three of the more common attributes that are considered when evaluating storage. While there are certainly other considerations (capacity, interfaces, architecture), these three are usually involved in most storage decisions. This diagram also suggests that there are tradeoffs to be considered: for example, between price and performance (higher performance may require higher price). Let’s briefly examine each of these, and where we may see tradeoffs in a SAS environment.

Price

Price is usually among the first attributes that come up in any discussion of storage. Everyone is looking to save money, and unfortunately storage often gets compromised. Consider this scenario: our SAS deployment will need about 5 terabytes (TB) of storage. In terms of raw capacity, a new 5TB disk drive can be bought from a number of online vendors for around $150.00 USD. While this drive may meet the capacity requirements, it most likely is not the best selection for a SAS deployment – especially if there are performance or availability considerations. Typical enterprise-class SAS storage may involve configurations with multiple disks and controllers, and perhaps shared storage such as  Network Attached Storage (NAS) or Storage Area Networks (SAN). Factoring in these, and possibly other, considerations would most likely (significantly!) increase the price of our storage.

Performance

SAS applications are consumers of storage, and have significant performance expectations for I/O throughput. Many SAS field consultants can share stories of under-performing storage leading to failed deployments and unhappy customers. SAS has minimum recommended I/O throughput rates of file systems that are to be used in a SAS environment, and the Performance Evaluation team within SAS R&D has written several papers that document best practices and tuning guidelines. There is even a usage note about testing throughput for your SAS9 File Systems. Multiple configuration options are reviewed and discussed, ranging from shared file systems to external SAN or NAS arrays.

Availability

Deploying SAS applications into a business-critical environment, or one with availability requirements such as a Service Level Agreement, requires careful attention to the type and configuration of storage used. Since SAS is implemented on the host OS file systems, commonly used high availability strategies can be used effectively. From simple strategies, such as configuring local storage using RAID mirroring, to more complex enterprise-class solutions, such as redundancy through a SAN, the appropriate level of high availability can be designed and deployed so that the storage meets the needs of the business.

So how does all this fit together?

As you can see, none of these criteria should be considered independent of the others when designing and evaluating storage solutions for SAS environments. There will be tradeoffs made in the evaluation process, and priorities will be established. For some areas, such as performance, there are guidelines established by SAS R&D. In other areas, specific needs of the customer (a specific SLA, for example) may dictate specific design decisions. In addition, there’s some flexibility in certain areas – filesystems containing SAS permanent data should be allocated to a more available, more protected storage area than the temporary filesystem of SASWORK. A detailed analysis of the storage needs of the SAS deployment as a part of the overall architecture design will consider these three, in addition to other criteria.

 

In case you were wondering, we didn’t take the red-eye.


SAS ODS destination - Google maps

You are all familiar with the traditional SAS Output Delivery System (ODS) destinations such as LISTING, HTML, PDF, or POWERPOINT, which use “destination” in the sense of the type of output file. However, in this blog post, I am going to use the term “destination” in an even more traditional sense – as the place to which someone is going or to which something is being delivered.

Marrying ODS destination with Google map destination

To be precise, I am going to use the term “destination” for information delivery to a specific location on a Google map. We will produce SAS ODS output and deliver it to particular locations on a Google map.

Since Google maps exist in the web page environments that are essentially HTML, the best suited ODS destination for such a “marriage” will be ODS HTML.

With this technique, we are not limited to mere text or images placed on Google maps. We can place the output of any SAS procedure, tables and graphs alike, on a Google map.

Here is an interactive example of delivering ODS output to a Google map destination/location (the data itself is totally fictitious and serves the purpose of illustrating the technique only). Please take a minute to explore this Google map interaction before reading further.

Sample of Google map with embedded SAS ODS output

How it is done

Click to view and download the full sample SAS code. In this code, I capitalized on my prior posts on Google map with SAS. In particular, I used the idea of creating mini-pages for each location described in Weather forecasting with SAS-generated Google maps.

The main difference here is that the HTML mini-pages (place1.html, place2.html, etc.) for each geographical location were created using the SAS ODS HTML destination, as illustrated by the following SAS macro:

%macro create_sas_outputs;

  /* Count the observations in PLACES to set the loop bound */
  %let dsid = %sysfunc(open(places));
  %let num  = %sysfunc(attrn(&dsid,nlobs));
  %let rc   = %sysfunc(close(&dsid));

  /* Directory that receives the HTML mini-pages */
  filename odsout "&proj_path\infopages";

  %do j=1 %to &num;

    /* Read the j-th observation directly and store its place name */
    data _null_;
      p = &j;
      set places point=p;
      call symput('placename',place);
      stop;
    run;

    /* Route the output for this location to its own mini-page */
    ods html path=odsout file="place&j..html" style=styles.seaside;

    goptions reset=all device=actximg colors=() htext=9pt hsize=3in vsize=1.5in;

    title1  bold h=10pt color=cx3872ac "SAS user levels in &placename";
    axis1 label=none;
    axis2 label=none value=none minor=none major=none;

    /* Bar chart of user counts by SAS level for the current place */
    proc gchart data=sasusers(where = (place eq "&placename"));
      vbar saslevel /
        sumvar = count
        width = 10
        outside = sum
        raxis = axis2
        maxis = axis1
        cframe = white nozero discrete
        ;
      format count comma15. saslevel levelf.;
    run;
    quit;

    ods html close;

  %end;

%mend create_sas_outputs;

In this macro, for simplicity of the code sample I used PROC GCHART, but it can be any SAS procedure (PROC REPORT, PROC TABULATE - you name it) or a combination of several SAS procedures.
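To illustrate that point, here is a minimal sketch of one mini-page produced with PROC REPORT instead of PROC GCHART. It is written to run on its own, so the sample SASUSERS rows, the LEVELF format labels, the fixed values of PLACENAME and J, and the use of the WORK directory for ODSOUT are all stand-ins I made up for this illustration. In the macro above, those values come from the PLACES data set, the %DO loop and the project path.

```sas
/* Hypothetical stand-ins for the format and data used by the macro */
proc format;
  value levelf 1='Beginner' 2='Intermediate' 3='Advanced';
run;

data sasusers;
  input place :$20. saslevel count;
  datalines;
Raleigh 1 120
Raleigh 2 340
Raleigh 3 95
;
run;

/* Fixed here for the sketch; the macro sets these inside its loop */
%let placename = Raleigh;
%let j = 1;

/* Write the mini-page to the WORK directory for this sketch */
filename odsout "%sysfunc(pathname(work))";

ods html path=odsout file="place&j..html" style=styles.seaside;

title1 "SAS user levels in &placename";

/* Tabular version of the same information the bar chart shows */
proc report data=sasusers(where=(place eq "&placename")) nowd;
  column saslevel count;
  define saslevel / group format=levelf. "SAS level";
  define count    / analysis sum format=comma15. "Users";
run;

ods html close;
```

Inside the macro, only the TITLE and PROC REPORT steps would replace the GOPTIONS, AXIS and PROC GCHART statements. The ODS HTML statements around them stay as they are.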

The rest of the technique is based on creating Google map InfoWindows that reference these HTML mini-pages via <iframe> tag as shown in this code snippet:

put 'var info' i '= ''<iframe style="width:320px;height:215px" src="infopages/place' i +(-1) '.html">'';' /

I hope this post will serve as yet another illustration of the power of SAS as a tool for information delivery where it’s needed and when it’s needed.

What are your thoughts on this?


Hadoop skills for SAS administrators – why you need them and where to start

If you have your SAS Certified Platform Administrator Credential, then it’s clear that you’ve studied a lot to achieve it. But suddenly the Hadoop era shows up and what you find are big gaps in your skills inventory.

SAS administrators must be familiar with all the data the SAS platform can interact with, especially now that there is Hadoop. Hadoop is not just a database: it’s a different platform and a new world when compared with the SAS platform, so you can’t reuse your “old” skills when working with SAS and Hadoop (at least not all of them!).

Here are the questions we must ask:
