In a previous blog post, I demonstrated combining the power of SAS Event Stream Processing (ESP) and the SAS Quality Knowledge Base (QKB), a key component of our SAS Data Quality offerings. In this post, I will expand on the topic and show how you can work with data from multiple QKB locales in your event stream.
To illustrate how to do this, I will walk through an example in which the event stream data contains North American postal codes. I need to standardize each value appropriately depending on where it is from (United States, Canada, or Mexico), using the Postal Code Standardization definition from the appropriate QKB locale. Note: This example assumes that the QKB for Contact Information has been installed and that the license file that the DFESP_QKB_LIC environment variable points to contains a valid license for these locales.
In an ESP Compute window, I first need to initialize the call to the BlueFusion Expression Engine Language function and load the three QKB locales needed – ENUSA (English – United States), ENCAN (English – Canada), and ESMEX (Spanish – Mexico).
Recently, one of my sons came to me and asked about something called “The Monty Hall Paradox.” His class had discussed it in school, and he was having a hard time understanding it (as one often does with paradoxes).
For those of you who may not be familiar with the Monty Hall Paradox, it is named for the host of a popular TV game show called “Let’s Make a Deal.” On the show, a contestant would be selected and shown a valuable prize. Monty Hall would then explain that the prize was located behind one of three doors and ask the contestant to pick a door. Once a door was selected, Monty would tease the contestant with cash to get him or her to either abandon the game or switch to another door. Invariably, the contestant would stand firm, and Monty would then show the contestant what was behind one of the other doors. Of course, it wouldn’t be any fun if the prize was behind the revealed door, so after revealing an empty door, Monty would ply the contestant with even more cash in the hope that he or she would abandon the game or switch to the remaining door.
Almost without fail, the contestant would stand firm in their belief that their chosen door was the winner and would not switch to the other door.
So where’s the paradox?
When left with two doors, most people assume that they have a 50/50 chance of winning. In fact, the contestant doubles his or her chance of winning, from 1/3 to 2/3, by switching to the other door.
After I explained this to my son, it occurred to me that this would be an excellent exercise for coding in Python and in SAS, to see how the two languages compare. Like many of you reading this blog, I’ve been programming in SAS for years, so the struggle for me was coding this in Python.
I kept it simple: I generated the data randomly, applied simple logic to each row, and compared the results. The two programs differ only in how each language approaches the problem. Once we have looked at the two approaches, we can look at the answer.
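A minimal SAS sketch of that row-by-row idea follows. It is an illustration, not the author's program. Each row is one simulated game. The prize door and the contestant's first pick are drawn at random. Staying wins only when the first pick was already correct. Switching wins in every other case, because Monty always opens the one remaining empty door. The data set name MONTY, the seed, and the number of games are arbitrary choices.

```sas
/* Simulate Monty Hall games: one row per game (illustrative sketch) */
data monty;
   call streaminit(2016);                 /* arbitrary seed for reproducibility   */
   do game = 1 to 100000;
      prize = ceil(3 * rand('uniform'));  /* door hiding the prize: 1, 2, or 3     */
      pick  = ceil(3 * rand('uniform'));  /* contestant's first choice             */
      stay_win   = (pick = prize);        /* staying wins only if the first pick was right */
      switch_win = (pick ne prize);       /* Monty opens the other empty door, so
                                             switching wins whenever the first pick was wrong */
      output;
   end;
run;

/* Proportion of games won under each strategy */
proc means data=monty mean maxdec=3;
   var stay_win switch_win;
run;
```

Because the first pick matches the prize door one time in three, the mean of STAY_WIN should settle close to 1/3 and the mean of SWITCH_WIN close to 2/3 as the number of games grows. The two flags are complements, so they always sum to 1 on each row.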
One very useful type of auditing for a SAS administrator is to have summary data about the availability and performance of various resources (platforms, servers, services) from the 30,000-foot view. Using SAS Environment Manager, it's easy to go in and look at the availability of any one resource over various time spans--for the past few hours, past day, past week, or past month and more. This is a very powerful way to summarize how much of the time a given resource was "up and responsive."
However, there's no way to see that type of information, or even a summary of it, for all your servers at once. A typical deployment will have anywhere from 10 to 50 or more different servers, and to view the availability of all of them over an extended period, you would have to visit and drill down into each resource, one at a time.
To address this problem, I've developed a simple report that summarizes availability for all servers. It uses two data sets that are generated automatically as part of the SAS Environment Manager Data Mart: availability and resourceinventory. It doesn't provide hour-by-hour information as the Monitoring interface of SAS Environment Manager does. Instead, it gives you the percentage of time each server was available over the past day, week, month, or even quarter (if you have the data for it), all in one summary report. For a production environment, this can provide helpful "big-picture" information on availability.
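The general shape of such a report can be sketched in PROC SQL. The approach joins the availability records to the resource inventory and then computes each server's share of "up" samples over a chosen window. This is a hypothetical sketch, not the author's report. The actual Data Mart column names and storage layout are not given here. The column names below (RESOURCE_ID, RESOURCE_NAME, RESOURCE_TYPE, SAMPLE_DT, IS_AVAILABLE), the one-row-per-sample layout, and the few inline rows are all made up so that the example runs. Map them to the real tables before adapting anything like this.

```sas
/* HYPOTHETICAL stand-ins for the Data Mart tables; real names and layout differ */
data resourceinventory;
   infile datalines dlm=',';
   length resource_name $32 resource_type $20;
   input resource_id resource_name $ resource_type $;
   datalines;
1,SASMeta - Metadata Server,Server
2,SASApp - Workspace Server,Server
;
run;

data availability;                    /* made-up sample rows: 1 = up, 0 = down */
   infile datalines dlm=',';
   input resource_id sample_dt :datetime20. is_available;
   format sample_dt datetime20.;
   datalines;
1,01SEP2016:00:00:00,1
1,01SEP2016:00:05:00,1
2,01SEP2016:00:00:00,1
2,01SEP2016:00:05:00,0
;
run;

/* Window to summarize, in days, counted back from the latest sample (assumption) */
%let days = 7;

proc sql;
   create table avail_summary as
   select r.resource_name,
          count(*)              as samples,
          mean(a.is_available)  as pct_available format=percent8.1
   from availability as a
        inner join resourceinventory as r
        on a.resource_id = r.resource_id
   where r.resource_type = 'Server'
     and a.sample_dt >= (select max(sample_dt) from availability) - &days * 86400
   group by r.resource_name
   order by pct_available;
quit;

proc print data=avail_summary noobs;
run;
```

Changing the hypothetical DAYS macro variable to 1, 30, or 90 gives the day, month, or quarter view described above, provided the Data Mart retains that much history.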
SAS Event Stream Processing (ESP) can not only process structured streaming events (a collection of fields) in real time, but also has very advanced features for collecting and analyzing unstructured events. Twitter is one of the best-known social network applications and probably the first that comes to mind when thinking about a streaming data source. At the same time, SAS has powerful solutions for analyzing unstructured data with SAS Text Analytics. This post is about combining two needs: collecting unstructured data from Twitter and doing some text analytics processing on the tweets (contextual extraction, content categorization, and sentiment analysis).
Before moving forward, note that SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector,” or by using Python and the publisher API. Target applications consume the enriched events output by ESP using the same technology: “adapters” and “connectors.” SAS ESP provides many of them for integrating with both static and dynamic applications.
An ESP model flow is composed of “windows,” each of which is basically a type of transformation we want to perform on streaming events. Windows can perform basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).
SAS ESP Twitter Adapters background
SAS ESP provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.
Both adapters are publisher-only and include:
In a number of my previous blogs, I have discussed auditing within a SAS environment and how to identify who has accessed data or changed reports. For many companies, keeping an audit trail is very important. If you’re an administrator in your environment and auditing is important at your organization, here are a few steps you can take to secure the auditing setup and possibly audit any changes made to it, all while ensuring there are no gaps in collecting this information.
In a SAS deployment, the logging configuration is stored in XML files for each server configuration. The XML files can be secured with OS permissions to prevent unauthorized changes. However, there are a number of ways in which a user can temporarily adjust logging settings, which may allow them to prevent audit messages from reaching a log.
Logging can be adjusted dynamically in:
SAS code, using logging functions and macros
SAS Management Console
PROC IOMOPERATE
As an example of how a user could circumvent the audit trail, let’s look at an environment where logging has been configured to audit access to SAS data sets using the settings described in this blog. When auditing is enabled, messages are recorded in the log when users access a table. In the test case for this blog, we have a stored process that prints a SAS data set. When the stored process runs, the log records the table that was opened and the user who opened it.
When I first used open source software (OSS) over a decade ago, I was ecstatic. It was amazing how easy it was to download the latest version onto my personal computer with no initial license fee. I was quickly able to analyse datasets using various statistical methods.
Organisations might feel a similar excitement when they first employ people with predominantly open source programming skills. However, it becomes tricky to organise an enterprise-wide approach based solely on open source software. Decision makers within many organisations are now coming to realise the value of investing in both OSS and vendor-provided proprietary software. Very often, open source is used widely to prototype models, whilst proprietary software such as SAS provides a stable platform to deploy models in real time or for batch processing, and to monitor and update them, directly in any database or on a Hadoop platform.
Industries such as pharma and finance have realised the advantages of complementing open source software usage with enterprise solutions like SAS.
A classic example is when pharmaceutical companies conduct clinical trials, which must follow international good clinical practice (GCP) guidelines. Some pharma organisations use SAS for operational analytics, taking advantage of standardized macros and automated statistical reporting, whilst R is used for the planning phase (i.e. simulations), for the peer-validation of the results (i.e. double programming) and for certain specific analyses.
Over the past 37 years I've had the good fortune to be able to attend and present at hundreds of in-house, local, regional, special-interest and international SAS events. I am a conference junkie. I've not only attended thousands of presentations, Hands-On Workshops, tutorials, breakout sessions, quick tips, posters, breakfasts, luncheons, mixers and more, but have had the privilege of hearing, seeing and networking with thousands of like-minded SAS users and presenters as they share valuable tips, techniques, advice, and suggestions on how to best use the SAS software.
For me, attending, volunteering and participating at SAS conferences and events has not only brought personal satisfaction like nothing else, but has also allowed me to grow professionally and make many lifelong friends. One of my objectives while attending a conference is to identify and learn at least three new things I didn't already know about SAS software. These three new things could consist of "cool" programming tips, unique coding techniques, "best" practice conventions, or countless other SAS-related nuggets.
At the upcoming 2016 MidWest SAS Users Group (MWSUG) Educational Forum and Conference, I'll be presenting several topics near and dear to my heart, including "Top Ten SAS Performance Tuning Techniques." This 50-minute presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their programs and applications. If you are unable to attend, here are a couple of programming tips and techniques from each performance area to consider.
1. Use IF-THEN / ELSE or SELECT-WHEN / OTHERWISE in the DATA step, or a CASE expression in PROC SQL, to conditionally process data.
2. Use the SASFILE statement to reduce CPU time and elapsed time when the same data set is processed multiple times.
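Both tips can be sketched against SASHELP.CLASS, the sample data set that ships with SAS. The first part writes mutually exclusive conditions as SELECT-WHEN / OTHERWISE in the DATA step and as a CASE expression in PROC SQL. In both forms, once a condition matches, the remaining tests are skipped for that observation. The second part loads a table into memory once with SASFILE, lets two steps read it, and then frees the memory. The WORK table names and age groupings are illustrative only, and any actual savings depend on data size and available memory.

```sas
/* Tip 1a: SELECT-WHEN stops testing after the first true condition */
data work.class_grp;
   set sashelp.class;
   length age_group $8;
   select;
      when (age <= 12) age_group = 'Under 13';
      when (age <= 14) age_group = '13 to 14';
      otherwise        age_group = '15+';
   end;
run;

/* Tip 1b: the equivalent CASE expression in PROC SQL */
proc sql;
   create table work.class_grp_sql as
   select name, age,
          case
             when age <= 12 then 'Under 13'
             when age <= 14 then '13 to 14'
             else '15+'
          end as age_group length=8
   from sashelp.class;
quit;

/* Tip 2: load a working copy into memory once, reuse it, then release it */
data work.class;
   set sashelp.class;
run;

sasfile work.class load;      /* read the table into memory */

proc means data=work.class mean;
   var height weight;
run;

proc freq data=work.class;
   tables sex;
run;

sasfile work.class close;     /* free the memory when finished */
```

Ordering the WHEN clauses from most to least frequent condition is what lets SELECT and CASE skip the most work. SASFILE pays off only when the table fits comfortably in memory and is read more than once.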
Every day, more than one hundred thousand SAS users visit our website looking for SAS information and resources. Given its importance to our user base, we’re constantly looking for ways to evolve the site. Over the next few months, you’ll notice changes to the support website, changes we believe will provide you with a better user experience.
Today, we launch a beta version of six top-level support pages – accessible via your computer, smartphone or tablet by clicking on the banner on support.sas.com. The beta pages may look different from what you’re used to, but they are fully functional, and you can use them to fulfill your support needs. During the beta period, all current support.sas.com pages will still be available for your use, but we hope you’ll give the new beta pages a try.
As a company with more than 40 years of experience delivering business analytics software, we focus on making sure that you – our customers – are the beginning point for all of our work – whether we’re developing a new product or designing a new website.
Jim Goodnight, our CEO, put it best when he said, “For the past four decades, we’ve used a simple approach with our customers. We ask them what they want, and then we develop it for them.”
And that’s what’s at the heart of our support site evolution.
We’ve listened to your comments and suggestions. We’ve heard what’s most important to you. And we’ve designed the new support pages with you in mind, focusing on your key tasks so you can access information, find answers and get help quickly and easily. Important information is now front and center, and new, simplified navigation will enable you to get what you need with fewer clicks. We believe the new pages deliver the experience you’ve asked us for, but we’ll let you be the judge.
In a previous blog post, I explained how end users should code and use shared locations for SAS artifacts to avoid issues in a SAS Grid Manager environment. Even so, they can still run into sharing issues, which can have very obscure symptoms. For example, users opening SAS Studio might notice that it automatically opens to the last program they were working on in a previous session… sometimes. Other times, they may log on and find that SAS Studio opens to a blank screen. What causes SAS Studio to “sometimes” remember a previous program and other times not? And why should this matter, when all I am looking for are my preferences?
Where are my preferences?
SAS Studio has a Preferences window that enables end users to customize several options that change the behavior of different features of the software. By default, these preferences are stored under the end-user home directory on the server where the workspace server session is running (%AppData%/SAS/SASStudio/preferences in Windows or ~/.sasstudio/preferences in UNIX). Does this sentence ring any alarm bells? With SAS Studio Enterprise Edition running in a grid environment, there is no such thing as “the server where the workspace server session is running!” One invocation of SAS Studio could run on one grid node and the next invocation of SAS Studio could run on a different grid node. For this reason, it might happen that a preference that we just set to a custom value reverts to its default value on the next sign-in. This issue can become worse because SAS Studio follows the same approach to store code snippets, tasks, autosave files, the WEBWORK library, and more.
Through SAS Studio 3.4, the only solution to this uncertainty was to share end users’ home directories across all the grid nodes. SAS Studio 3.5 removes this requirement by providing administrators with a new configuration option: webdms.studioDataParentDirectory. This option specifies the location of SAS Studio preferences, snippets, My Tasks, and more. The default value is blank, which means that the behavior is the same as in previous releases. An administrator can point it to any shared location so that all of this common data is accessible from any workspace server session.
I started out as a Psychology major. During my third year as an undergraduate, I was hired on as a research assistant for my advisor in her cognitive psychology lab. Through this and progressively more complicated psychological research experience, I quickly grew to love statistics. By the end of that year, I decided to declare it as a second major. My first introduction to SAS was as a fourth-year undergraduate psychology student - still new to my statistics degree curriculum and working on a large-scale meta-analysis project spanning years of data. I had never programmed before seeing my first SAS procedure. I broke down in tears, terrified at what I had gotten myself into. I toughed it out (with help from my statistics professor), finished my psychology honors thesis with top grades and went on later to use SAS in my statistics thesis for good measure.
About a year later, in 2011, that same statistics professor encouraged us to submit our work for presentation at MWSUG, sweetening the deal with a promise of extra credit if we did. I jumped at that opportunity and submitted both my psychology thesis and my statistics thesis that night. A couple of months later, I received an email…they accepted both of my papers and awarded me a FULL student scholarship to attend!
I have come a long way from presenting my first thesis projects (I just arrived home from my 27th conference last weekend). I have learned to love not only SAS, but also the statistics behind each procedure. This year, at MWSUG 2016 in Cincinnati, OH, I will be presenting three projects. One of them will be in ePoster format. As the chair of this section (yes, you read that right: I’ve gone from terrified student to section chair!), I felt the need to support it with my own research as well. This project is dedicated to the common and very pesky concept of multicollinearity.
What is multicollinearity? It is the statistical phenomenon in which there is a perfect or near-perfect linear relationship among the predictor variables in a model. Simply put, it is the existence of predictor co-dependence. Fortunately, it is quite easy to detect. You can do so with a correlation matrix from PROC CORR and three very simple options in PROC REG (VIF, TOL, and COLLIN), as in the example below:
/* Examination of the Correlation Matrix */
Proc corr data=temp;
   Var hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge;
Run;
/* Multicollinearity Investigation: VIF TOL COLLIN */
Proc reg data=temp;
   Model stroke = hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge / vif tol collin;
Run;
Quit; /* PROC REG is interactive, so QUIT ends the procedure */
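For reference, the TOL and VIF values that PROC REG reports for each predictor $x_j$ come from an auxiliary regression of $x_j$ on all the other predictors in the MODEL statement. Writing $R_j^2$ for the R-square of that auxiliary regression:

```latex
\mathrm{TOL}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2}
```

A predictor that is perfectly explained by the others has $R_j^2 = 1$, so its tolerance is 0 and its VIF is unbounded. A commonly cited rule of thumb flags VIF values above 10 (tolerance below 0.1) as a sign of problematic multicollinearity, although the cutoff used varies by field.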