Five great analytics resolutions for 2017

The holiday season is over, and you survived. You’ve made a lot of personal resolutions for 2017: go to the gym, eat less sugar, save more money, visit Grandma more often. These are all great personal resolutions, but what about your analytics resolutions? If you are having trouble with those, let us help you out. The recent release of SAS 9.4 M4 will help you make 2017 your best analytics year yet.

Resolution 1: Build more accurate models faster!

Now you can leverage the power of the two most advanced analytics platforms on the market, SAS 9 and SAS Viya, from one interface. Using SAS/CONNECT, users can call powerful SAS Viya analytics from within a process flow in SAS Enterprise Miner. Would you prefer to use the super-fast, autotuned gradient boosting in SAS Viya? No problem! Call SAS Viya analytics directly from Enterprise Miner using the SAS Viya Code node. From the same process flow, you can also call open source models, all from one interface: SAS Enterprise Miner. Do you prefer to use SAS Studio on SAS 9? You can call SAS Viya analytics from SAS Studio as well. With SAS 9.4 M4, you can use both of SAS’ powerful platforms from one interface.

Resolution 2: Score your unstructured models in Hadoop without moving your data!

Got Hadoop? Got a lot of unstructured data? Now SAS Contextual Analysis allows you to score models in Hadoop using the SAS Code Accelerator add-on. Identify new insights with your unstructured text without ever having to move your data. Score it all in Hadoop. Uncover new trends and topics buried in documents, emails, social media and other unstructured text that is stored in Hadoop. You will be able to do it faster because you won’t have to move that data outside of Hadoop. SAS just keeps getting better in 2017.

Junior Professional Program helps new users attend SAS Global Forum 2017

Regardless of how long they’ve used the software, there’s no better event for SAS professionals than SAS Global Forum. The conference will attract thousands of users from across the globe and is an excellent place to network with and learn from users of all skill levels. To help relatively new SAS users experience the conference for the first time, SAS Global Forum offers the Junior Professional Award program.

The program is designed exclusively for full-time SAS professionals who have used SAS on the job for three years or less, have never attended SAS Global Forum, and whose circumstances would otherwise keep them from attending. But don’t let the word “junior” confuse you: all “new” SAS professionals, regardless of age, are eligible.

The Junior Professional award provides users with a waived conference registration fee, including conference meals, a free pre-conference tutorial, and great opportunities to learn from and network in a large community of SAS users. The program does not cover other costs associated with attending the event (travel and lodging, for example, are not included).

To apply, users need to fill out and submit the online application form. Applications must be received by January 16, 2017. Questions can be directed to the Junior Professional Program Coordinator, whose contact information can be found on the website.

To learn more about the award and its benefits, I recently sat down with one of the 2015 winners, Shavonne Standifer.


Shavonne Standifer, 2015 SAS Global Forum Junior Professional Award winner

Larry LaRusso: Hello Shavonne. First of all, let me congratulate you on winning a past award. That’s a great accomplishment, for sure. So tell me, how did you first learn about the program?
Shavonne Standifer: Interestingly, I wasn’t looking specifically for the award and didn’t even really know it existed. I was searching for a SAS proceeding paper and somehow stumbled across the application. I just applied, and got it!

LL:  That’s awesome. What made you want to attend SAS Global Forum?
SS: I knew a little bit about the event and really wanted to attend so that I could take advantage of the hands-on learning opportunities. I also thought it would be super cool if I could attend the lectures of my favorite SAS authors, and I knew many of them planned to present.

LL: What were your first impressions of the event?
SS: I was amazed by how many people were there. I was also amazed by how nice and helpful everyone was. I met so many new friends.

LL: What was the best part of your Global Forum experience?
SS: The best part of my experience by far was meeting John Amrhein. We met during a networking event in the Quad. After I subjected him to a two-minute rant about how much I loved SAS software, and all of the reasons why, he finally had a chance to introduce himself and mentioned that he was the conference chair for SAS Global Forum 2017. I was completely shocked! To my surprise, he encouraged me to be a part of his team; I later applied and was accepted.

Word scatter plot with SAS

In my last blog post, I showed you how to generate a word cloud from a collection of PDF files. Word clouds show you which terms appear in your documents and how often they occur. However, word clouds cannot lay out words from a semantic or linguistic perspective. In this post, I’d like to show you how word embeddings can help overcome this constraint.

Word embedding is widely used in natural language processing: words or phrases from the vocabulary are mapped to vectors of real numbers. Several open source tools can build word embedding models. Two of the most popular are word2vec and GloVe, and in my experiment I used GloVe. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
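Embedding vectors are usually compared with cosine similarity, which is how "semantically close" becomes a number. The post does not say which measure it relies on, so the following DATA step is only a minimal sketch: it computes the cosine of the angle between two made-up three-dimensional vectors that stand in for two rows of a real 100-dimensional GloVe model.

```sas
/* Toy 3-dimensional vectors (made up); real GloVe rows have 100 components */
data cosine;
   array a[3] _temporary_ (1 2 3);
   array b[3] _temporary_ (3 2 1);
   dot = 0; norm_a = 0; norm_b = 0;
   do i = 1 to dim(a);
      dot    = dot    + a[i] * b[i];
      norm_a = norm_a + a[i]**2;
      norm_b = norm_b + b[i]**2;
   end;
   /* 1 = same direction, 0 = orthogonal, -1 = opposite direction */
   cos_sim = dot / (sqrt(norm_a) * sqrt(norm_b));
   put cos_sim=;
   drop i;
run;
```

With real data, the same loop would run over VECTOR1-VECTOR100 of two terms taken from the GloVe table.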

Suppose you have obtained the term frequencies from your documents with SAS Text Miner and downloaded a pre-trained word embedding model from http://nlp.stanford.edu/projects/glove/. You can then extract the vectors of your terms using PROC SQL:

libname outlib 'D:\temp';

* Rank terms by descending frequency;
proc sort data=outlib.abstract_stem_freq;
   by descending freq;
run;

* RANKING = 1 for the most frequent term;
data ranking;
   set outlib.abstract_stem_freq;
   ranking=_n_;
run;

* Read the tab-delimited GloVe file: a term followed by 100 vector components;
data glove;
   infile "d:\temp\glove_100d_tab.txt" dlm="09"x firstobs=2;
   input term :$100. vector1-vector100;
run;

* Keep only the vectors of terms that occur in the documents;
proc sql;
   create table outlib.abstract_stem_vector as
   select glove.*, ranking, freq
   from glove, ranking
   where glove.term = ranking.word;
quit;
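The excerpt stops before the plotting step. One way to get from 100 dimensions to a two-dimensional word scatter plot is principal component analysis; this is a swapped-in technique, not necessarily what the author used (t-SNE is another common choice). The sketch assumes OUTLIB.ABSTRACT_STEM_VECTOR was created by the PROC SQL step above, and the cutoff of the 200 most frequent terms is an arbitrary choice for readability.

```sas
/* Keep the 200 most frequent terms (arbitrary cutoff) and project their
   100-dimensional vectors onto the first two principal components */
proc princomp data=outlib.abstract_stem_vector(where=(ranking <= 200))
              out=word_pcs n=2 noprint;
   var vector1-vector100;
run;

/* Terms that sit close together in this plot have similar GloVe vectors
   along the first two components */
proc sgplot data=word_pcs;
   scatter x=prin1 y=prin2 / datalabel=term;
   xaxis label="Principal component 1";
   yaxis label="Principal component 2";
run;
```

PCA keeps only the directions of largest variance, so some semantic neighbors may still land far apart; nonlinear methods such as t-SNE trade that off differently.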

More SAS Studio Tips for SAS Grid Manager Administrators: Global Settings

We saw in a previous post in this series how to configure SAS Studio to better manage user preferences in SAS Grid environments. Administrators can leverage additional settings to properly configure a multi-user environment; as you may imagine, these options deserve special consideration when SAS Studio is deployed in SAS Grid environments.

SAS Studio R&D and product management often collect customer feedback and suggestions, especially during events such as SAS Global Forum. We received several requests for SAS Studio to provide administrators with the ability to globally set various options. The goal is to eliminate the need to have all users define them in their user preferences or elsewhere in the application. To support these requests, SAS Studio 3.5 introduced a new configuration option, webdms.globalSettings. This setting specifies the location of a directory containing XML files used to define these global options.

Tip #1

How can I manage this option?

The procedure is the same as the one we have already seen for the webdms.studioDataParentDirectory property: both are specified in the config.properties file in the SAS Studio configuration directory. Refer to the previous post for additional details, including considerations for environments with clustered mid-tiers.

Tip #2

How do I configure this option?

By default, this option points to the directory path !SASROOT/GlobalStudioSettings. SASROOT translates to the directory where the SAS Foundation binaries are installed, such as /opt/sas/sashome/SASFoundation/9.4 on UNIX or C:\Program Files\SASHome\SASFoundation\9.4 on Windows. You can change the webdms.globalSettings property to point to any directory you choose.

The SAS Studio 3.6 documentation provides an additional key detail: in a multi-machine environment, the GlobalStudioSettings directory must be on the machine that hosts the workspace servers used by SAS Studio. In grid environments, workspace sessions can start on any grid node, so this location should be on shared storage accessible by every node.
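For example, pointing the property at a directory on shared storage might look like the following entry in config.properties. The path is hypothetical; use a location that every grid node mounts at the same path.

```properties
# SAS Studio config.properties (hypothetical shared path)
webdms.globalSettings=/sas/shared/GlobalStudioSettings
```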

Transform your technical talks with an audience centered approach

Editor's note: The following post is from Melissa Marshall, Principal at Melissa Marshall Consulting LLC. Melissa is a featured speaker at SAS Global Forum 2017 and is on a mission to transform how scientists and technical professionals present their work.

Learn more about Melissa.


Think back to the last technical talk you attended. What did you think about it? Was it engaging and interesting? Boring and overwhelming? Perhaps the topic was important to you, but it was presented in a way that made the content difficult to engage with. As an expert in scientific presentations, I often observe a significant “disconnect” between the way a speaker crafts a presentation and the needs of the audience. I believe the way to bridge this gap is for you, as a technical presenter, to become an audience-centered speaker rather than a speaker-centered speaker.

Here are some quick tips on how to transform your content and slides with an audience-centered approach!

Audience Centered vs. Speaker Centered

The default setting for most presenters is to be speaker-centered, meaning that they make presentation choices primarily because those choices work for themselves as speakers. Examples include:

  • spending a lot of time on the part of the topic that gave you the most difficulty or that you spent the most time working on
  • using terms that are familiar to you but are jargon to the audience
  • putting most of what you want to say on your slides as reminders, so that your slides are basically your speaker notes
  • standing behind a podium, physically disconnecting yourself from your audience

These choices are common in presentations, but they do not set you up for success. They are a key reason why many presentations of technical information fail.

A critical insight is to realize that your success as a speaker depends entirely on your ability to make your audience successful. You don’t get to decide that you gave a great talk (even if no one understood it)! That’s because presentations, by their very nature, are always made for an audience. You need something from your audience; that is why you are giving a talk! So it is time to get serious about making your audience successful (so you can be too!). I would define “audience success” as follows: your audience understands and views your subject in the way you wanted them to. Strategically, if you want to be a successful speaker, the best thing you can do is go “all in” on making your audience successful!

Audience Centered Content

To make your content more audience-centered, ask yourself four critical questions about your audience ahead of time:

  • Who are they?
  • What do they know?
  • Why are they here?
  • What biases do they have?

Easier Space Management for EV Data Mart Tables in 9.4M4

The report-ready SAS Environment Manager Data Mart has been an invaluable addition to SAS 9.4 for SAS administrators. The data mart tables are created and maintained by the SAS Environment Manager Service Architecture Framework and provide a source of data for out-of-the-box reports as well as custom reports that any SAS administrator can easily create. As you can imagine, the tables in the data mart can grow quite large over time, so balancing the desired time span of reporting against the size of the tables on disk requires some thought. The good news: SAS 9.4 M4 has made that job even easier.

The Environment Manager Data Mart (EVDM) has always provided a configuration setting to determine how many days of resource records to keep in the data mart tables. You can see below that in a fresh SAS 9.4 M4 installation, the default setting for “Number of Days of Resource Records in Data Mart” is set to 60 days. This means that EVDM data records older than 60 days are deleted from tables whenever the data mart ETL process executes.

EV Data Mart Tables in 9.4M4

The space required to house the Environment Manager Data Mart is split across three primary areas.

  • The ACM library tables contain system level information
  • The APM library tables contain audit and performance data culled from SAS logs
  • The KITS library contains miscellaneous tables created by data mart kits that collect specialty information about HTTP access, SAS data set access, and similar activity.
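To see how disk space is currently split across these areas, you can query DICTIONARY.TABLES, whose FILESIZE column reports each table's size in bytes. This is a hedged sketch, not a documented SAS Environment Manager procedure: it assumes a SAS session in which the three libraries are assigned under the librefs ACM, APM, and KITS, which you should confirm in your own deployment.

```sas
/* Assumes librefs ACM, APM and KITS point at the data mart libraries */
proc sql;
   /* Size of each data mart table, largest first */
   create table evdm_sizes as
   select libname, memname, nobs,
          filesize / (1024*1024) as size_mb format=comma12.1
   from dictionary.tables
   where libname in ('ACM', 'APM', 'KITS') and memtype = 'DATA'
   order by size_mb desc;

   /* Total size per library */
   select libname, sum(size_mb) as total_mb format=comma12.1
   from evdm_sizes
   group by libname;
quit;
```

Running this before and after changing the retention setting gives a rough before-and-after picture of the space the change recovers.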

Dr. Strangeformat or: How I Learned to Stop Joining Tables and Love the PROC

Joining tables with PROC FORMAT

The title of this post borrows from Stanley Kubrick’s 1964 comedy “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.” It stars the great Peter Sellers as the titular character, as well as George C. Scott and Slim Pickens. The black-and-white film is strange and brilliant, just like Kubrick was. Similarly, when I ran into the issue I outline below and was told about this solution, I thought two things. The first was “what a strange but brilliant solution,” and the second was “I’ll try anything as long as it works.” Thus, a title was born. But enough about that. Why are we here?

Problem

You want to add a couple of columns of information to your already large dataset, but each time you try to join the tables you run out of memory!  For example, you want to append latitude and longitude values from Table B to an existing list of customer phone numbers in Table A.

You’ve tried this and got nowhere fast:

proc sort data = demo.tablea;
   by npa nxx;
run;

proc sort data = demo.tableb;
   by npa nxx;
run;

data demo.aunionb;
   merge demo.tablea (in=a) demo.tableb (in=b);
   by npa nxx;
   if a;
run;

And then you tried this and also got nowhere (albeit a little slower):

proc sql;
   create table demo.aunionb as
   select *
   from demo.tablea a
   left join demo.tableb b
   on (a.npa = b.npa) and (a.nxx = b.nxx);
quit;

Solution - Joining tables with PROC FORMAT

Use PROC FORMAT!

Here’s how:
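This excerpt stops right before the author’s code. What follows is a minimal sketch of the general format-lookup technique, not necessarily the exact code from the full post: Table B is turned into character formats through a CNTLIN data set, and each Table A row then looks up its values with PUT, so Table A is never sorted or joined. The sample rows, the WORK table names, the LATITUDE and LONGITUDE column names, and the LATF and LONF format names are all made up for illustration. The sketch also assumes each NPA/NXX pair appears only once in Table B; duplicate keys would make PROC FORMAT reject the overlapping ranges.

```sas
/* Made-up sample rows standing in for DEMO.TABLEA and DEMO.TABLEB */
data tablea;
   input phone $ npa $ nxx $;
   datalines;
5550101 212 555
5550102 919 677
5550103 999 000
;
run;

data tableb;
   input npa $ nxx $ latitude longitude;
   datalines;
212 555 40.71 -74.01
919 677 35.78 -78.64
;
run;

/* CNTLIN data set: one character format per lookup column, keyed on NPA-NXX */
data lookup_fmts(keep=fmtname type start label hlo);
   length fmtname $8 type $1 start $16 label $32 hlo $1;
   set tableb end=last;
   type  = 'C';
   hlo   = ' ';
   start = catx('-', npa, nxx);
   fmtname = 'latf'; label = strip(put(latitude,  best32.)); output;
   fmtname = 'lonf'; label = strip(put(longitude, best32.)); output;
   /* OTHER range: keys missing from TABLEB format to a blank */
   if last then do;
      hlo = 'O'; start = ' '; label = ' ';
      fmtname = 'latf'; output;
      fmtname = 'lonf'; output;
   end;
run;

/* PROC FORMAT expects each format's rows to be grouped together */
proc sort data=lookup_fmts;
   by fmtname;
run;

proc format cntlin=lookup_fmts;
run;

/* Single pass through TABLEA: no sort, no join */
data aunionb;
   set tablea;
   length key $16;
   key = catx('-', npa, nxx);
   latitude  = input(put(key, $latf.), best32.);
   longitude = input(put(key, $lonf.), best32.);
   drop key;
run;
```

The formats are stored in WORK.FORMATS here, so they disappear at the end of the session unless you write them to a permanent catalog with the LIBRARY= option of PROC FORMAT.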

Creating and Using Multilabel Formats

A multilabel format enables you to assign multiple labels to a value or a range of values. Multilabel support was added to the FORMAT procedure in SAS® 8.2. You assign multiple labels by specifying the MULTILABEL option in the VALUE statement of PROC FORMAT. For example, specifying the MULTILABEL option in the following VALUE statement enables the Agef format to have overlapping ranges.

value agef (multilabel)
11='11'
12='12'
13='13'
11-13='11-13';

Multilabel formats can be used only in the MEANS, SUMMARY, TABULATE, and REPORT procedures. The code examples that follow show the creation of a simple multilabel format (using PROC FORMAT) and its use in each of these procedures.

First, a PROC FORMAT step creates a multilabel format for the Age variable in the Sashelp.Class data set, along with a character format for the Sex variable. The NOTSORTED option is specified to indicate the preferred order of the ranges in the results.

proc format library=work;
value agef (multilabel notsorted)
11='11'
12='12'
13='13'
11-13='11-13'
14='14'
15='15'
16='16'
14-16='14-16';
value $sexf
'F'='Female'
'M'='Male';
run;
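The excerpt ends before the procedure examples. As a minimal sketch of how these formats are consumed, the MLF option on the CLASS statement tells PROC MEANS and PROC TABULATE to produce a group for every label, so a 12-year-old is counted both under 12 and under 11-13. The code assumes the PROC FORMAT step above has already run; the statistics and the AGESTATS output data set name are my own choices, not the article’s.

```sas
/* Assumes AGEF. and $SEXF. already exist in WORK.FORMATS */
proc means data=sashelp.class n mean maxdec=1;
   class age / mlf;   /* one group per label, overlapping ranges included */
   var height weight;
   format age agef.;
   output out=agestats mean=avg_height avg_weight;
run;

proc tabulate data=sashelp.class;
   class age / mlf;
   class sex;
   var height;
   table age, sex*height*mean*f=6.1;
   format age agef. sex $sexf.;
run;
```

Without MLF, each value would be assigned to only one of its labels and the 11-13 and 14-16 rows would not appear as separate groups.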

Truncating decimal numbers in SAS without rounding

Imagine making $50K a day out of thin air. Did you know that NASDAQ routinely processes around 10,000,000 trades a day? What if, instead of rounding the cents in each transaction, market makers truncated the fractions of cents in the amount they owe you? Assuming that each transaction, on average, has half a cent that would usually be rounded away, this would produce 10,000,000 × $0.005 = $50,000, and nobody would even notice it. I am not saying it's legal; this example is just an illustration of the power of ordinary truncation.

However, sometimes it is necessary to truncate displayed numeric values to a specified number of decimal places without rounding. For example, if we need to truncate 3.1415926 to 4 decimal places without rounding, the displayed number would be 3.1415 (as compared to the rounded number, 3.1416).

If you think you can truncate numeric values by applying SAS w.d format, think again.

Try running this SAS code:

data _null_;
   x=3.1415926;
   put x= 6.4;
run;

If you expect to get x=3.1415, you will be disappointed. Surprisingly, you will get x=3.1416, which means that the SAS format does not truncate the number but rounds it. The same is true for the DOLLARw.d and COMMAw.d formats.

After running into this problem, I thought of using a SAS function instead, and the TRUNC function comes to mind. Indeed, if you look up the SAS TRUNC function, you will find that it does truncate numeric values, but (surprise!) not to a specified number of decimal places; rather, it truncates to a specified number of bytes, which is not the same thing for numerics. This may be useful for evaluating the precision of numeric values, but it has no direct bearing on our problem of truncating numeric values to a set number of decimal places.
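The excerpt ends before the author’s solution. One common approach, offered here as a sketch rather than as the post’s own method, is to shift the decimal point, drop the fractional part with the INT function, and shift back. The macro variable D and the ROUND fuzz of 1e-7 are assumptions: the fuzz absorbs binary floating-point error (for example, 0.29*100 evaluates to slightly less than 29), at the cost of rounding up a value that lies within 1e-7 of the next digit.

```sas
%let d = 4;   /* number of decimal places to keep (assumption) */

data truncated;
   input x;
   /* Shift the decimal point right by &d places, remove tiny
      floating-point noise, drop the fraction with INT (toward zero),
      then shift back. */
   t = int(round(x * 10**&d, 1e-7)) / 10**&d;
   put x= best12. t= 12.&d;
   datalines;
3.1415926
-3.1415926
0.29
;
run;
```

INT truncates toward zero, so -3.1415926 becomes -3.1415; if negative values should instead move toward negative infinity, FLOOR is the function to swap in.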

Fun with SAS Text Analytics: A qualitative analysis of IALP papers

Last week, I attended the IALP 2016 conference (20th International Conference on Asian Language Processing) in Taiwan. After the conference, each presenter received a USB flash drive with all accepted papers in PDF format. So when I got back to Beijing, I began going through the papers to extend my learning. Usually, when I return from a conference, I go through all paper titles and my conference notes, choose the most interesting articles, and dive into them for details. I then summarize the important research discoveries in one document. This always takes me several days or more to complete.

This time, I decided to try SAS Text Analytics to help me read papers efficiently. Here’s how I did it.

My first experiment was to generate a word cloud of all papers. I used these three steps.

Step 1: Convert PDF collections into text files.

With the SAS TGFILTER procedure and SAS Document Conversion Server, you can convert PDF collections into a SAS data set. If you don’t have SAS Document Conversion Server, you can download pdftotext for free. However, pdftotext only converts PDF files into text files, so you need to write SAS code to import all the text files into a data set. Moreover, if you use pdftotext, you need to check whether each PDF file was converted correctly. Checking the texts one by one is tedious, so it is worth looking for smarter ways to do this check. The TGFILTER procedure has language detection functionality, and the detected language of a garbage document after conversion is empty rather than English. So I recommend TGFILTER: you can then easily filter out garbage documents with a WHERE statement that drops documents whose language is not ‘English.’
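If you take the pdftotext route, the import can be as simple as reading every .txt file in a folder through a wildcard FILENAME and recording which file each line came from. This is a minimal sketch under assumptions: the folder path is hypothetical, lines longer than 1,000 characters are truncated, and the letter-share check at the end is my own rough heuristic for spotting garbled conversions, not a feature of TGFILTER or pdftotext. Collapsing the lines into one row per document, as text-parsing procedures typically expect, is a further step; keep in mind the 32,767-byte limit on SAS character variables.

```sas
/* Hypothetical folder holding the .txt files produced by pdftotext */
filename txts "d:\temp\ialp_txt\*.txt";

/* One row per non-blank line, tagged with its source file name */
data paper_lines;
   length fname paper $260 line $1000;
   infile txts filename=fname lrecl=32767 truncover;
   input line $char1000.;
   if missing(line) then delete;
   paper   = scan(fname, -1, '\/');
   n_alpha = lengthn(compress(line, , 'ka'));  /* letters only            */
   n_chars = lengthn(compress(line));          /* all non-blank characters */
run;

/* Rough conversion check: papers with a low share of letters are
   candidates for manual inspection */
proc sql;
   create table paper_check as
   select paper,
          count(*) as n_lines,
          sum(n_alpha) / sum(n_chars) as alpha_share format=percent8.1
   from paper_lines
   group by paper
   order by alpha_share;
quit;
```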

Step 2: Parse documents into words and get word frequencies.

Run the SAS HPTMINE or TGPARSE procedure against the document data set, with the stemming option turned on and the English stop-word list provided by SAS, to get the frequencies of all stems.

Step 3: Generate word cloud plot.

Once you have the term frequencies, you can use either SAS Visual Analytics or R to generate a word cloud. I like programming, so I used the SAS IML procedure to submit R scripts from SAS.
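A minimal sketch of that IML-to-R hand-off follows; it is not the author’s script. It assumes SAS was started with the RLANG system option, that R has the wordcloud package installed, and that the Step 2 frequencies are in a table named WORK.TERM_FREQ with columns WORD and FREQ. Those names and the output path are hypothetical.

```sas
proc iml;
   /* Copy the SAS table into an R data frame named freqs */
   call ExportDataSetToR("work.term_freq", "freqs");
   submit / R;
      library(wordcloud)
      png("d:/temp/wordcloud.png", width = 1200, height = 1200)
      # Most frequent words are drawn largest and placed in the center
      wordcloud(freqs$word, freqs$freq, max.words = 500, random.order = FALSE)
      dev.off()
   endsubmit;
quit;
```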

These steps generated a word cloud of the top 500 words from 66 papers. There were 87 papers in total, and 21 of them could not be converted correctly by SAS Document Conversion Server; by comparison, 19 papers could not be converted correctly by pdftotext.

Figure 1: Word cloud of the top 500 words in 66 papers
