Using ODS EXCEL and PROC EXPORT to bundle Excel-based reports

I recently wrote about my foray into the experimental world of ODS EXCEL. Using this new destination in SAS 9.4, I was able to create multisheet workbooks -- containing reports and graphs -- in the XLSX format. I outlined some of the benefits -- such as being part of Base SAS and its ability to create native XLSX files. But I also pointed out some of the shortcomings that I experienced, such as slow performance with large amounts of data.

I'm working on a consulting project in which I needed to provide my client with an Excel-based report. This report requires an easy-to-read summary sheet, such as might be created with PROC MEANS and PROC TABULATE. That's a perfect job for ODS EXCEL. But I also needed to include subsequent sheets that contained different versions of detailed data -- tables with lots of columns and many thousands of rows. In my experience, this can bog down ODS EXCEL; I find that PROC EXPORT with DBMS=XLSX performs much better. So...I decided to see if I could combine the two approaches to create a single file.

The process worked perfectly for my purposes. It relies on the following behaviors/assumptions:

  • ODS EXCEL creates a new XLSX file. That's okay, because each time I run my process I want to replace the XLSX that I had before. If I wanted to retain previous versions, I could tack a date suffix onto the target file name.
  • PROC EXPORT with DBMS=XLSX can add/update a sheet within an existing XLSX file. That's a relatively recent feature, added in SAS 9.4 (and perhaps worked in SAS 9.3M2). This means that I can create the file with ODS EXCEL, then update the same file using PROC EXPORT, all within a single SAS program. (Remember, PROC EXPORT with DBMS=XLSX requires SAS/ACCESS to PC Files.)

I would like to show an example of the output, but I'm sure that my customer wouldn't approve of me featuring their data details on my blog. So instead I'll risk alienating a different group of acquaintances: my Facebook friends.

Years ago I built an app that turns your Facebook friend data into a SAS program. Even though eons have passed (in social media time), the app still works. (You can try it yourself: it runs as a custom task in SAS Enterprise Guide or as a standalone program on your Windows PC.) I ran it just now to gather my latest Facebook friend data and create SAS data sets with my friend details. Then I used ODS EXCEL to create a summary sheet, and PROC EXPORT to create several detailed sheets. Here's an example of the summary:

And then one of the detailed sheets, produced by PROC EXPORT of one of the detailed data sets my Facebook program created:

Here's the SAS program that produces this output. First, I run the program that is generated by my Facebook app. Then I follow up with this program that generates the Microsoft Excel spreadsheet file.

/* Create a new FB summary and report workbook for this snapshot 
   of Facebook data                                             */
filename fbout "c:\temp\FBReport_&SYSDATE..xlsx";
/* A little ODS style trick to make headings in my sheet */
ods escapechar='~';
%let bold_style=~S={font_size=12pt font_weight=bold}~;
/* CREATES a new XLSX file */
ods excel (id=fb) file=fbout
  /* choose a style you like */
  /* SHEET_INTERVAL of NONE means that each PROC won't generate a 
     new sheet automatically                                     */
  options (sheet_interval="none" sheet_name="Summary")
ods noproctitle;
ods text="&SYSDATE. Friend Report for &myFacebookName";
proc sql;
     select count(distinct(UserId)) as 
         Number_Of_Friends into: NumberOfFriends
	 from friends;
ods text="&bold_style.Count of friends by gender";
proc freq data=frienddetails
	tables gender / 
ods text="&bold_style.Calculated Ages based on Graduation years";
proc means data=ages
	min max mean median p99;
	var age;
	class how;
ods graphics on / width=800 height=300;
ods text="&bold_style.Count of friends by Relationship Status";
proc freq data=frienddetails
	tables relationshipstatus / 
	scores=table plots(only)=freqplot;
ods excel (id=fb) close;
/* ADDS new SHEETS to the existing XLSX file */
proc export data=frienddetails
  outfile=fbout replace;
  sheet="Friend Details";
proc export data=schoolfriends
  outfile=fbout replace;
proc export data=statusprep(keep=name date message)
  outfile=fbout replace;
  sheet="Latest Status";

I could achieve a similar result using other methods, of course. But I like to take advantage of my consulting opportunities to explore new features in SAS software. I find that if I can learn a new feature "on the job", I can produce a good result for my customer while also adding to my bag of SAS tricks.

Related articles

Experimenting with ODS EXCEL to create spreadsheets from SAS
How do I export from SAS to Excel files: Let me count the ways
Running PROCs on Your Facebook Friends (2011 version)

Post a Comment

Experimenting with ODS EXCEL to create spreadsheets from SAS

The ODS ExcelXP tagset has served us well over the years. It provides a reliable method to get formatted SAS output into Microsoft Excel workbooks, where the business world seems to like to live. And it's available in Base SAS, which means that you don't need SAS/ACCESS to PC Files and any extra setup to begin using it.

In SAS 9.4 Maintenance 1, ODS EXCEL was introduced as an experimental feature. Even though it still has the "experimental" label in the recently released SAS 9.4M2, I've found it to be a useful addition to the many ways I can create Excel content from SAS. ODS EXCEL offers a couple of big advantages over ExcelXP:

ODS EXCEL produces a native XLSX file. Users of the ExcelXP tagset know that Excel complains about the file format as you open it. And the XML Spreadsheet format that it uses is uncompressed, resulting in potentially very large files.

ODS EXCEL supports SAS graphics in the output. ExcelXP users have come up with creative methods to insert graphs "after the fact", but it's not as convenient as a "once and done" SAS program. With ODS EXCEL, graphics from SAS procedures are automatically included in output.

Syntax-wise, ODS EXCEL is similar to ODS TAGSETS.ExcelXP. Chevell Parker shares many of those details in his SAS Global Forum 2014 paper; in absence of official doc for this experimental feature, Chevell's paper is essential.

Let's dive into an example. The following program looks similar to how you might use the ExcelXP tagset, but can you see the key differences?

ods excel file="c:\projects\output\example.xlsx" 
 /* will apply an appearance style */
  /* for multiple procs/sheet */
  /* name the sheet tab */
  sheet_name="CARS summary"
/* add some formatted text */
ods escapechar='~';
ods text="~S={font_size=14pt font_weight=bold}~Cars Summary and Histogram";
/* tabular output */
proc means;
var msrp invoice;
/* and a graph */
ods graphics / height=400 width=800 noborder;
proc sgplot;
histogram msrp;
ods excel close;

Note how I've targeted an XLSX file directly, instead of going through an XML intermediary file to placate Excel with the file format. And I included a graph with no problem. Like other ODS output destinations, I can apply styles and special formatting as needed. Here's a screen shot of the resulting spreadsheet:


I have a few additional observations from playing with this feature (did I say it was experimental?):

  • ODS EXCEL doesn't perform well with large detailed output. That is, if I use ODS EXCEL and PROC PRINT a data set with lots of columns and many thousands of rows, it's going to be slow -- and might hit a wall with memory usage. For that use case, PROC EXPORT DBMS=XLSX works better (though that requires SAS/ACCESS to PC FILES).
  • ODS EXCEL overwrites any existing Excel file that you target. That means that you aren't going to use this method to poke new values into an existing spreadsheet, or add sheets to an existing workbook. Compare that to PROC EXPORT DBMS=XLSX, which allows you to update an existing workbook by targeting just one sheet.
  • My example code works perfectly on the SAS University Edition! Simply change the destination file to be a location that SAS can write to -- for example, file="/folders/myfolders/example.xlsx".

If you want to learn more about the new ODS destinations in SAS 9.4 (which include HTML5, EPUB, POWERPOINT, and EXCEL), watch my interview and demo with David Kelley, R&D manager for ODS at SAS:

Post a Comment

Format your data like a social media superstar

In my industry of data and computer science, precision is typically regarded as a virtue. The more exact that you can be, the better. Many of my colleagues are passionate about the idea, which isn't surprising for a statistical software company.

But in social media, precision is a stigma -- especially when applied to the number of followers or connections that you have. Do you have so few connections that it can be represented as an exact number? Well, I wouldn't go bragging about that!

For example, let's look at this new LinkedIn joiner (blurred to protect his identity). As of this writing he has only -- and exactly -- 15 connections! (For the record, I consider this guy to be a rising star; I'm sure that his connections will skyrocket very soon.)

LinkedIn newcomer
Compare this to a more seasoned networker, shown below. LinkedIn won't tell us how many connections he has; perhaps they are beyond measure. We can see only that he has 500 or more LinkedIn connections. It might be 501 or it might be 500 million. It doesn't matter, according to LinkedIn. All you need to know is that this guy is a highly sought-after connection.

Let's create a SAS format to count connections, LinkedIn-style. It's simple: for cases of 500 or more, the label should be "500+". For fewer than 500, the label will be the embarrassingly precise actual number.

libname library (work); /* "library" is automatically searched for formats */
proc format lib=library;
 value linkedin    
  500 - high ='500+'
  /* all other values fall through to actual number */
/* test data */
data followers;
  length name $ 40 followers 8 displayed 8;
  format displayed linkedin.;
  infile datalines dsd;
  input name followers;
  displayed = followers;
Chris Hemedinger, 776
Kevin Bacon, 100543
Norman Newbie, 3
Colleen Connector, 499

Here's the result. Notice how the "displayed" value acts as sort of an equalizer among the super-connected. Once you've achieved a certain magnitude of connections, you're placed on par with all other "superstars".

Social sites like Instagram and Twitter operate on a different scale. While they still sacrifice precision when displaying a large number of "likes" or "followers", they do at least present a ballpark number that shows the magnitude of the activity. For example, instead of showing the exact follower count for Ronda Rousey (public figure, film star, MMA champ, and daughter of an experienced SAS presenter), Instagram simply truncates the number to the thousands and adds a "K".

To achieve the same in SAS, we can rely on a PICTURE format. This format contains the templates for what a number should look like, plus the multipliers to apply to get to the figure we want to display. Here's an example program, which I repurposed from the BIGMONEY example in SAS documentation.

proc format lib=library;
picture social (round)
 1E03-<1000000='000K' (mult=.001  )
 1E06-<1000000000='000.9M' (mult=.00001)
 1E09-<1000000000000='000.9B' (mult=1E-08)
 1E12-<1000000000000000='000.9T' (mult=1E-11);
/* test data */ 
data likes (keep=actual displayed);
format actual comma20.
 displayed social.;
 do i=1 to 12;
  displayed = actual;

Here's the result:

As you can see, the "displayed" values show you the magnitude of big "likes" in a cool-but-not-so-eagerly-precise way.

Here's my complete test program if you want to try it yourself. Happy networking!

Post a Comment

How to find out what's new in SAS 9.4 maintenance

Today SAS began shipping the second maintenance release of SAS 9.4, colloquially known as "SAS 9.4M2". This is an incremental release, meant to build upon the already-solid SAS 9.4 platform. Even as a SAS insider, I cannot possibly keep track of every new feature that has been added in maintenance. But since I'm always on the hunt for interesting things to blog about, I find myself scanning the "What's New in SAS 9.4" document.

If you're coming from SAS 9.3 or earlier, then the entire "What's New" document is of interest to you. But if you're already using SAS 9.4 or even SAS 9.4 Maintenance 1, then you might be looking for the "9.4M2 Cliffs Notes". I'm afraid that the What's New document isn't structured that way -- it's organized by product and feature sets, and not chronologically by maintenance releases.

Fortunately, the document does call out the new features that have been added in each maintenance within the context of each product, and I've found a technique that helps me to find the highlights for M2. Here's what I do.

1. Visit the What's New section of From here, I click on What's New in SAS 9.4 (HTML version).

2. From the Contents section, select the product of interest. For example, I'm very interested in Base SAS. Each release brings new statements, functions, ODS enhancements, and new statistical graphics -- all under the huge umbrella of "Base".

3. Use the browser "Find" feature to locate the word "second". The writers and editors at SAS are very consistent with their terminology standards, and that's a big help for me here. I know that in this What's New document, I can find the newest features by locating what's called out for "the second maintenance release for SAS 9.4". Simply searching for "second" helps me to find these.

For example, I learned that SAS 9.4 offers enhancements to the new ODS EPUB and ODS HTML5 destinations. I'm interested in those because David Kelley demonstrated them at SAS Global Forum 2014.

The What's New in SAS 9.4 document contains only the highlights, but provides links to the documentation with more details. For example, the mention about the ODS HTML5 enhancements links to the SAS 9.4 Output Delivery System: User's Guide, which contains a more detailed view of what's new in ODS. From there, I can use the same browser trick to find mentions of "second" to see what was added to ODS in SAS 9.4M2.

The analytical products (such as SAS/STAT, SAS/IML, and SAS Enterprise Miner) make this a little bit easier by issuing new version numbers. The SAS 9.4M2 release of SAS/STAT is versioned as 13.2, and those new features are discussed in the SAS/STAT 13.2 section of the SAS 9.4 What's New document, with more details in the SAS/STAT documentation.

Post a Comment

More SAS tips coming to your inbox

When it comes to e-mail-based newsletters, I'm of two minds. On one hand, I feel like I receive enough (or maybe too much) e-mail and I'm reluctant to clog up my inbox with more stuff -- especially if it's information that's located elsewhere (such as on that big backup drive that we call "the Internet").

tipsextra_conceptBut on the other hand, I don't usually have time to go searching the Web for stuff that might someday be useful to me, or even to pay regular visits to the sources that I know will have good information (such as blogs and discussion forums). If someone collected some of the best tidbits from these sources and delivered them to me -- even in e-mail -- I think that would be useful. Do you agree?

We know from experience that SAS users love to see tips about how to do more with SAS. So we got to thinking: What if we curated an e-mail newsletter that was ALL SAS tips? Where all of the content was coming from the "hey, we bet you didn't know this" category?

We're making that happen, beginning this month. It's called SAS Tech Report: Tips Extra. It contains tips -- some from SAS staff and some from the larger SAS community -- to save you time and expand your mind. If you subscribe to the SAS Tech Report newsletter already, then do nothing -- you'll receive it automatically. If you don't yet subscribe, then visit the e-Newsletters section of and click the link to sign up. It's free, of course -- all you need is a SAS profile (which you probably already have if you participate in or interact with SAS Technical Support).

I hope that you like the content, and that you appreciate it showing up in your inbox. And if you have feedback or ideas for topics to feature, let me know here in the comments.

Post a Comment

Performance tips for viewing data in SAS Enterprise Guide

SAS users love to look at data. And the data grid in SAS Enterprise Guide is a convenient way to view the contents of a data set. While small data sets can be rendered lickity-split for quick viewing, sometimes people get justifiably anxious when opening very large data. Perhaps they've been burned with a bad experience, because opening a huge transaction table can hang up a system. When dealing with very large data, it's wise to be cautious.

In general, the SAS Enterprise Guide data grid can open and show data quickly, even if the data sets are very large. This is because the data grid does not read in all of the data records at once, but reads in only the records that are needed to fill the rows of the grid. The data grid uses a "virtual mode", which allows you to scroll quickly around your data set, no matter the size.

In addition to the scroll bar, which shows you approximately where you are in the data set, there are a few keyboard shortcuts that can help you to navigate:

  • Ctrl+End - takes you to the last record, last column in the data set.
  • Ctrl+Home - takes you to the first record, first column in the data set.
  • Ctrl+G - brings up a "Go to" window, in which you specify exactly which record number to bring to the top of the view.

Note: The concept of "record number" is a specific to SAS data sets, since traditional SAS files store records sequentially. When you view a database table (accessed through a SAS/ACCESS library), the idea of "record number" might not make as much sense, since the database rules determine the sequence. However, you can still use the keyboard shortcuts to navigate your view.

Even for all of its built-in smarts, the data grid can still perform slowly in certain situations. If you're experiencing slow performance, here are some tips to help.

Understand your network latency. SAS Enterprise Guide retrieves data records from your SAS session, which is typically on another machine in your network. If your network connection is slow or the remote machine is very remote (such as in another country), fetching those records will take more time.

Avoid opening data unnecessarily. When you add data to your project, whether using File->Open Data or by running a task that produces output data, SAS Enterprise Guide will automatically open the data for viewing. Simply opening the data can cause a noticable delay when there is network latency or a slow database connection. If you're already familiar with the data and don't need to see the records every time, you can change this default behavior. From Tools->Options, select Data General and uncheck the option for "Automatically open data when added to the project."

Avoid scrolling through very wide data. As I stated earlier, large data isn't generally a problem in SAS Enterprise Guide. However, when the data are very wide, the grid can be slow. Why? Because even though the data grid reads only enough data records to fill the grid, it reads only entire records -- even if some of the variables in the record aren't visible on the screen at the time.

I define "wide" data as hundreds or thousands of variables, or maybe fewer variables that each have very large lengths. Consider these two LENGTH statements, each of which allocate 2000 variables but yield dramatically different record lengths:

/* 2000 numeric vars, 8 bytes each = 16,000 bytes per record */
/* a data set with just these fields can render quickly in SAS EG */
length x1-x2000 8;
/* 2000 character vars, 1000 bytes each = 2,000,000 bytes per record! */
/* a data set with these fields can really slow things down */
length c1-c2000 $ 1000;

Use the Data Explorer with care. The Data Explorer (added in SAS Enterprise Guide 5.1) is a popular new feature (see Marje Fecht's enthusiastic review). The Data Explorer can provide at-a-glance "Quick Stats" of variable distribution and frequency. However, the behind-the-scenes analyses that support those stats can be expensive for very large data. SAS Enterprise Guide sets some automatic constraints in order to limit the impact, but you should set your expectations accordingly.

Understand the special treatment for database tables. When you access data in a library using a SAS/ACCESS engine (for example, connecting to Oracle or Teradata), SAS Enterprise Guide automatically limits the data grid view to 10,000 records. You can configure that number in the application options, but the reason for the constraint is to limit the impact on the database connection and to keep your DBA happy (well, as happy as DBAs tend to be, anyway). 10,000 records should be enough for you to "eyeball" the shape of the data. If you need to see rows that meet specific criteria, consider using the Query Builder.

To paraphrase King Burger's Bon Qui Qui (from the famous MAD TV sketch): You can view your data your way, but don't get crazy. I hope that with a little knowledge about what happens behind the scenes, you can click through your data with more confidence.

Post a Comment

New course: Developing Custom Tasks for SAS Enterprise Guide

This probably won't surprise any of my regular readers: "SAS custom tasks" is one of my favorite topics to talk about.

Since 2007, I've written blogs about how you can use custom tasks to extend SAS Enterprise Guide and the SAS Add-In for Microsoft Office. I've shared lots of examples, many that you can use simply "as-is" without any special technical knowledge. I presented a SAS Talks webinar on the topic. And I even wrote the book: Custom Tasks for SAS Enterprise Guide Using Microsoft .NET.

Now I'm ready to take this show on the road, and teach this topic in a classroom setting. I've built a new SAS course: Developing Custom Tasks for SAS Enterprise Guide.

I taught the course for the first time this past April. All new SAS courses go through a "test teach", in which internal SAS folks participate as students. The purpose of a test teach is to ensure that the course materials, software setup, and instructor are all up to the high standards for SAS Training offerings. In my test teach, I had students from Technical Support, R&D, Product Management, Education, and Professional Services. That's a pretty good mix, and I got some good feedback. At the end, we agreed that the course is ready for prime time.

Here are some high-level features that you can expect from the course:

  • It's a two-day course -- and we fill the time!
  • We cover the SAS APIs and toolkits that make custom tasks possible. We also cover best practices for design and build, which make it much easier to have successful custom task projects.
  • Students will get plenty of hands-on experience, some as instructor-led tutorials and some as independent lab exercises.
  • You will build a complete custom task, in class!
  • You don't need to be a SAS expert or Microsoft .NET expert coming in, but I recommend that you have base SAS programming experience and also experience with some object-oriented development tool, such as Microsoft Visual Studio (C# or Visual Basic) or Eclipse (with Java).
  • You should come having at least working knowledge of SAS Enterprise Guide.

If all of this sounds interesting to you, I hope to see you in a SAS training center in the near future! Check the SAS Training & Books web site for schedules and locations. (Note, this link is from the USA course catalog. Other countries might add the course later, based on interest. Hint, hint.)

Post a Comment

Add a stars-style rating column to your SAS output

Now, the Star-Belly Sneetches
Had bellies with stars.
The Plain-Belly Sneetches
Had none upon thars.
- from "The Sneetches", by Dr. Seuss

Recently a user on the SAS-L mailing list had this challenge: "I would like to display stars in a table (created by PROC REPORT) based on variable values. For example, if value=3, then display five stars with 3 in yellow, and the other two in grey."

In her original post, the user had sketched out an approach that used a custom SAS format with uppercase and lowercase Xs representing the desired "star" states. It turns out that this was very close to a working approach. All she needed to do was apply the same technique that I used to add Harvey Balls to a SAS report. Like the Fix-It-Up Chappie, I'm here to offer the solution.

In my Harvey Balls example, I used ODS ESCAPECHAR and Unicode character codes to add extended characters to my SAS format labels. There is a Unicode character for a star (HEX 2605), so the same approach can work. In addition, the ESCAPECHAR sequence can set off style instructions, such as a color definition. This allows you to control the character color "mid-stream". For example, this sequence specifies a red star followed by a gray star:

 ~{style [color=red] ~{unicode '2605'x}} ~{style [color=ygr] ~{unicode '2605'x}}

As it happens, I keep some movie rating data that I downloaded from my Netflix account history (a very rich source of trivial SAS examples). When I apply a custom "star" format to my data, I can produce a report like this:
Here's my SAS format and the PROC PRINT code that references it. The format and report look perfect in HTML, PDF and RTF output:

/* ODS EXCAPECHAR needed to set style/unicode cues */
ods escapechar='~';
/* Captured these in macro variables for readability and */
/* easy maintenance                                      */
%let graystar = ~{style [color=ygr] ~{unicode '2605'x}};
%let redstar =  ~{style [color=red] ~{unicode '2605'x}};
/* using a 4-star system where 1 means "zero stars" */
/* and 5 means the full "4 stars"                   */
proc format lib=work;
value stars
 1 = "&graystar.&graystar.&graystar.&graystar."
 2 = "&redstar.&graystar.&graystar.&graystar." 
 3 = "&redstar.&redstar.&graystar.&graystar." 
 4 = "&redstar.&redstar.&redstar.&graystar." 
 5 = "&redstar.&redstar.&redstar.&redstar." 
title "Movie titles with STARS on THARS";
proc print data=work.movies noobs;
  format rating stars.;

If you want to try it yourself, you can download my program with test data:

>> Complete SAS program with test data

This program should work in SAS Display Manager, SAS Enterprise Guide (select HTML output), and even the SAS University Edition (SAS Studio).

Post a Comment

Cage match: SAS Studio versus SAS Enterprise Guide

I wish I had a nickle for every time I heard this question at SAS Global Forum:

"So, does this SAS Studio thing replace SAS Enterprise Guide?"

SAS Studio is a pretty big deal. It's groundbreaking in several ways:

  • It's a web-based programming interface to SAS. It runs in your browser, which means that end users don't have to install anything (when connecting to a remote SAS session).
  • It's an HTML5-based application, so there are no browser plugins needed. It runs on Windows, Macs, and even the iPad.
  • It's the basis for new offerings from SAS, most notably the SAS University Edition. This offering is free to just about any learner for non-commercial use. The SAS University Edition includes SAS, running in a virtual machine, packaged with SAS Studio as the user interface. Since its launch earlier this week, people have been downloading it like crazy.

You're going to be hearing a lot about SAS Studio. It was even the theme for this month's SAS Tech Report.
If you haven't seen SAS Studio, take a few minutes and watch my SAS Tech Talk interview with Shannon Smith, the SAS R&D testing manager for the product:


So what does this mean for those of us who have invested our skills and processes in SAS Enterprise Guide? If you read this blog regularly, you know that includes me! Does this "new app on the block" replace our beloved SAS Enterprise Guide? The answer is No -- and Yes.

No, SAS Studio isn't a direct SAS Enterprise Guide replacement. SAS Enterprise Guide continues to get new features, mostly targeting productivity enhancements and integration with other SAS offerings, such as SAS Visual Analytics. Many thousands of users around the world use SAS Enterprise Guide to manage process flows, reporting and analytics, database access, and custom processes. SAS Studio doesn't have all of that infrastructure (at least, not yet), and cannot step in to replace all of that.

But also, Yes: SAS Studio can replace some uses of SAS Enterprise Guide. If you use SAS Enterprise Guide simply as way to manage SAS programs in your SAS environment, then you can certainly use SAS Studio instead (or as well) to develop and maintain those programs. SAS Studio also includes some tasks for non-programmers, similar to those found in SAS Enterprise Guide -- but for now the library isn't as rich as what you'll find in SAS Enterprise Guide. And with the SAS University Edition, SAS Studio will represent the first SAS experience for the next generation of SAS programmers.

Sometimes SAS users ask me (usually in a hushed tone): Why does SAS create these different applications that seem to compete with each other? Is there some sort of contest in SAS R&D to see which teams can outdo the others? My answer: while these apps might have a certain amount of overlap, they really do serve different purposes and different audiences. Our goal is to enable SAS users -- regardless of discipline, industry, or expertise -- with the tools that are most fit for their particular purpose. One size does not fit all (though some diehard PC SAS fans might disagree with me).

Plus, here's another secret: the same developers have built all of these applications. The SAS Studio development team includes people who worked on SAS Display Manager (you know, "PC SAS") and SAS Enterprise Guide. This is a direct benefit of SAS being such a great workplace: nobody leaves. That means that the lessons learned from customers and developers are carried over and applied in each successive "app generation". If developers are competing, then they are mostly competing with the proven work they've done in the past. But since the teams always have new technology and techniques at their disposal, it's the end users who win.

Post a Comment

The ingredients of a Data Scientist

It was just a couple of years ago that folks were skeptical about the term "data scientist". It seemed like a simple re-branding of an established job role that carried titles such as "business analyst", "data manager", or "reporting specialist".

But today, it seems that the definition of the "Data Scientist" job role has gelled into something new. At SAS Global Forum 2014, I heard multiple experts describe data science qualifications in a similar way, including these main skills:

  • Ability to manage data. Know how to access it, whether it's in Excel, relational databases, or Hadoop -- or on the Web. Data acquisition and preparation still form the critical foundation for any data analysis.
  • Knowledge of applied statistics. Perhaps not PhD-level stuff, but more than the basics of counts, sums, and averages. You need to know something about predictive analytics, forecasting, and the process of building and maintaining analytical models.
  • Computer science, or at least some programming skills. Point-and-click tools can help keep you productive, but it's often necessary to drop into code to achieve the flexibility you need to acquire some data or apply an analysis that's not provided "out of the box".
  • And finally -- and this makes a Data Scientist the most relevant -- the ability to understand and communicate the needs of the business. You might be a data wiz and have metrics out the wazoo, but an effective data scientist must know which fields and metrics matter most to the organization he or she serves. And you must be able to ask the right questions of the stakeholders, and then communicate results that will lead to informed action.

I don't claim to be a data scientist -- I'm not strong enough in the statistical pillar -- but I do have my moments. For example, I consider my recent analysis of blog spam to be data-science-like. Even so, I'm not brave enough to change my business cards just yet.

At SAS Global Forum I talked to Wayne Thompson, Chief Data Scientist at SAS. (Yes, even SAS is capitalizing on the buzz by having a data science technologies team.) Here he is introducing SAS In-Memory Statistics for Hadoop, a programming interface that's meant to empower data scientists:

Wayne and I also talked a couple of other times: once about SAS Visual Statistics ("it's the shizzle", says the bald white guy -- not me), and once about data science in general.

Data science isn't all just "Wayne's world" -- there were plenty of other data science practitioners at the conference. For example, check out Lisa Arney's interview with Chuck Kincaid of Experis, talking about how to be a data scientist using SAS. (See his full paper here.) And SAS' Mary Osborne, who presented on Star Wars and the Art of Data Science. (Her paper reveals the unspoken fifth pillar of a data scientist: it's good to be part nerd.)

What do you think about the "new" field of data science? Have you changed your business card to include the "data scientist" title?

Post a Comment