Econometric modeling: your questions answered

Several weeks ago, I led a SAS Talks webinar on SAS/ETS emphasizing the many recent changes to the software. SAS/ETS, for those unfamiliar with the product, is SAS’s suite of econometrics, time series and forecasting tools and algorithms. While we covered a substantial amount of material in the talk, there is even more that I didn’t have time to share.

The first place to turn for information on the many recent enhancements to the product is the product documentation:

Those resources are great, but may I suggest getting your hands on the software? As part of the new SAS Analytics U initiative, we have made our cloud version of SAS freely available to those wishing to use the tool for learning or teaching.

  • If you would like to access this tool, you can sign up for a free account. The next step is to actually submit code for yourself.
  • For those looking to get started with one of SAS’s newest procedures, try these PROC HPCDM code samples.
  • The SAS/ETS 13.2 User's Guide Sample library has fully self-contained examples of how to run all SAS/ETS procedures. Just copy and paste the information on the page into the SAS Studio browser and you are running SAS code in the cloud. Pretty cool!

Several excellent questions were asked during my SAS Talks webinar. Here are some of them, with my responses.

Q: If one listens to the hype, one might conclude that with "Big Data" tools, time series and explanatory econometrics are no longer needed. Please comment.

A: This is a great question and one that probably deserves a full post. If you would like to hear my take on the importance of econometric thinking and the dangers of simply worrying about prediction, feel free to watch Econometrics--The Question of "Why?".

Q: Are multiple frequency time series equivalent to multiple (multivariate) time series?

A: A time series with multiple frequencies is different from a multivariate time series arising from the same system; however, the way these series are stored might make them look similar. In both cases, storing the values in one data set requires additional columns. For a multivariate time series, you would not expect the resulting data set to contain missing values. For multiple-frequency data you would, because the lower-frequency series has no observation at many of the time stamps.
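To make the storage difference concrete, here is a minimal sketch with made-up numbers. The data set name and the variables sales and gdp are hypothetical and were not part of the webinar. A monthly series and a quarterly series share one data set indexed by month, so the quarterly column is missing in every month that has no quarterly observation.

```sas
/* Hypothetical mixed-frequency data: sales is monthly, gdp is quarterly. */
data mixed_freq;
   format date monyy7.;
   do m = 0 to 5;
      date  = intnx('month', '01jan2014'd, m);   /* first day of each month */
      sales = 100 + 5*m;                         /* observed every month */
      /* gdp is recorded only in the last month of each quarter */
      if mod(month(date), 3) = 0 then gdp = 1000 + 10*m;
      else gdp = .;
      output;
   end;
   drop m;
run;

proc print data=mixed_freq noobs;
run;
```

A multivariate data set with two monthly series would have the same two-column layout, but both columns would be populated in every row.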

Q: How do we know what version of SAS/ETS we have?

A: An easy way to do this is to look at the log file immediately after your SAS session begins. The versions are listed. An alternative way is to run this code.

proc product_status;
run;

It lists the SAS products installed on your system along with their version numbers.
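Two related checks may also be useful. This is only a sketch using standard Base SAS features, and what it prints depends on your installation: PROC SETINIT writes the products covered by your license and their expiration dates to the log, and the automatic macro variable SYSVLONG holds the release of the running SAS session.

```sas
/* Write licensed products and their expiration dates to the log. */
proc setinit;
run;

/* Write the release of the current SAS session to the log. */
%put NOTE: This session is running SAS &sysvlong;
```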

Q: Can you share the code to estimate a count data regression model with household specific fixed effects from the doctor visit example?

A: Sure. Notice that the only difference in syntax is the GROUPID= option on the PROC statement.

proc countreg data=a groupid=panelid;   /* panelid identifies the household */
   model visits = x1 x2 x3 / dist=poisson;
run;

Thanks so much and I look forward to sharing highlights from future releases with you.

Ken

SAS scalability: 5 concepts you should know

Scalability is the key objective of high-performance software solutions. “Scaling out” means adding more server machines to a solution so that multiple processes can run concurrently in dedicated environments. This blog post briefly touches on several scalability concepts that affect SAS.

Functional roles

At SAS, we have a number of different approaches to scaling our software across multiple machines. As we often see with our SAS Enterprise Business Intelligence solution components, we’ll split up the various functional roles of SAS software to run on specific hosts. In one of the most common examples, we’ll set aside one machine for the metadata services, another for the analytic computing workload, and a third for web services.

While this is more complicated than deploying everything to a single machine, it allows for a lot of flexibility in providing responsive resources which are optimized for each role. Now, we’re not limited to just three machines, of course.

Read more:
SAS® 9.4 Intelligence Platform: Overview

Clusters

For each of these functional roles – Meta, Compute, and Web – we can scale them out independently of the others. Depending on the technology involved, different techniques must be employed. The Meta and Web functional roles, in particular, are well-equipped to function as clusters.

Generally speaking, a software cluster is composed of services that present as peers to the outside world. Clusters offer scalability and improved availability: any node of the cluster can perform the requested work, and the cluster can continue to offer service when one or more nodes fail (depending on configuration), among other features.

Grids

The Compute functional role has some built-in ability to act as a cluster if the necessary SAS software is licensed and properly configured – which is pretty great already – but this ability can be extended even further to act as a grid. A grid is a distributed collection of machines that process many concurrent jobs by coordinating the efficient utilization of resources, which may vary from host to host.

With proper implementation and administration, grids are very tolerant of diverse workloads and a mix of resources. For example, it’s possible to inform your grid that certain machines have certain resources available and others do not. Then, when you submit a job to the grid, you can declare parameters on the job that dictate the use of those resources. The grid will then ensure that only machines with those resources are used for the job. This simple illustration can be implemented in different ways depending on the kind of resources, and with a high degree of flexibility and control.

Another common component of clusters and grids is a clustered file system. A clustered file system is visible to and accessed by each machine in the grid (or cluster), typically at the exact same physical path. This is primarily used to ensure that all nodes are able to work with the same set of physical files. Those files might range from shared work product to software configuration and backups, even to shared executable binaries. The exact use of the clustered file system can of course vary from site to site.

Massively Parallel Processing

Extending grid computing even further is the concept of massively parallel processing (or MPP). As we see with Hadoop technology and the SAS In-Memory solutions, a number of benefits can be realized through the use of carefully planned MPP clusters.

One common assumption behind MPP (especially in the implementation of the SAS In-Memory solutions) has historically been that all participating machines are as identical as possible. They have the same physical attributes (RAM, CPU, disk, network) as well as the same software components.

The premise of working in an MPP environment is that any given job (that is, something like a statistical computation or data to store for later) is simply broken into equal-size chunks that are evenly distributed to all nodes. Each node works on its chunk individually, sharing none of its own CPU, RAM, etc. with the others. Since the ideal is for all nodes to be identical and for each to get the same amount of work without competing for any resources, complex workload management capabilities (such as those described for grids above) are not as crucial. This assumption keeps the required administrative overhead for workload management to a minimum.
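The even split described above can be sketched in a few lines. This is only an illustration on a single machine, not how the SAS In-Memory solutions distribute data internally. It assigns each row of the SASHELP.CARS sample table to one of four hypothetical nodes in round-robin order and then counts the rows per node. The macro variable nnodes and the output data set name are made up for the example.

```sas
%let nnodes = 4;   /* hypothetical number of identical worker nodes */

/* Tag each row with the node that would receive it (round-robin). */
data chunks;
   set sashelp.cars;
   node = mod(_n_ - 1, &nnodes) + 1;   /* 1, 2, ..., nnodes, 1, 2, ... */
run;

/* Confirm that the chunks are (nearly) the same size. */
proc freq data=chunks;
   tables node / nocum;
run;
```

With round-robin assignment, chunk sizes can differ by at most one row, which is the "equal-size chunks" ideal that MPP relies on.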

Hadoop and YARN

Looking forward, one of the challenges of assuming dedicated, identical nodes and equal-size chunks of work in MPP has been that it’s actually quite difficult to keep everything equal on all nodes all of the time. For one thing, this often assumes that all of the hardware is exclusive for MPP use all of the time – which might not be desirable for systems which sit idle overnight, on weekends, etc. Further, while breaking workload up into equal-size bits is possible, it’s sometimes tough to keep the workload perfectly equal and distributed when there exists competition for finite resources.

For these and many other reasons, Hadoop 2.0 introduces an improvement to the workload management of a Hadoop cluster called YARN (Yet Another Resource Negotiator).

The promise of YARN is to better manage resources in a way accessible to Hadoop as well as various other consumers (like SAS). This will help mature the MPP platform, evolving it from the old Map-Reduce framework to a more flexible platform to handle a wider variety of different workload and resource management challenges.

And of course, SAS solutions are already integrating with YARN to take advantage of the capabilities it offers.


Your SAS Global Forum 2015 draft kit

It’s my favorite time of the year, draft time! NFL and Fantasy Football fans, I don’t mean THAT draft, but similar. It’s what I will call the #SASGF15 draft!

It’s the time of year when the best and the brightest, the most knowledgeable, passionate, and inspirational SAS users submit ideas on various topics, hoping to be chosen to present in the Pro Bowl of SAS Users, SAS Global Forum. This time around, this premier event is being held in Dallas, Texas from April 26-29, 2015.
Read More »

The power of SAS-generated InfoWindows in Google maps

In my prior posts on displaying Google maps in SAS, I used Google map InfoWindows to display text information when a user clicks on a marker or area of the map. Thanks to reader Tom Bellmer’s questions, today I am going to explore some additional possibilities that the Google map InfoWindow can provide.

According to the Google Maps documentation, the content of the Info Window may contain a string of text, a snippet of HTML, or a DOM element. Let’s use that “snippet of HTML” creatively.

I am going to expand the earlier example, Live Google maps in SAS -- multiple markers, by enhancing the InfoWindow so that, in addition to text, it includes hyperlinks, images and even embedded YouTube video. This code differs from the prior implementations of the Google map with markers and info windows in just two ways: 1) the data preparation step shown below and 2) the line of code that defines the contents of the InfoWindow. This example also uses a simpler method for handling quotes. Read More »

Transitioning to 64-bit SAS on Windows

The major benefit of 64-bit applications is that they no longer have the memory limitation imposed by their 32-bit predecessors. This is why many SAS customers are making the transition from 32-bit SAS to 64-bit SAS. The move to 64-bit SAS can be daunting at first sight. There are many questions which quickly come to mind:

  • What are the benefits of 64-bit?
  • Will my existing 32-bit data work with 64-bit SAS?
  • What problems might I encounter?

Here are some resources which might answer your questions and ease your transition: Read More »

Drawing overlays on SAS-generated Google maps

In this post, I am going to expand on my prior posts, Spice up SAS output with live Google maps and Live Google maps in SAS -- multiple markers to explore some new functionality.

Using similar techniques, I'll demonstrate how to draw a closed geometric shape, in particular a polygon, on a Google map. Well, let’s make it more than one polygon: two, for simplicity. Overlaying polygons on a Google map is a great way to visualize geographical and administrative regions, such as countries, states, counties, school districts, service areas, etc. Besides, practically any shape can be approximated by a polygon with a large enough number of vertices. Here's an example of what we are going to get: Read More »

SAS Global Forum 2015—the place to learn more about analytics

Ever wondered where to find analytics experts to get your countless questions answered or where to find qualified talent to grow your industry? The simple answer to these questions is just one event – SAS Global Forum 2015 in Dallas, TX April 26-29.

Every year, the SAS Global Users Group plans and sponsors SAS Global Forum where SAS’s cutting edge technology gets showcased along with dozens of workshops, presentations, demos and networking opportunities. This is also the place you can learn how industry giants are using hot technologies such as visual analytics, Hadoop, in-memory computing and more.

Now, supersized datasets with billions of rows are the norm rather than the exception. Equally important, the variety of data being analyzed – from numeric to text to audio to video – has grown tremendously. Where can you see cool applications using such diverse data for making better decisions? From beginner to expert, from academic to commercial – SAS Global Forum 2015 has something to offer every skill level and application. Read More »

SAS Visual Analytics 6.4: Importing a Twitter Stream

Great news.  If you’ve been struggling to import a Twitter stream as a data source, SAS Visual Analytics 6.4 has greatly simplified that task as part of this release’s expanded data import functionality. The first time you import tweets, you are directed to the Twitter website to log on to your account and authorize SAS Visual Analytics. After the initial logon, SAS Visual Analytics uses authorization tokens for accessing Twitter instead of requiring you to log on each time.

The product documentation provides high-level instructions for how to import tweets from Twitter, but I found that additional detail makes the process much simpler to follow. In this post, I’ll walk you through the process from beginning to end, with screenshots and helpful hints along the way. Read More »

8 most attended SAS administrator papers in 2014

In a comment on last week’s blog asking SAS administrators: please submit your paper idea for SAS Global Forum 2015, Andrew Howell of ANJ Solutions asked if I had any statistics on which were the most popular SAS administrator papers for last year’s conference. He suggested the following nominations although he was “only able to attend about half of these presentations—there was just so much to see!” 

I don’t have any readily available statistics on downloads for these papers, and, like Andrew, I wasn’t able to attend all of the excellent SAS Administrators sessions at SAS Global Forum 2014. But based on my own observations and feedback from others who were able to attend, here’s my very subjective ranking of Andrew’s list!

  1. Effective Usage of SAS Enterprise Guide in a SAS 9.4 Grid Manager Environment, Edoardo Riva, SAS
  2. SAS Grid – What They Didn’t Tell You, Manuel Nitschinger, sIT Solutions and Phillip Manschek, SAS
  3. Best Practices for Implementing High Availability for SAS 9.4, Cheryl Doninger, Zhiyong Li and Brian Wolfe, SAS
  4. SAS Installations: So You Want To Install SAS?, Rafi Sheikh, Analytics International
  5. Top 10 Resources Every SAS Administrator Should Know About, Margaret Crevar and Tony Brown, SAS
  6. SAS Grid Manager I/O: Optimizing SAS Application Data Availability for the Grid, Gregg Rohaly and Harry Seifert, IBM
  7. Test for Success: Automated Testing of SAS Metadata Security Implementations, Paul Homes, Metacoda
  8. Integrating Your Corporate Scheduler with Platform Suite for SAS® or SAS® Grid Manager, Paul Northrop, SAS Australia

Last year’s conference was a great opportunity for “seeing SAS Administrators in their natural habitat”.  Many of these sessions were standing room only.  Please submit your paper idea and let’s plan for another great year for administrators!

Macro quoting made easy

Are there times when you need to pass special characters to a macro variable but cannot find the right technique to accomplish the task?  In this article I’ll discuss the different macro quoting functions and give a simple technique to help you determine which macro quoting function to use.

Why do we need macro quoting?  The SAS macro language is a character-based language. With macro, you can specify special characters as text.  Some of these special characters (for example, a semicolon or percent sign) are part of the SAS language instruction set, so we need a way for the macro processor to interpret a particular special character when it’s being used as text in the macro language.  Macro quoting functions tell the macro processor to treat these special characters as text rather than as part of the macro language.  Without macro quoting functions, you would have no way to mask the real meaning of these special characters or mnemonics.

This post will list some all-purpose functions, tell how to determine when to use each type, and show you how to unmask, or unquote special characters. Read More »
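As a small taste of the idea, here is a minimal sketch of masking and unmasking with three standard macro functions. The macro variable names step and company are made up for illustration, and this is not necessarily the technique the full article recommends.

```sas
/* %STR masks the semicolons so they are stored as text, not treated as statement ends. */
%let step = %str(proc print data=sashelp.class; run;);
%put The stored text is: &step;

/* %NRSTR also masks the ampersand, so T is not looked up as a macro variable. */
%let company = %nrstr(AT&T);
%put Company name: &company;

/* %UNQUOTE removes the masking so the stored text runs as ordinary SAS code. */
%unquote(&step)
```

Without %STR, the first %LET would end at the first semicolon and store only "proc print data=sashelp.class". Without %NRSTR, SAS would try to resolve a macro variable named T and warn if it does not exist.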
