SAS administration and architecture—highlights from SAS Global Forum 2015

I was privileged with the opportunity to present a couple of papers at SAS Global Forum 2015 in Dallas, Texas this year.  While there, I was also excited to attend presentations with new and inventive approaches for working with the administration and architecture of SAS solutions. This is a collection of just a few of my favorites.

SAS 9.4 MIDDLE TIER

The new middle tier technologies introduced with the release of SAS 9.4 have significantly improved the performance, scalability, and availability of our web services.

GENERAL PLATFORM AND STORAGE I/O

Every year we get updates to these venerable whitepapers that are very helpful in explaining how SAS interacts with platform technologies such as CPU architectures, operating systems, hard drives, flash drives, high-performance storage solution technologies, and much more.

SAS 9.4 GRID MANAGER

The SAS Grid Manager solution is a feature-rich and diverse technology set to help customers manage large workloads, improve infrastructure availability, increase resource utilization, and provide flexibility to the IT infrastructure – and oh yeah, deliver higher levels of scalable performance for SAS solutions, too.

  • SAS1968-2015: The Truth Behind the Most Common Myths for SAS Grid Manager (Margaret Crevar, Glenn Horton, and Doug Haigh, SAS Institute Inc.) SAS Grid Manager offers so many capabilities, it’s easy to get lost in it all. Important concepts are often poorly understood or sometimes forgotten. This paper pulls back the curtain on many of those topics, explaining that there’s no “magic” to it, just good technology we all need to comprehend.
  • SAS1897-2015: Planning for the Worst—SAS® Grid Manager and Disaster Recovery (Glenn Horton and Doug Haigh, SAS Institute Inc.)  If you’re new to planning for disaster recovery, there might be a lot of questions you don’t know to ask. This paper defines many of the major concepts and addresses the considerations you need to make in light of a SAS Grid Manager solution.

CLOUD AND VIRTUALIZATION

SAS has big plans for pushing our analytic offerings to the cloud. Virtualization plays a major role in that effort.

  • SAS1947-2015: SAS® vApps 101 (Danny Hamrick, Gary Kohan, Connie Robison, Rob Stephens, and Peter Villiers, SAS Institute Inc.)  SAS vApps (short for virtual applications) provide a way for us to deliver complete solution deployments which can range in size from a single software service all the way to multiple tier virtual machine implementations needing only minimal configuration to get up and running in a customer environment. This informative paper covers design objectives and decisions, technical and business benefits, as well as the lifecycle maintenance of SAS vApps.
  • Paper 2882-2015: The Advantages and Pitfalls of Implementing SAS® in an Amazon Web Services Cloud Instance (Jeff Lehmann, Slalom Consulting, LLC)  A brief paper that nicely outlines the many benefits of deploying SAS software to cloud-hosting services like Amazon Web Services. It also touches on some of the disadvantages, which are important to mitigate in any deployment.

▶ For more information about cloud, check out Erwan Granger’s excellent series of blog posts.

Can tweets reveal the mood of the #Ebola outbreak?

Less than a year ago, the country’s attention was on Dallas after the first Ebola patient in the U.S. died there. That is where this project begins, and Dallas is also where it was presented at SAS Global Forum.

Sharat Dwibhasi and his classmates Dheerj Jami and Shivkanth Lanka from Oklahoma State University analyzed the sentiment of the Ebola outbreak using tweets.

Their research involved extracting live streaming data from Twitter over a four-month period and studying the patterns along the timeline of the outbreak. They used SAS Enterprise Miner and SAS Sentiment Analysis Studio to evaluate the following:

  • How seriously people are taking the outbreak
  • The geographical areas where people are most concerned
  • Percentage of tweets which emphasize awareness

Collecting the Data

The first step was extracting the data from Twitter by accessing the live stream API of Twitter using the tweepy package in Python. They started collecting tweets after the first patient died in the U.S. “We collected tweets from Oct. 8, 2014 to Feb. 15, 2015,” said Dwibhasi. “That was a big part of the project.” The students collected around 270,000 English language tweets and divided them into three datasets which helped them categorize and compare the change in moods.

Next, SAS was used to clean and analyze the data. For the analysis, the students used NLP techniques such as lemmatization, concept linking and synonym handling.

The Findings

What they discovered is that initially people were worried about catching the disease. Over time, the sentiment changed to caring for those in areas where Ebola was most prevalent. And finally, people started feeling an appreciation for Ebola workers, as well as finding a cure.

They also discovered that positive tweets about Ebola were retweeted more than the negative tweets.

For more details, here’s a link to the paper, Analyzing and visualizing the sentiment of the Ebola outbreak via tweets.

SAS Global Forum 2015 – a glimpse into upcoming SAS releases

Hadoop, in-memory analytics, the Internet of Things (IoT), machine learning and data visualization are the topics dominating the analytics airwaves. SAS is innovating in all these areas, rapidly developing new products and functionality to meet the needs of today’s analytic environment.

During this year’s Technology Connection session, SAS Global Forum 2015 attendees got a glimpse into SAS R&D’s 18-month plan, which is driven in large part by initiatives like Hadoop and IoT that are changing the way you can manage and share data.  Here are just a few of the new and enhanced products slated for upcoming SAS releases.

For those wanting to leverage their investments in both SAS and Hadoop, look especially for:

The next scheduled release of SAS 9.4 will include:

You can see these products in action by selecting General Sessions from the SAS Global Forum Video Home.

During Chris Hemedinger’s interviews on SAS Tech Talk, SAS product developers have a chance to share insights into SAS Visual Statistics, SAS Cybersecurity, SAS Studio and other technical directions.

Converting variable types—use PUT() or INPUT()?

How many times have you needed to convert between variable types, such as character to numeric or numeric to character?  For example, what if you have a character variable with numeric values but you need to perform some calculations?  Or, what if you have a numeric variable but you need to concatenate it to a character variable?  If you are like most SAS programmers, you have needed PUT() or INPUT() at least once to complete these tasks.

The answer to the question "Do I use PUT() or INPUT()?" depends on what your target variable type is and what your source variable type and data are. Below are three questions to consider:

  1. Is your target variable character or numeric?
  2. Is your source variable character or numeric?
  3. If your source variable is character, is your data value character or numeric?

Based on your answers to the three questions above, you can identify whether PUT() or INPUT() comes first. Keep these four rules in mind when writing your SAS statements:

  • PUT() always creates character variables
  • INPUT() can create character or numeric variables based on the informat
  • The source format must match the source variable type in PUT()
  • The source variable for INPUT() must always be a character variable

The following examples show how to use these rules to convert from character to numeric or from numeric to character:

A  PUT() converts a character variable to another character variable.

B  PUT() converts a numeric variable to a character variable with a numeric value.

C  PUT() converts a character variable, using a user-defined format, to another character variable.

D  INPUT() converts a character variable with a numeric value, using a numeric informat, to a numeric variable.

E  INPUT() converts a character variable with a numeric value, using a character informat, to a character variable.

F  INPUT() converts a character variable with a comma-formatted numeric value, using the COMMA informat, to a numeric variable.

 

  | Function Call | Raw Type | Raw Value | Returned Type | Returned Value
A | PUT(name, $10.); | char, char format | ‘Richard’ | char always | ‘Richard   ’
B | PUT(age, 4.); | num, num format | 30 | char always | ‘  30’
C | PUT(name, $nickname.); | char, char format | ‘Richard’ | char always | ‘Rick’
D | INPUT(agechar, 4.); | char always | ‘30’ | num, num informat | 30
E | INPUT(agechar, $4.); | char always | ‘30’ | char, char informat | ‘30  ’
F | INPUT(cost,comma7.); | char always | ‘100,541’ | num, num informat | 100541
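
The calls in the table can be tried together in one DATA step. The following is a minimal sketch rather than the author’s own program: the data set name, the names of the new variables, and the contents of the $nickname. format are assumptions made for illustration.

```sas
/* Hypothetical user-defined format used by example C */
proc format;
  value $nickname
    'Richard' = 'Rick';
run;

data conversions;
  /* Source values taken from the table */
  name    = 'Richard';   /* character */
  age     = 30;          /* numeric   */
  agechar = '30';        /* character holding a numeric value */
  cost    = '100,541';   /* character holding a comma-formatted value */

  name_c  = put(name, $10.);         /* A: character to character           */
  age_c   = put(age, 4.);            /* B: numeric to character             */
  nick    = put(name, $nickname.);   /* C: character to character via a     */
                                     /*    user-defined format              */
  age_n   = input(agechar, 4.);      /* D: character to numeric             */
  age_c2  = input(agechar, $4.);     /* E: character to character           */
  cost_n  = input(cost, comma7.);    /* F: character with commas to numeric */
run;

/* List the resulting variable types to confirm each conversion */
proc contents data=conversions;
run;
```

In the PROC CONTENTS listing, the variables created with PUT() should appear as character, while the type of those created with INPUT() should follow the informat that was used.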

 

 

New changes for SAS users group leaders

SAS Global Forum provides a perfect opportunity for SAS users group leaders from across the country to meet in-person to share best practices and new ideas.

This year’s SAS users group leaders link-up event was led by Melissa Perez – the new users group programs manager at SAS. She talked about her new team and how they are dedicated to coming up with better ways to provide resources and support to users group leaders.

Perez also led a discussion with two special leaders - Elizabeth Axelrod, President, Boston Area SAS Users Group and Joseph “Joe” Guido, Chair, Genesee Valley SAS Users Group (Greater Rochester, NY area). These leaders were chosen to speak because they are doing unique things to get users group members engaged in their respective areas.

Axelrod explained that she’s been fortunate enough to find a great meeting space at no cost. She also includes special SAS training at each of her events so attendees can pick up some new tips and tricks.

Guido takes a social approach to his meetings. “They are not just standard meetings,” said Guido. He has something called “SAS in suds” where members go to a local pub to continue their networking after the users group meetings. They also hold picnics and other social events throughout the year.

“We’re always trying different things,” said Guido. And that was the biggest takeaway from both Axelrod and Guido – be unique.

Perez ended the link-up by announcing some big news – the creation of a private users group leaders’ community group on communities.sas.com. “We want you to be able to connect and get the support from SAS, but we also want you to be able to connect with each other.”

If you’re a 2015 registered SAS users group leader or member of a committee in your area, look out for an email from Perez very soon for your invite into the community. You can also send an email to UGSupport@sas.com to be added to the email list if you are a registered users group leader.

You react so quickly! Do you have ESP?

SAS Event Stream Processing, that is! The latest release of SAS Event Stream Processing will launch May 12, and numerous customers around the globe are already using it. So what’s the big deal?

Why event streams are important to business

SAS Event Stream Processing allows organizations to react to events virtually instantaneously. Consider the following scenarios. Imagine if:

  • An online retailer creates custom offers as customers click around the web
  • An oil company automatically vents a pipeline to a reservoir when sensors detect an increase in pressure
  • A financial regulator reverses predatory trading immediately after it occurs

These are all extremely high-value scenarios, and the reason is the rapid reaction time. If the retailer markets to customers after they have already ended their session, it’s less effective. If the oil pipeline breaks, it is a disaster. If the predatory trading isn’t reversed for months, markets and people’s lives are negatively affected.

The point is that the value of information decreases dramatically over time. While it’s often helpful to analyze historical data, the faster you can use that analysis to react to actual events, the more valuable it can be to the bottom line. As the graphic below shows, the quicker you take action following a business event, the more that action will be worth in money earned or saved.

Developing event stream models

The recommended path for executing SAS Event Stream Processing models is through the XML Factory Server. To support faster model development, SAS recently added a visual development environment called the SAS Event Stream Processing Studio. Using this graphical interface, model designers can drag and drop windows onto the workspace area to create the appropriate data-centric flow and apply processing rules to address any kind of business need. Simple and intuitive.

SAS Event Stream Processing Studio showing code and associated process flow.

Once the project is designed, it can be tested directly within SAS Event Stream Processing Studio. If everything is fine, the XML model that’s automatically generated from the interface can be published to the appropriate server for execution.

SAS Event Stream Processing Studio supports faster analysis and detection of events with:

  • an intuitive environment for developing and testing projects (aka models)
  • a palette of windows and connectors that can be used to design even the most complex event streaming models
  • a definition and testing environment that reduces the need for programming in XML or C++
  • the ability to easily instantiate visually defined models on the XML Factory Server, connecting to live data streams for model validation
  • full visibility into the automatically generated XML code, which can be further customized with edits and additions

For more information

Top 5 reasons to attend PharmaSUG 2015

For more than 25 years, PharmaSUG has been the premier educational experience for SAS users in the pharmaceutical industry. Whether you're new to using SAS or a seasoned veteran, this year's event in Orlando, May 17-20, has something for you!

1. Hear keynote speaker Lilliam Rosario, PhD, of the U.S. Food and Drug Administration (FDA). Dr. Rosario is the director of the Office of Computational Science within the FDA's Center for Drug Evaluation and Research (CDER). She will be speaking on the topic of "Modernizing CDER Drug Review - OCS Technology & Support".

2. Get a discount on registration. Register by May 4 and save $100 on the on-site registration rate.

3. Expand your current skill set. Or get up to speed quickly if you're new to the pharma industry by attending sections that range from Industry Basics to Career Planning.

4.  Get more industry-specific knowledge. Attend any of these great sections:

  • Beyond the Basics
  • Data Standards
  • Data Visualization and Graphics
  • Healthcare Analytics
  • Submission Standards
  • Statistics and Pharmacokinetics

5.  Learn by doing. The conference offers lots of opportunity with free hands-on training sessions. Visual learners especially won't want to miss the popular Posters sessions.

Don't miss this once-a-year opportunity to learn, network and have fun at the annual PharmaSUG conference. See you in Orlando!

“Think and Do” approach to educating big data students

One of the big topics at SAS Global Forum 2015 is the analytics skills gap. Tonya Etchison Balan of the Poole College of Management at NC State University presented a case study approach for teaching analytical skills.

The motto at NC State is “Think and Do.” What that means is the university wants students to not only learn to think critically, but also to gain hands-on-experience with the tools that will enable them to be successful in their careers.

What does an analytics MBA need to know?

Balan believes an MBA student needs to be able to understand the entire analytics process, but they don’t need to be experts in every aspect.

She highlighted these four areas of the analytics process that students need to know:

  • Data (What data do I need to answer this question?)
  • Insight (Are there any obvious trends or issues with the data?)
  • Decision (How do I interpret the results of the analysis?)
  • Action (What changes need to be made to the business process?)

How do you create an analytics case study course?

When designing a course, Balan said one approach is to break the course up into modules based on a particular analytical method.

She said the toughest challenge is finding data. “It makes it easier to write a case study when you have real data,” explained Balan.

Her suggestions for finding “real” data include:

You also need the right software and methods. Balan uses the following:

Methods

  • Linear regression with real (messy) data
  • Classification Methods
    • Logistic Regression
    • Decision Trees
  • Clustering and Segmentation

Software

  • JMP
  • SAS Enterprise Miner
  • SAS Visual Analytics (and Visual Statistics)
  • SAS University Edition
  • Excel

“The idea of a case study is to give the students some business concepts,” said Balan. “It gives them a sense of what the real business problem is.”

The result is a business leader who can “think and do” -- which goes back to the motto at NC State.

Can I run SAS Grid Manager in the AWS cloud?

SAS recently performed testing using the Intel Cloud Edition for Lustre* Software - Global Support (HVM) available on AWS marketplace to determine how well a standard workload mix using SAS Grid Manager performs on AWS.  Our testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. You can find the detailed results in the technical paper, SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre.

In addition to the paper, Amazon will be publishing a post on the AWS Big Data Blog that will take a look at the approach to scaling the underlying AWS infrastructure to run SAS Grid Manager to meet the demands of SAS applications with demanding I/O requirements.  We will add the exact URL to the blog as a comment once it is published.

System design overview – network, instance sizes, topology, performance

For our testing, we set up the following AWS infrastructure to support the compute and I/O needs for these two components of the system:

  • the SAS workload that was submitted using SAS Grid Manager
  • the underlying Lustre file system required to meet the clustered file system requirement of SAS Grid Manager.

SAS Grid Manager and Lustre shared file configuration on AWS cloud

The SAS Grid nodes in the cluster are i2.8xlarge instances.  The 8xlarge instance size provides proportionally the best network performance to shared storage of any instance size, assuming minimal EBS traffic.  The i2 instance also provides high performance local storage, which is covered in more detail in the following section.

The use of an 8xlarge size for the Lustre cluster is less impactful since there is significant traffic to both EBS and the file system clients, although an 8xlarge is still the better choice.  The Lustre file system has a caching strategy, and you will see higher throughput to clients in the case of frequent cache hits, which effectively reduce the network traffic to EBS.

Steps to maximize storage I/O performance

The shared storage for SAS applications needs to be high speed temporary storage.  Typically temporary storage has the most demanding load.  The high I/O instance family, I2, and the recently released dense storage instance, D2, provide high aggregate throughput to ephemeral (local) storage.  For the SAS workload tested, the i2.8xlarge has 6.4 TB of local SSD storage, while the D2 has 48 TB of HDD.

Throughput testing and results

We wanted to achieve a throughput of at least 100 MB/sec/core to temporary storage, and 50-75 MB/sec/core to shared storage.  The i2.8xlarge has 16 cores (32 virtual CPUs; each virtual CPU is a hyperthread, and each core has two hyperthreads).  Testing done with lower-level testing tools (fio and a SAS tool, iotest.sh) showed a throughput of about 3 GB/sec to ephemeral (temporary) storage and about 1.5 GB/sec to shared storage.  The shared storage performance does not take into account file system caching, which Lustre does well.
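
To compare the measured figures with the per-core targets, the aggregate numbers can be divided by the core count. The short DATA step below is only a sketch of that arithmetic. It assumes the measured throughput applies to a single i2.8xlarge instance and that 1 GB equals 1000 MB, neither of which is stated explicitly above; the variable names are made up for illustration.

```sas
data _null_;
  cores         = 16;    /* physical cores on an i2.8xlarge, per the text    */
  temp_gb_sec   = 3;     /* approximate measured throughput, ephemeral       */
  shared_gb_sec = 1.5;   /* approximate measured throughput, shared storage  */
  mb_per_gb     = 1000;  /* assumption: decimal units                        */

  /* Convert aggregate GB/sec to MB/sec per core */
  temp_per_core   = temp_gb_sec   * mb_per_gb / cores;
  shared_per_core = shared_gb_sec * mb_per_gb / cores;

  /* Write both per-core values to the SAS log */
  put temp_per_core= shared_per_core=;
run;
```

Under those assumptions the result is 187.5 MB/sec/core to temporary storage and 93.75 MB/sec/core to shared storage, both above the stated targets.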

This testing demonstrates that with the right design choices you can run demanding compute and I/O applications on AWS. For full details of the testing configuration and results, please see the SAS® Grid Manager 9.4 Testing on AWS using Intel® Lustre technical white paper.

 

The future of analytics

Before kicking off SAS Global Forum in Dallas, SAS held an academic summit to recognize faculty and students who are making a difference in analytics. They also shined a light on 15 student ambassadors whose papers were selected for their innovative approach to solving problems using SAS.

Inside SAS Global Forum host, Anna Brown, interviewed two of the 2015 ambassador winners.


 
Filling the analytics gap

SAS CEO Dr. Jim Goodnight spoke at the summit about the importance of developing students with deep analytical skills to fill the shortage of talent in the industry. “Today you can be confident you made a good decision to study analytics,” said Goodnight. “The field is exploding, and the demand for talent is growing faster than the number of students entering the field.”

One year ago, here at SAS Global Forum, Goodnight unveiled SAS Analytics U. The program offers free software to university students, faculty and researchers. Since its launch, SAS University Edition has been downloaded more than 250,000 times. To make it even easier to learn SAS, you can now access the University Edition on Amazon's AWS Marketplace.

Dr. Jim Goodnight

Announcing the Analytics Symposium

The future analytics leaders will have even more opportunities to showcase their skills at SAS Global Forum 2016.

Ken Koonce, Professor of Experimental Statistics at Louisiana State University, announced the first-ever Analytics Symposium. He explained that it will be a competition where teams of students have an opportunity to solve a real-world analytics problem using public data. “This will be an opportunity for you to get to know what others are doing and sell yourself for a potential job,” said Koonce.

SAS will be providing the teams with a special version of Analytics U to use in their research. The top eight teams will be selected to attend the conference in Las Vegas next year to compete.

More information on the Analytics Symposium is coming in July. In the meantime, get your team together!

2015 Student Ambassadors
