Is your LASR implementation running short on memory? Since LASR tables are stored in memory, it can become scarce. So what can we do to minimize LASR table size and still get LASR’s legendary performance? Here are a few strategies on how to shrink LASR tables:
Compression: When compression was first introduced, many people thought it would be the best and only solution for reducing LASR table sizes. However, it is actually just one of many approaches. While LASR compression can reduce table sizes dramatically (94% in a test I did on the MegaCorp sample table), it can affect performance. Compression is easily implemented, so it should be the first strategy you try.
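As a minimal sketch of what enabling compression might look like, the code below loads a table into LASR in compressed form with the SQUEEZE= data set option, then compresses a table that is already in memory with the COMPRESS statement of PROC IMSTAT. The host, port, tag, and table names are hypothetical placeholders, not values from the test mentioned above.

```sas
/* Hypothetical connection details for a LASR Analytic Server. */
libname lasrlib sasiola host="lasr.example.com" port=10010 tag="hps";

/* Option 1: compress while loading. WORK.MEGACORP is a placeholder source table. */
data lasrlib.megacorp_c(squeeze=yes);
   set work.megacorp;
run;

/* Option 2: compress a table that is already loaded in memory. */
proc imstat;
   table lasrlib.megacorp;
   compress;
run;
quit;
```

Because compressed tables can be slower to process, it is worth timing a few typical reports before and after the change.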
SAS 9.4 M3 introduces a new procedure named PROC SQOOP. This procedure enables users to access the Apache Sqoop utility from a SAS session to transfer data between a database and HDFS. Using PROC SQOOP lets you submit Sqoop commands from within your SAS application to your Hadoop cluster.
PROC SQOOP is licensed with SAS/ACCESS® Interface for Hadoop; it’s not part of the Base SAS® license. PROC SQOOP is supported in UNIX and Windows SAS.
Sqoop commands are passed to the cluster using the Apache Oozie Workflow Scheduler for Hadoop. PROC SQOOP defines an Oozie workflow for your Sqoop task, which is then submitted to an Oozie server using a RESTful API.
PROC SQOOP works similarly to the Apache Sqoop command-line interface (CLI), using the same syntax. The procedure provides feedback as to whether the job completed successfully and where to get more details in your Hadoop cluster.
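To make this concrete, here is a hedged sketch of what a PROC SQOOP import might look like. Every host name, port, path, and credential is a hypothetical placeholder, and the option names should be checked against the PROC SQOOP documentation for your release. The COMMAND= string carries ordinary Sqoop CLI arguments.

```sas
/* All host names, ports, paths, and credentials below are hypothetical. */
proc sqoop
   hadoopuser='sasdemo'
   dbuser='dbuser' dbpwd='dbpassword'
   oozieurl='http://oozie.example.com:11000/oozie'
   namenode='hdfs://namenode.example.com:8020'
   jobtracker='jobtracker.example.com:8032'
   /* HDFS location where the generated Oozie workflow is written */
   wfhdfspath='hdfs://namenode.example.com:8020/user/sasdemo/sqoop_workflow.xml'
   /* Remove the workflow file once the job finishes */
   deletewf
   /* Standard Sqoop arguments: import one table into an HDFS directory */
   command='import --connect jdbc:mysql://db.example.com/sales --table orders --target-dir /user/sasdemo/orders -m 1';
run;
```

The SAS log then reports whether the Oozie job succeeded and where to look on the cluster for details.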
As a longtime fan of the New York Mets and a longtime employee of SAS, I'm particularly excited about the start of the World Series. In 2014, the Mets and SAS formed a partnership to use SAS analytics to help the organization build stronger relationships with its fan base. While the vast majority of the credit for the Mets' success belongs with the players and coaches, I think that makes SAS a small part of the organization’s World Series run as well!
There's no denying that analytics has become an integral part of baseball. Sabermetrics, the empirical analysis of baseball, started the craze. A number of books, and movies like Moneyball, made baseball analytics mainstream.
SAS Grid Manager for Hadoop is a brand new product released with SAS 9.4M3 this summer. It gives you the ability to co-locate your SAS Grid jobs on your Hadoop data nodes to let you further leverage your investment in your Hadoop infrastructure. This is possible because SAS Grid Manager for Hadoop is integrated with the native components, specifically YARN and Oozie, of your Hadoop ecosystem. Let's review the architecture of this new offering.
First of all, the official name, SAS Grid Manager for Hadoop, shows that it is a brand new product, not just an addition to or a different configuration of the “classic” SAS Grid Manager, which I will subsequently refer to as “for Platform” to distinguish the two.
For an end user, grid usage and functionality remain the same, but an architect will notice that many components of the offering have changed. Describing these components will be the focus of the remainder of this post.
Chevell Parker (left) with SAS user Ron Fehd at MWSUG2015
To kick off his presentation at MWSUG2015, SAS’ Chevell Parker flashed a picture of an old-school phone booth and asked the audience where he could find the nearest one. Met with several seconds of silence, he smiled. “Don’t all answer at once!” The point of his question was obvious: as technology advances, you need to change to stay relevant. Parker argues that SAS users face a similar challenge. “Reports are critical for businesses,” said Parker. “In today’s business world, the days of traditional, single-listing reports have gone the way of the phone booth.”
Fortunately, Parker, a Sr. Principal Technical Support Analyst for SAS, says SAS users are in a great position to meet this challenge. “These days you need to deliver and report information in a way that most benefits your customers. SAS’ Output Delivery System has tools that enable you to package, present, and deliver report data in meaningful ways, across the most popular desktop and mobile devices.”
Prior to the 7.2 release of Visual Analytics, the home page application properties were accessed by expanding SAS Application Infrastructure–>SAS Visual Analytics, under the Configuration Manager Plug-in of SAS Management Console. With Visual Analytics 7.2, the home page properties (SAS Visual Analytics Hub) are located just below SAS Application Infrastructure, instead of being located under SAS Visual Analytics.
Citing figures from the Bureau of Labor Statistics and a recent CNN Money report, Lafler highlighted the value of SAS knowledge and noted a rising demand for individuals with SAS proficiency. According to the sources, SAS programmers can expect an average salary in excess of $90,000 a year with a projected growth rate that tops 22%.
“When I got out of college I was making what is now less than minimum wage and I was glad to get that offer,” Lafler said. “Now, individuals with analytical skills are getting internships for salaries that make me drool.”
Big data offers even bigger opportunities. With all the buzz around data science and demand for individuals with analytical skills outpacing supply, Lafler said big data is one of the “hot” skills for today’s SAS professionals. But it’s far from the only growth area.
A coworker was recently in need of some simple graphics to include in a slide show to accompany her SAS Global Forum paper. After listening to what she wanted, I decided that I could use PROC SGPLOT to create those images for her.
The first image was a set of stacked blocks displaying the letters A, B, and C. Since blocks are drawn with only four coordinates, we can draw those using the POLYGON statement. We can use the MARKERCHAR option in the SCATTER statement to draw letters within each block.
The POLYGON statement was added to PROC SGPLOT in the first maintenance release of SAS 9.4 (TS1M1) and enables you to define the X and Y coordinates used to draw a polygon. The required ID argument in the POLYGON statement identifies which set of X, Y coordinates belongs to each polygon.
In a DATA step, we defined the coordinates for each of the blocks, the center of the block, and the letter to use for the marker character:
input x y letter $ xcen ycen;
10 10 B 15 15
10 20 B 15 15
20 20 B 15 15
20 10 B 15 15
16 20.5 A 21 25.5
26 20.5 A 21 25.5
26 30.5 A 21 25.5
16 30.5 A 21 25.5
22 10 C 27 15
32 10 C 27 15
32 20 C 27 15
22 20 C 27 15
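Putting the pieces together, the following is a sketch of how the complete program might look. The data set name blocks and the PROC SGPLOT options chosen here (NOAUTOLEGEND, hidden axes, the marker character size) are assumptions for illustration, not necessarily what was used for the original slides.

```sas
/* Hypothetical data set name: blocks */
data blocks;
   input x y letter $ xcen ycen;
   datalines;
10 10 B 15 15
10 20 B 15 15
20 20 B 15 15
20 10 B 15 15
16 20.5 A 21 25.5
26 20.5 A 21 25.5
26 30.5 A 21 25.5
16 30.5 A 21 25.5
22 10 C 27 15
32 10 C 27 15
32 20 C 27 15
22 20 C 27 15
;
run;

proc sgplot data=blocks noautolegend;
   /* One polygon per ID value; GROUP= gives each block its own fill color */
   polygon x=x y=y id=letter / fill outline group=letter;
   /* Draw the letter at the center point of each block */
   scatter x=xcen y=ycen / markerchar=letter
           markercharattrs=(size=24pt weight=bold);
   /* Hide the axes so that only the blocks appear */
   xaxis display=none;
   yaxis display=none;
run;
```

Each block contributes four rows that share the same letter, so the letter serves as both the polygon ID and the marker character.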
Creating a grocery shopping list can be overwhelming for a variety of reasons. Lack of experience, picky eaters, and new recipes can turn an ambitious cook into an overwhelmed procrastinator. In this blog post I’ll show you an easy SAS Enterprise Guide project that uses prompts to create a simple user interface for execution. The project allows you to select a subset of recipes from a long list in Excel, indicate the number of servings desired, and output a printable grocery list to take to the store. The goals are to demonstrate SAS Enterprise Guide prompts and to show a practical use for SAS in everyday life.
View this video for more details on the prompts and how the project was created.
Sizing is a topic that solutions managers typically leave until the end, after decisions about the application have been settled. But there are often many variables that can impact the final size requirement. We have seen across our customer base that sizing and the number of environments have been determined by predicted data volumes, the types of environments that need to be supported, and the budget available.
Technical architects spend time debating what environment is right for their business, and of course this is no easy decision. Often the business changes its mind, data volumes increase (often with little or no advance warning), data sources vary, and different teams need access, and with all this, performance issues creep in.
Production – this one is a must so is easy to say yes to. It’s perhaps the easiest of the estimates as long as the solutions team is able to predict the volume of data.
Undersizing is a common problem here, for several reasons. The most common is that the solution has been far more successful than expected and has attracted more users, more data sources, or both. The second most common cause is that the procurement team has persuaded the solutions managers that they can make do with fewer resources. Finally, we also sometimes see incorrect assumptions being used when sizing.
All code examples are provided as is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.