Reading Hierarchical Data - Part 3

This post is the third and final in a series that illustrates three different solutions to "flattening" hierarchical data.  Don't forget to catch up with Part 1 and Part 2.

Solution 2, from my previous post, created one observation per header record, with detail data in a wide format, like this:

Detail Approach: One observation per header record

Obs    Family     Employee    Spouse    Child1    Child2    Child3

 1     Jones       Bob        Carol     Sally     Alice
 2     Sanchez     Mary
 3     Smith       Nancy      Harold

Today's Solution 3, unlike Solution 2, places no arbitrary limit on the number of detail items. It stores the detail data in a tall, rather than wide, format: one observation per detail record, rather than one observation per header record.
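Here is a minimal sketch of the tall layout, not necessarily the code from the full post. The record layout (a one-character type code followed by a name) is an assumption, since the original text file is not reproduced here, and the data set and variable names are made up for illustration. The family members themselves come from the tables in this series.

```sas
/* Sketch only: the record layout (type code, then name) is assumed. */
data family_tall;
   length Family $12 Type $1 Name $12;
   retain Family;                      /* carry the header value down to its detail rows */
   input Type $ Name $;
   if Type = 'F' then Family = Name;   /* header record: remember it, write nothing */
   else output;                        /* detail record: write one observation */
   datalines;
F Jones
E Bob
S Carol
C Sally
C Alice
F Sanchez
E Mary
F Smith
E Nancy
S Harold
;
run;

proc print data=family_tall;
run;
```

Because each detail record becomes its own observation, a family with ten children needs no more variables than a family with none.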


The one piece of advice everyone in analytics needs to hear

I was recently asked why I would recommend my new class, Explaining Analytics to Decision Makers:  Insights to Action.  The answer goes back to some great advice, a lunch of eggplant parmesan and, in another, more twisted way, to what was affectionately (and ironically) known as the “bomb plant.”

Early in my career I was working for a large company on a project.  On this project was Bill, a well-respected, seasoned professional.  It was known Bill was a year or two from retirement.  Far from waiting out retirement, this gentleman was floating from project to project with little pressing responsibility and offering advice where he could.  I knew of his reputation as a respected engineer and was pleased that through a good friend I had gotten to know him.  Bill had heard that I was considering leaving my current position and starting an independent consulting practice to provide analytical support to a variety of clients.  (This was well before today’s market and was more of a stretch than such a plan would be today.)  He suggested we meet for lunch to discuss my plans.

Adapting to changes

Over lunch, as we discussed my plans, he reflected on his long career.  After growing up in east Georgia, he found himself displaced both psychologically and professionally by the development of the Savannah River Plant.  Affectionately known by the locals as the “bomb plant,” the building of the plant changed the region.  Following a theme of all great Southern writers, it was the backdrop of a changing South and the undercurrent of life captured by Pat Conroy in The Prince of Tides.

But Bill’s story was direct.  Life as he knew it, based on a tight circle of lifelong contacts, had changed.  He had found himself in the big city and had to live off his engineering skills.  He talked about how his life changed and how he had to change.  I listened intently, as he was easy to listen to.  He imparted his expansive knowledge with an easy manner.  I was not aware he was leading up to a life-changing moment for me.

Selling yourself is key


Reading hierarchical data - Part 2

This post is the second in a series that illustrates three different solutions to "flattening" hierarchical data.

Solution 1, from my previous post, created one observation per header record, summarizing the detail data with a COUNT variable, like this:

Summary Approach: One observation per header record
 
Obs    Family     Count
 
 1     Jones        4
 2     Sanchez      1
 3     Smith        2
                  =====
                    7
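One way to produce a summary like the one above is sketched here. It is an illustration under assumptions, not necessarily the code from the original post: the record layout (type code, then name) and the data set names raw and family_counts are hypothetical.

```sas
/* Sketch only: the record layout (type code, then name) is assumed. */
data raw;
   length Type $1 Name $12;
   input Type $ Name $;
   datalines;
F Jones
E Bob
S Carol
C Sally
C Alice
F Sanchez
E Mary
F Smith
E Nancy
S Harold
;
run;

data family_counts(keep=Family Count);
   length Family $12;
   retain Family;                  /* hold the current family across records */
   set raw end=last;
   if Type = 'F' then do;
      if _n_ > 1 then output;      /* a new header means the previous family is complete */
      Family = Name;
      Count = 0;
   end;
   else Count + 1;                 /* sum statement: Count is retained automatically */
   if last then output;            /* write the final family */
run;

proc print data=family_counts;
   sum Count;                      /* grand total across all families */
run;
```

The observation for each family is written only when the next header arrives (or the data ends), which is why the step needs both the check inside the header branch and the end-of-data check.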

Solution 2, illustrated in today's blog, creates one observation per header record, like Solution 1, but replaces the COUNT variable with detail data, in a wide format, like this:

Detail Approach: One observation per header record
 
Obs    Family     Employee    Spouse    Child1    Child2    Child3
 
 1     Jones       Bob        Carol     Sally     Alice
 2     Sanchez     Mary
 3     Smith       Nancy      Harold
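The wide layout above could be built along these lines. This is a sketch under assumptions rather than the post's own code: the record layout (type code, then name), the data set names, and the counter n_kids are all hypothetical. Note the fixed Child1-Child3 array, which is where the limit on the number of detail items comes from.

```sas
/* Sketch only: the record layout (type code, then name) is assumed. */
data raw;
   length Type $1 Name $12;
   input Type $ Name $;
   datalines;
F Jones
E Bob
S Carol
C Sally
C Alice
F Sanchez
E Mary
F Smith
E Nancy
S Harold
;
run;

data family_wide(keep=Family Employee Spouse Child1-Child3);
   length Family Employee Spouse Child1-Child3 $12;
   retain Family Employee Spouse Child1-Child3 n_kids;
   array kid{3} Child1-Child3;     /* room for at most three children */
   set raw end=last;
   if Type = 'F' then do;
      if _n_ > 1 then output;      /* previous family is complete */
      call missing(Employee, Spouse, of kid{*});
      Family = Name;
      n_kids = 0;
   end;
   else if Type = 'E' then Employee = Name;
   else if Type = 'S' then Spouse = Name;
   else if Type = 'C' then do;
      n_kids + 1;
      if n_kids <= dim(kid) then kid{n_kids} = Name;   /* extra children are dropped */
   end;
   if last then output;            /* write the final family */
run;

proc print data=family_wide;
run;
```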


Reading hierarchical data - Part 1

A family and its members represent a simple hierarchy.  For example, the Jones family has four members:

[Image: Family_JS]

A text file might represent this hierarchy with family records followed by family members' records, like this:

[Image: Family2_JS]

 

The PROC FORMAT step below defines the codes in Column 1:

/* Map the one-character record-type codes in column 1 to readable labels */
proc format;
   value $type
      'F'='Family'
      'E'='Employee'
      'S'='Spouse'
      'C'='Child';
run;


Flexibility of SAS Enterprise Miner

Do you use an array of tools to perform predictive analytics on your data? Is your current tool not flexible enough to accommodate some of your requirements? SAS Enterprise Miner may be your solution.

With a growing number of data mining applications, having a tool that can do a variety of analyses is just not enough. Some situations require an open, extensible design that provides ultimate flexibility and personalization so that users can tailor their experience according to their needs.

The flexible architecture of SAS Enterprise Miner opens an entire world of SAS to data miners and data scientists with a variety of skill levels, ranging from business users to technical experts.

What can users achieve with SAS Enterprise Miner’s flexible architecture?

When you need to customize functionality to fit your business requirements, the SAS code node and extension node come to your rescue. SAS Enterprise Miner’s SAS code node enables users to incorporate new or existing SAS code into the process flow. It extends the functionality of SAS Enterprise Miner by making SAS procedures available in data mining analysis. You can also create custom extension nodes using SAS code and XML logic and share them with others across the enterprise. Diagrams can also be shared easily with other analysts throughout the enterprise.


How is electricity generated in your state?

I recently saw an article on washingtonpost.com showing what methods are used to generate electricity in each state. The data was interesting enough that I decided to try my hand at graphing and mapping it with our SAS software. Read along to see what I kept the same, and what I changed...

But before we get started, here's a fun picture of my friend "Magic Wanda" at my Halloween party. I'm sure it was just an oversight that the article did not include witchcraft & sorcery as methods used to generate electricity! ;)

[Image: margie_electric]

 

And now, on to the graphs!...

Here's a screen capture of the main graph in the washingtonpost.com (wp) article. It's a pretty cool interactive graph, and when you click on the colored bar segments or the legend, it brings the selected electricity source to the top and sorts the bars by the selected source.

What areas do venture capitalists invest in medical research?

The Wall Street Journal recently published a study of the top 17 medical areas (or body parts) that venture capitalist investments are likely to benefit. They used graphs to summarize the results, but "the graph guy" in me just couldn't resist trying to improve them. Did my improvements help? - You be the judge!

Before we get started in the data analysis, I want to take a minute to point out how fortunate we are to live at a time when medical technology is so advanced. For example, the first successful long-term lung transplants took place in the 1980s ... and today we have a member on our dragonboat racing team who has had both lungs replaced. Can you tell which team member it is?

[Image: lung_transplant]

And now, on with the graph makeover! ...


Hadoop releases - here's the timeline graph!

There's a lot of buzz about Hadoop these days. I started checking into it, and there seemed to be a gazillion releases. So, being The Graph Guy, I decided to create a graph to make it a little easier to digest!

During my search for Hadoop information, I found the Apache page showing all the releases. As I scrolled down through page after page of releases, I found it difficult to get a grasp on things - there seemed to be multiple versions releasing simultaneously.

I didn't want to have to work very hard to understand Hadoop releases - I just wanted an "Easy Button." And when your favorite tool is SAS, your easy button often looks a lot like a custom graph. :)

I examined the html code behind the Hadoop release page, and found that all the releases had a consistent 'header' line that I could search out and parse programmatically. Here's an example:

[Image: hadoop_html]

The world's most valuable sports teams

There's big money in professional sports these days - we're talking billions of dollars! Do you know which teams are the most valuable? The graphs in this blog will show you...

I recently saw a bar chart on dadaviz.com showing the world's most valuable sports teams. It was the right kind of graph for this type of comparison, and it showed interesting data ... but their use of color really didn't work for me. Here's a screen-capture of their graph. Try to pick a color in the legend (such as Football or Formula1) and quickly identify all those colored bars in the graph - I bet you can't!

[Image: world_sport_team_values_2015_dadaviz]

 

So I found the data source (forbes.com), entered the data into a SAS dataset, and created my own version of the graph. I kept the layout the same as the original ... but instead of showing all the colors together, I created a separate graph for each sport.

Getting SAS certified one credential at a time

Krystian Matusz is what I’d call a super SAS user. He currently holds seven out of the nine credentials SAS offers.

  • SAS Certified Advanced Programmer for SAS 9
  • SAS Certified Base Programmer for SAS 9
  • SAS Certified BI Content Developer for SAS 9
  • SAS Certified Clinical Trials Programmer Using SAS 9
  • SAS Certified Data Integration Developer for SAS 9
  • SAS Certified Platform Administrator for SAS 9
  • SAS Certified Visual Business Analyst: Exploration and Design Using SAS Visual Analytics

Krystian Matusz, 2015 Junior Professional Award Winner

That makes him part of an elite group of only about a dozen people who hold that many credentials.

Since we love numbers at SAS, I decided to look into whether anyone has earned all the credentials. I found out that only three people have earned eight credentials. And no one has earned all of them. At least not yet.

Krystian is currently preparing to take the SAS Certified Statistical Business Analyst exam this fall. He’s also a beta tester for our latest credential, SAS Certified Data Quality Steward.

So what drives someone to go after all the credentials? And how much do you need to prepare for each exam? Those are just a few of the questions I asked Krystian – the super SAS user.

1.  Why did you decide to earn so many SAS certifications?

I am a self-motivated and ambitious person. I am always willing to learn and love to challenge myself. I focus on final results and the customer's point of view to understand what customers want and what their needs really are. Thanks to the certificates, I can better advise them and generate the highest possible business value for them.

These achievements prove my skills, expertise and knowledge. They give me an opportunity to extend my knowledge and to deliver the best advice for architects and technical/business board teams (e.g. CTO and software architects). The knowledge from my education and exams allows me to deliver the best implementation: pure code, optimal solutions and great working software with clear results: reports, graphs, documents, KPI, KGI and hints for stakeholders. Also, SAS software is one of my favorites. I admire it for its potential.