Move beyond the 'whys' of CDISC and bridge the gap between theory and practice

I am writing this post with the satisfaction of having Implementing CDISC Using SAS: An End-to-End Guide, Second Edition completed and on the shelf. Once again, it was my pleasure to collaborate with Chris Holland, and I am glad that we had a chance to update the book with current software and standards. That leaves me thinking about where we are right now in terms of clinical trials standards and compliance, and I am a bit concerned.

Fifteen years ago, we had the birth of what we know as the CDISC SDTM, and ADaM was in its infancy. Now here we are, and the SDTM, ADaM, and Define-XML submission data standards have matured and grown. Oh, how they have grown. When I teach ADaM, I am often amazed that students haven’t read all of the model documentation, but should I be? If you take the current basic ADaM documentation, which includes the model, implementation guide, time to event, OCCDS, and examples documents, you are looking at 305 pages of reading. Define-XML is a sprightly read at 98 pages, and the current SDTM documentation clocks in at a meaty 469 pages. For those keeping track at home, that is 872 pages of FDA submission data standards, and that doesn’t even include the new therapeutic area user guides (TAUGs). Include those and you’re likely at a nice round 1,000 pages of submission standards to understand.

Those 1,000 pages of clinical data standards continue to grow, and so does their interpretation. Define-XML is fairly rigid, with the SDTM a bit less so, and ADaM even less rigid. This lack of rigidity leads to multiple interpretations of the standards across implementations in the industry. To add to that complication, we now have various regulatory requirements layered on the CDISC requirements. The FDA has a technical conformance guide that adds requirements to the CDISC standard requirements, and now they have a new technical rejection criteria document as well. There are also the PMDA CDISC submission requirements in Japan, as well as the additional checks found within the Pinnacle 21 validation tool. Add it all up and you have a lot more to understand beyond the 1,000 pages of base standards.


As the standards teams continue to evolve the CDISC standards, and as we get evolving interpretations and additional requirements layered over the base standards, things are getting a tad complex. At times, it has me wondering if this increasing standards complexity is the way to go. It is worth noting that CDISC isn’t even the only clinical data standards game in town. As clinical research evolves to be based more on hospital electronic health records (EHRs), we can expect models such as HL7 v3 or FHIR to play a greater role. How this all works out with CDISC is yet to be seen as the CDISC and HL7 worlds are eventually truly bridged, and not just BRIDG’d.

We need an easy button for clinical data submissions. Have we created one yet?


SAS Certification for the Non-Practitioner

Let me make this clear right from the start: I needed to answer 44 of the 62 exam questions correctly to earn my BASE SAS Programmer certification, and I got exactly 44 right. No margin of error. I never thought I’d be proud of a 70%, but I’m more than happy to make an exception in this case. Woot, Woot!!

This marks my second SAS certification in as many months, the first being Statistical Business Analyst – Regression and Modeling.  Which brings me to another point on which I must be clear:  I am neither a programmer nor a statistician, just a finance guy in a marketing role, developing an internal corporate thought leadership program.

Before I answer the obvious “How did you do it?” question, some brief context would be in order.  While my finance background, education and experience, along with eight years in marketing at SAS, has undeniably exposed me to a considerable measure of statistics and programming, it has only been since I started my most recent educational endeavor a year ago that the idea of getting SAS certified ever entered my mind.

Certification was the latest step in my never-ending quest to remain relevant and add value in a rapidly changing employment environment. (It seems like downsizing has become a standard business practice; I haven’t voluntarily left an employer in over 30 years, but SAS is employer #4!).  I am also pursuing an online master’s degree in Analytics, and I’m happy to report that SAS has been an integral component of my first two statistics courses.

How did a non-practitioner like me get SAS certified twice?

Daily, hands-on practice would of course be the best option, but not working in software development or as a programmer, I had to take a different approach.  My in-class homework projects using SAS were of course extremely beneficial, but not sufficient in themselves – simply not complex enough to meet the requirements for certification.

The place to start, at least for me, was with the SAS self-paced e-Learning courses: Programming 1 and Programming 2 for the BASE SAS certification, and Statistics 1 and Statistics 2 (ANOVA and Regression) for the Statistical Business Analyst certification.   All I can say at this point, after eight years at SAS with these resources so easily available to me:  Why didn’t I start sooner?

At my age I understand my own learning style pretty well:  1) Auditory, and 2) I need to understand the overall framework first so that later I have a home for the details.  The e-Learning courses fit both of those requirements to a “T”. They allowed me to go back and re-listen to each module as many times as I like;  I simply went through each module or section casually the first time, trying to get the overall gist of the concept, before returning a second, third and fourth time to take the process more seriously, making written notes for future reference and study.

Here are some pointers to get you started.

Advice #1: After each section, print off both the section SUMMARY and your QUIZ result, and save them in a folder just for this purpose.  Also, print off any INFO material that occasionally pops up – most times it will save you from the painstaking task of writing down details that SAS has already summarized and prepared for you.  I kept a small spiral-bound notebook for each class, and as I already mentioned, took notes as I went.


Where is this Dakota Access Pipeline they're protesting?

You've probably seen dramatic pictures in the news lately about the protests against the Dakota Access Pipeline - Native Americans versus bulldozers, protesters and tipis in a blizzard, and things like that. But do you really know much about this pipeline? For example, where will it go, and where have the protests been happening?

I don't have any dramatic pictures of the protests, but how about this ... I asked if any of my friends had a picture from North Dakota, and Sunil (who ran at least 1 marathon every month last year) happened to have a photo from a marathon he ran in Fargo, ND. Pretty cool, eh!?!

fargo_marathon

And now, let's get to work! I recently saw a map on the nytimes website that showed a lot of very detailed information about the pipeline: the path, the river-crossings, land taken through eminent domain, protest locations, etc. They provided a very long/skinny map, and let you scroll straight down the page to navigate the length.

Here's a sample of a couple hundred miles of their map (not nearly the whole map):

dapl

Their map was very nice, interesting, and informative, but my brain had a difficult time getting oriented because it was zoomed in past the reference points that I might recognize (such as state borders), and 'north' was towards the top/right corner rather than being in its usual position at the top of the page.

I thought their map would be more effective and easier to understand if they first showed the pipeline on a larger map that people could easily recognize. I just so happened to have a pipeline map of the 48 contiguous US states that I had used in a previous blog post (about a pipeline break along the east coast), so I decided to use that map as a starting point, and add the Dakota Access Pipeline (DAPL) to it.

I didn't have the coordinates for the DAPL, so I studied the nytimes map in great detail and estimated lat/long coordinates along the pipeline based on landmarks like cities, roads, and lakes in the nytimes map. I then annotated the DAPL as a bright red line on my pipelines map, and also annotated a bright green circle/pie showing the location of the main protest site (Sacred Stone Camp).

dakota_access_pipeline

With this map, you can easily see that the pipeline is quite long - extending from northwest North Dakota (where they've recently found large new oil fields in the Bakken/Three Forks formation) to Patoka, Illinois (an oil tank farm that connects many pipelines).

If you first look at my map to get the "big picture" and get oriented before you look at the nytimes map, I think you will have a much better understanding.

 


Is the Moon really 1/4 the size of the Earth?

Are you ready for the supermoon on December 14? This will actually be the 3rd supermoon in 2016! With these big-looking full moons we're having this year, I got to wondering exactly how big is the moon compared to Earth? This seems like a good question to answer with some SAS graphs!...

But before we get started, here's a picture my friend David took of a supermoon this year. The weather wasn't great, but he managed to get a pretty decent shot!

david_supermoon

As with most things I'm curious about, I started my quest for knowledge with a Google search. I asked Google "How big is the moon compared to Earth?" Google found about 1.8 million results, and showed me the following summary at the top:

google_moon_earth

I have often heard that the moon is about 1/4 (25%) the size of the Earth, but I was a little distrusting of that number. For example, look at the picture in the Google summary - the moon doesn't look 25% the size of Earth, does it? As with many things, it depends...

The 25% number comes from comparing the diameters. The moon's diameter is indeed about 25% the size of Earth's diameter, which is easily shown in a simple bar chart:

earth_and_moon_diameter

But if you look at the earth and moon side-by-side (as in the image in the Google results above), does your brain compare the diameter, or does it compare the amount of area taken up in the image? Here's a visualization of the cross section area of the moon and Earth laid out on a gridded graph, so it is easy to visually compare them (each square area in the grid represents 1,000 x 1,000 = 1 million square miles).

earth_and_moon_graph

The above graph is a nice visual comparison that people can relate to (since the circular shapes correspond to the moon and Earth), but let's make it even easier to compare the areas, by plotting them in a bar chart. Here, you can see that the cross-sectional area of the moon is much less than 25% the size of the Earth.

earth_and_moon_area

And what about the volume of the spheres? As you've probably guessed by now, that's an even bigger difference!

earth_and_moon_volume
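
To put rough numbers on the three comparisons, here is a minimal SAS sketch. The diameters (about 7,918 miles for Earth and 2,159 miles for the moon) are approximate values assumed for illustration and are not taken from the graphs above; the data set and variable names are also made up for this example.

```sas
/* Assumed approximate mean diameters in miles (illustrative values) */
data moon_vs_earth;
   earth_diameter = 7918;
   moon_diameter  = 2159;

   /* Ratio of the diameters */
   diameter_ratio = moon_diameter / earth_diameter;

   /* Cross-sectional area scales with the square of the diameter */
   area_ratio = diameter_ratio**2;

   /* Volume of a sphere scales with the cube of the diameter */
   volume_ratio = diameter_ratio**3;

   format diameter_ratio area_ratio volume_ratio percent8.1;
run;

proc print data=moon_vs_earth noobs;
run;
```

Because area grows with the square of the diameter and volume with the cube, a diameter ratio of roughly 27% shrinks to roughly 7% for cross-sectional area and about 2% for volume.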

So, when somebody asks how big the moon is compared to Earth, the answer is "It depends!"

If you're a data analyst, you might be thinking ... "Hmm, I wonder if this same issue might affect bubble chart visual perception? - Should I be sizing the bubbles based on the diameter, the area, or the volume of the bubbles?" (and yes, this is an important issue that many people creating bubble charts don't even think about). Feel free to leave a comment with your thoughts/preferences on bubble chart sizing!

 


Using the SAS Macro Language to Create Portable Programs

As technology expands, we have a similarly increasing need to create programs that can be handed off – to clients, to regulatory agencies, to parent companies, or to other projects – and handed off with little or no modification needed by the recipient. Minimizing modification by the recipient often requires the program itself to self-modify. To some extent the program must be aware of its own operating environment and what it needs to do to adapt to it.

When you employ portable programming techniques, you are practicing defensive programming. When your program is delivered and before it can be put into production, other programmers may need to ‘fine tune’ it. Robust code requires fewer modifications, and therefore introduces fewer problems. But how do you write programs that are portable? How do you construct programs that are portable enough that they can run in a variety of situations with minimal (or better yet, without any) programmer intervention? It’s possible, and here are a few examples of effective portable programs.

Retrieving date values

The title in a sales report should reflect the current execution date. If the date is hardcoded, as in the TITLE statement below, the title will require a change each time the report is executed. To make this program portable, we need the ability to determine and insert the current date automatically.

title1 "Shoe Sales Report for August 20, 2016";

The user-defined macro function %CURRDATE returns today’s date in WORDDATE form. The SAS date is retrieved using the DATE function, which is then formatted, trimmed, and left-justified.

%macro currdate;
%qtrim(%qleft(%qsysfunc(date(),worddate18.)))
%mend currdate;

When used in the TITLE statement, this macro function assures that the date value displayed by the title will always be current, and it does it without user intervention.

title1 "Shoe Sales Report for %currdate";

Determining the operating system

When programs are moved from one platform to another they need to be adapted to the new operating environment. The first step in this process is determining the current operating system. The easiest way to do this is to take advantage of the automatic macro variables &SYSSCP and &SYSSCPL. The %PUT statement shown in this SAS Log was executed under the Windows (7 Pro) operating system for SAS 9.4.
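
A minimal sketch of this idea follows. The %PUT statement simply writes both automatic macro variables to the log. The %PATHDELIM macro and the DELIM macro variable are hypothetical names invented for this illustration, and the assumption that &SYSSCP resolves to WIN under Windows should be checked against your own log.

```sas
/* Write the operating-system macro variables to the SAS log */
%put SYSSCP=&sysscp SYSSCPL=&sysscpl;

/* Hypothetical helper: return a path delimiter for the current platform. */
/* Assumes &SYSSCP resolves to WIN under Windows.                         */
%macro pathdelim;
   %if &sysscp = WIN %then \;
   %else /;
%mend pathdelim;

/* Store the delimiter once, then reuse it wherever paths are built */
%let delim = %pathdelim;
%put Path delimiter for &sysscp is &delim;
```

Keeping the platform test inside one macro means that only one place needs to change if the program is later moved to an operating system that the test does not yet cover.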


Why SAS maps are better than Wikipedia svg maps!

Parents are always proud of their kids, and think their kids are better than all the other kids. I guess it might be a little bit that way with mapping software ... but I really do think I've got a pretty compelling example to show that SAS maps are better than Wikipedia's svg-derived maps.

Let me start by saying I really like Wikipedia - both the idea, and the actual entity. As far as "The Internet" goes, Wikipedia is probably my #2 most favorite useful thing (2nd only to Google ... which says a lot!).

Wikipedia has a page for just about every topic. And, being an election year, they have a page for the 2016 Presidential Election. And on that page they have a Maps section, which contains several different maps of the election results. You can click on the map thumbnails, and see a larger version of the map (the note under the map indicates it was "derived from USA Counties.svg" - it appears what they include in Wikipedia is just a png snapshot, not an svg map).

I decided to look at their red/blue county map, where each county was colored red or blue, depending on whether Trump or Hillary got the most votes. A topic of much discussion lately - Trump won much more land area than Hillary (which all the red in this map shows very well), whereas Hillary won more of the big cities with large populations. I was curious what some of the counties in the map were, but the Wikipedia map has no labels or mouse-over text, so I couldn't really determine the county names, nor how many votes each candidate got.

Next, I looked at their gradient shaded county map. This one showed not only which candidate won, but how strong their win was. But I noticed one county in California was blue in this map (indicating Hillary got more votes), whereas it was red in the other county map (indicating Trump got more votes). Unfortunately, with no mouse-over text in the Wikipedia maps, I didn't know what the county name was, and it was therefore difficult to research this discrepancy. I assumed that maybe the maps used data from different dates ... but since they didn't label the maps with the source & date for the data they used, it was difficult to be certain. Below are screen-captures of the California section of the Wikipedia maps, with the county in question circled in green:

red_bluegradient

I was a bit frustrated by the lack of "analytic tools" in the Wikipedia maps, so I decided to create my own SAS map. With a little help from my co-worker (John), I found a csv file containing the election county results on github. I wrote some SAS code to read the data directly from that page, and plot it on a map. I added mouse-over text that shows the county name and number of votes for Trump & Hillary, and I added a footnote at the bottom showing the source & date of the data snapshot (this is important, because the exact vote counts change as the provisional ballots are resolved, etc). You can also click the footnote to go to the github page. Here's my SAS map - click the image below to see the interactive version with the mouse-over county text and footnote drilldown:

us_election_by_county

The county in question is blue in my SAS map, and you can hover your mouse over it and easily see that it is Stanislaus County. I did a Google search on Stanislaus County, and found a page with their election results, and confirmed that it does look like Hillary has more votes in that county. It appears that the Wikipedia red/blue map is either wrong, or out-of-date.

I did a bit more research on the Wikipedia maps, and looked into the push-history, and found that the red/blue map was pushed November 10 (which was only a few days after the election, and those early results likely changed since then). The gradient shaded map's history says it was pushed on November 24 (so I assume it has more up-to-date data than the other map). It would have been useful to include a date label on the Wikipedia maps, and I would recommend that all the maps shown together be generated using the same data.

So, anyway ... SAS allows you to create a png map with an html overlay, so you can have mouse-over text for each area in the map. And that provides a huge advantage for people who want to analyze the map a bit deeper (or research a value that appears curious/wrong/interesting). SAS also provides several ways of adding text to maps (titles, footnotes, notes, or annotated text anywhere you want), which can be useful in identifying exactly what data was used.

Have you found any incorrect (or confusing) election maps? - Feel free to share in the comments!

 


5 Reasons to write your first SAS Press book

Editor's note: This series of blogs addresses the questions we are most frequently asked at SAS Press!

Ever thought about writing your own SAS or JMP book? Here are a few reasons why writing a SAS Press book can be a fantastic career move!

1.      Your book establishes you as an expert and enhances your credibility

Being recognized as a subject-matter expert or thought leader is beneficial if you’re looking to gain more media attention for yourself. If you see yourself as more of a thought leader than a programmer, SAS Press has partnered with Wiley for the Wiley and SAS Business Series to publish thought leadership titles and high-level business concept books.

write-a-book-with-sas01

Mike Gilliland and Udo Sglavo with their Wiley Business Series title, Business Forecasting: Practical Problems and Solutions

2.      Your book helps promote and nurture your digital self

Consider a book as a launch pad for a whole host of social media and web credibility opportunities. We have social media specialists to drive the content, and with it your digital persona, helping nurture and develop your professional digital self––keeping it fresh and in the public eye.

3.      Your book can be your business card.

While you’re working at the office, your book can be out working as well, helping you grow your audience, teaching people, and building your reputation as an authority in your field. Use your book to start or further conversations with new or existing SAS customers, or like-minded professionals. At meetings and conferences, you can use your book as your business card! Your book can lead to more career opportunities. A book can really enhance a career by generating new opportunities and enhancing existing ones. Whether it’s speaking engagements, teaching classes, or webinars, everyone wants the person who “wrote the book” to come speak to them firsthand!

4.      Your book royalties provide additional income

You’ll always make far more from the opportunities that a book can bring than from the book itself (unless you have a national bestseller), but you can still count on some income from the sales of the book. Via our SAS bookstore, national resellers, and global booksellers, your book will reach a worldwide audience in print and e-book formats!

5.      Your book can help other SAS and JMP users

The biggest reason to write a book is that SAS books enhance the skills and careers of SAS users. Sharing your knowledge helps them grow!

write-a-book-with-sas02

In the classroom with Jane Eslinger’s The SAS® Programmer’s PROC REPORT Handbook

The first step in writing a book is deciding to take that first leap of faith! Watch this short video for helpful advice from first-time SAS Press authors to anyone thinking about writing their first book.

Learn more about how to submit a book proposal.

Too Busy to Write? Review Instead!

If you have technical and teaching abilities but are too busy to write a book, we are always looking for qualified technical reviewers to help with our book development process! Reviewers receive a copy of the book when it’s published, book credit to be used in the SAS Store, and much gratitude from SAS and SAS Press authors for their help! Learn more about how to review one of our books.

 


My top 10 graph blog posts of 2016!

When I was a kid, I always looked forward to Casey Kasem's American Top 40 song countdown at the end of the year. Did I listen to check whether my favorite songs had made the list, or to critique how well the people making the list had done in picking the 'right' songs? Of course, it was probably a little of both, hahaha!

This year I've written over 80 blog posts, and I decided to make a list of the 10 I think are the best! These aren't necessarily the ones that had the most views, but rather the ones I think had really good/innovative/clever graphs of interesting data. I really enjoyed writing these 10, and I hope you enjoyed (or will enjoy) reading them...

I like to include a random somewhat-related photo with my blogs, and this time I asked my friends if they had a picture of a ribbon/medal/trophy they had won. I found out that my friends are really good at a big variety of things ... and the one I decided to share is from my co-worker Mark, who was a body-builder back in the day. Here's one of the trophies he won - 1996 NPC Sacramento Bodybuilding Championship, overall winner, Masters Division (and yes, he looked exactly like the guy in the trophy!)

mark_trophy

And now, here are my top 10 blog posts, in chronological order. You can click the blog titles or the thumbnails to go to the actual blog posts ...

 

Timeline of Supreme Court Justices

scotus

This was a timely topic, since one of the justices recently died and the president will be appointing a new one. I really like the graphical representation I came up with, because it shows the balance between the liberal and conservative justices. It will be interesting to see if the balance shifts, or stays the same, after the new justice is appointed.

How to graph NBA data with SAS

Since the Moneyball movie came out a few years ago, sports teams have realized that analytics can give them a huge advantage. Historically, baseball has probably been the sport with the most data collected ... but in recent years, other sports like football and basketball have been increasing their data collection by leaps & bounds. The amount of data could be overwhelming, but a good graph can help you get a handle on thousands of data points on a single page. This post shows how to use SAS to create a custom graph that looks like a basketball court, and plot all the shots on it.

A graph fit for a Prince

Prince's songs were really popular during my formative years, and I was sad to hear that he had died. Looking back, I was wondering when all his songs had been released, and plotting the data on a time-axis seemed like a good way to visualize the data. If you're a Prince fan, you'll want to check this one out - it's a cool discography, and also a nice tribute to Prince, The Artist!

What you see is what you get ... maybe!

This is one of my most fun blog posts. I created several mind-bending examples, showing both the flexibility of SAS, and how the eyes and the brain don't always reach the same conclusion. The examples range from the classic Koffka rings, to a more obscure color-name test, to my own version of a viral Facebook trick graph that left a lot of people shaking their heads (literally!)

The big fat truth about the US weight problem

In the last couple of decades, the US has seen a dramatic increase in the number of overweight and obese people. I found a graph that tried to show this shift, but there were several problems with it. Therefore I demonstrated step-by-step how I thought it should be changed, and came up with what I think is a graph that shows the data really well.

Pokémon: Gotta graph 'em all!

The new Pokémon game became really popular this year, after it came out as a smartphone app. Therefore when I found out that there was a huge collection of data available, I knew I had to graph it. In this post, I show several different ways to graph the various kinds of data. Much to my surprise, my Pokémon blog(s) even got a mention in the local newspaper. Even if you can't catch 'em all, you can still catch this blog post and find out how to graph 'em all!

91% of the US didn't vote for Hillary or Trump!

With this being a major election year, there was plenty of political data to graph. This blog post focuses on an aspect of the election that I hadn't really given much thought to before -- the Primary elections, where each party selects their presidential candidate. Many people complained that they didn't like either of the two candidates that were in the final election (Hillary & Trump) ... but according to this graph, it is quite likely that those people didn't even vote in the Primaries (and therefore it's their own fault!).

If we didn't start the fire, then who did?

I really like graphing data that is related to pop culture, and especially when it is data from 'my generation.' This example takes a Billy Joel song (We Didn't Start the Fire) that was popular when I was in school, and plots all the events mentioned in the song on a timeline graph. But not just any old timeline graph ... this one is shaped like a combination of an old-school vinyl record, and the Tempest video game from the 1980s. I hope you enjoy this graph as much as I enjoyed creating it!

A statistical crossword puzzle to exercise your brain

Crossword puzzles were in the news this year, as clever analysts found that many of the puzzles were not actually unique, but copies (or partial copies) of ones already published in the past. I decided to create my own crossword puzzle, using words from statistics and analytics. It was a fun project, and also demonstrates that SAS software is flexible enough to do just about anything!

Which drinks have the most, and least, caffeine?

Computer people seem to have a reputation for drinking a lot of caffeine. I don't know if that reputation is totally true, but I can say that my favorite and most often used perk at SAS is the free soda machine in the break room, LOL! Therefore, when I found some caffeine data, I knew that I had to share it with my blog readers ... and what better way to do that, than in graphical form!

I hope you've enjoyed my 'top 10' list! If you'd like to see these 10 in the context of the other 80 blog posts I wrote in 2016, here's a graph showing them all. You can click the image below to see the interactive version of the graph, with mouse-over text (showing the titles) and drill down links (which go to the actual blog posts).

my_blogs_2016

 

So, how well did I do picking the right blog posts for the top 10? If I left out any that you feel should have made the list, feel free to mention them in a comment!

 


Forecasting your next breakup!

Has anyone ever broken up with you, and left you thinking "Wow, I didn't see that coming!" In hindsight, maybe you could have seen it coming. At least from a statistical perspective. Let's dive into this topic with some lighthearted discussion, and plot some Facebook data...

When it comes to breaking up, there's a guy-talk joke that you should break up right before Thanksgiving, and then wait to start dating again until after Valentine's -- that way you don't have to go to all the family dinners with your girlfriend's family, and you get out of buying her several gifts, etc.

An interesting NPR article mentions another twist on the phenomenon of breaking up during the Thanksgiving holiday, calling it the "turkey drop season." No, a turkey drop isn't like the possum drop they do in Clay's Corner, NC for the New Year's Eve celebration! NPR explains that "The turkey drop is that holiday breakup season where all the college students return home for their first major vacation, and everyone breaks up" (this is generally freshmen who have been in college for a couple of months, and finally decide to break off the long-distance relationship with their pre-college sweetheart).

These were interesting theories (or urban legends?), but I wondered if real-world data would confirm, or refute, them? And what other breakup trends might the data show? ...

Several years ago, David McCandless teamed up with Lee Byron to create a graph of real data showing the timing of breakups throughout the year (see his book The Visual Miscellaneum, p. 179). The data came from Facebook's 2008 Lexicon service, which allowed you to specify keywords & phrases, and provided you with the frequency of those words in Facebook status updates over time. There's no explanatory text in his book (only the graph), but an article on nydailynews.com explains that the data is based on 10,000 Facebook users in 2008. Here's the graph:

breakup_graph_orig

It was a decent graph, but a little difficult to determine exactly what date peaks occurred on. Also, some of the labels ran together (Spring Break & Valentine's), and I think the extra text such as "spring clean" and "too cruel" cluttered the graph, without adding any additional insight.

So I decided to create my own version, using SAS Software. I couldn't find McCandless' raw data anywhere, and Facebook no longer offers the free Lexicon service, so I couldn't generate fresh data for the current year. Therefore I painstakingly went through his graph one point at a time, and estimated & transcribed a data value for each day of the year (yes, it was very tedious!). I then created the following graph:

breakup_graph

Here are some of my changes & improvements:

  • I added a title that better explains the data.
  • I added grid lines at the beginning & end of each month, so it's easier to estimate the dates.
  • I added grid lines along the y-axis, so you can more easily see if the data line is increasing or decreasing.
  • I shortened the label text, and simplified the lines connecting the text to the graph.
  • I labeled a few extra points along the graph, such as 'Superbowl' and 'Election Day'.
  • I added mouse-over text at each point along the plot, so you can easily determine the date at the peaks and valleys of the data line (click the image/snapshot above to see the interactive version with mouse-over text).

The graph does seem to concur that there are a lot of breakups between Thanksgiving and Christmas ... but it also shows a lot of breakups in February & March. It's an interesting graph, but for me it raises more questions than it answers. For example:

  • What age group were the Facebook users? I assume that in 2008, Facebook's users would have tended to be younger than they are today.
  • It would be interesting to see separate graphs for high school students, college students, and adults not in school.
  • What countries were the Facebook users in? I assume it was probably mostly US, but that might not be a valid assumption.
  • Is it possible that some of these were "false positives" where someone made a post about a celebrity breakup?
  • I assume the Lexicon keyword search was based on text posted to the users' walls, but it would also have been interesting to graph the Relationship Status (single, in a relationship, etc). In particular, it would be interesting to see what % of the users are single on each day throughout the year.

What other questions or suggestions do you have, for analyzing breakup data? Feel free to leave a comment!

 


Ron Cody’s Tips on Using SAS University Edition

I often wonder how many people see the word "University" in the title "SAS University Edition" and think you have to be a university student to download this software. Please help me spread the word: Anyone can download the University Edition (as long as you’re using it for learning purposes) and, best of all, it's FREE.

I just returned from the South Central SAS User Group meeting (in San Antonio) where I attended a talk by Ryan Lafler (Kirk's son). One of the first questions from the audience was "Is the University Edition a watered-down, restricted version of SAS?" The answer is no. The SAS University Edition (let’s call it UE) is a complete version of SAS that even includes SAS/STAT, SAS/IML, and the SAS/ACCESS product. Is there a catch? Just one: you are not supposed to use the SAS University Edition for commercial purposes.

I could go on and on about how awesome it is that SAS offers this option for learners, but the title of this blog includes the word "Tips," so I'd better get going and tell you a few. First of all, to obtain your free copy of SAS, click here.

...or, even easier, just type "SAS University Edition" in Google or whatever search engine you use.

Here's another good tip: If you are using UE to learn SAS, follow the installation instructions and create a folder on your hard drive called "c:\SASUniversityEdition\Myfolders" (or the appropriate form on Apple or Linux) and place your data (such as Excel Workbooks) there. Here's the reason: The virtual environment is running Linux, where file naming conventions are not the same as those on Microsoft or Apple. So, to communicate between your "real" computer and the "virtual" computer, you must set up a shared folder. (There are step-by-step instructions in the installation guides.) I should also mention that my two books, Introduction to SAS University Edition and Biostatistics by Example Using SAS Studio, show you screen shots of every step in setting up UE and SAS Studio.

The last piece of information I want to share with you is how easy it is to use the Import Utility. It's even better than the Import Wizard that’s part of the Display Manager. You can import data from just about any source (Excel, Access, etc.), and for most files you don't even have to specify the data source. The Import Utility will look at the file extension and, if it is one it recognizes (.xls or .xlsx, for example), it will perform the conversion automatically.
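
If you'd rather type the code yourself, the same kind of import can be written as a PROC IMPORT step. This is only a minimal sketch, not necessarily the code the Import Utility generates: the workbook name mydata.xlsx and the data set name work.mydata are made up for illustration, and the path assumes you set up the shared folder as described in the installation guide, so that it shows up inside the virtual machine as /folders/myfolders.

```sas
/* Sketch: read an Excel workbook from the UE shared folder.   */
/* mydata.xlsx and work.mydata are hypothetical names.         */
proc import datafile="/folders/myfolders/mydata.xlsx"
            out=work.mydata
            dbms=xlsx      /* stated explicitly in hand-written code */
            replace;       /* overwrite work.mydata if it exists     */
    getnames=yes;          /* first row supplies the variable names  */
run;

/* Check which variables were created */
proc contents data=work.mydata;
run;
```

Note that the DATAFILE= path uses the Linux form seen by the virtual machine, not the c:\ path on your "real" computer, which is exactly why the shared folder matters.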

OK, a bonus tip. If you’re slightly older (such as myself), find a young person to give you a hand when you install the University Edition and SAS Studio! They can probably do it in half the time!
