My blog posts focus on visual data analysis, and many of them use geographical maps. Therefore I hope you will have fun with a quick geography quiz, which I created using SAS/Graph ... And what, you might
Occasionally a SAS statistical programmer will ask me, "How can I construct a large correlation matrix?" Often they are simulating data with SAS or developing a matrix algorithm that involves a correlation matrix. Typically they want a correlation matrix that is too large to input by hand, such as a
A week from today, we'll be in New York City for Strata + Hadoop World, where we’ll kick things off at the Opening Reception. Be sure to stop by booth 543 to meet the team IRL (in real life)! They are excited about the event and eager to talk with attendees.
A new version of SAS® Text Miner and SAS® High-Performance Text Mining has recently been made available and I want to demonstrate some of the performance improvements that can be gained with this release. I’ll use a topic analysis that discovers the main themes in a document collection and consists
It’s rather appropriate that the rock band Europe recorded the hit “The Final Countdown”, because today, September 22nd, represents 100 days until the much anticipated (and delayed) European insurance legislation Solvency II will come into effect on January 1st 2016. Designed to introduce a harmonized, EU-wide insurance regulation, Solvency II
At California Polytechnic State University, San Luis Obispo, the Statistics Department offers two courses on preparation for the Base SAS Certification and Advanced SAS Certification exams, respectively. Each of these courses is 10 weeks long, and the topics covered follow the content of the certification guides offered by SAS.
These days many devices (such as smartphone apps, Fitbits, Apple Watches, dog tracking collars, car GPS, hiking GPS, teen/car trackers, etc.) can track your location and provide you with standard/canned ways to analyze the data. This blog post shows how I created a custom SAS map of the tracking
.@philsimon on the new challenges of an old problem.
A Vermont Department for Children and Families (DCF) worker was murdered last month. The lead suspect is the mother of a child who was previously removed from her care and placed in foster care. This tragedy illustrates the challenges and risks that workers face in the field of serving at risk
Dear Rick, I have a data set with 1,001 numerical variables. One variable is the response; the others are explanatory variables. How can I read the 1,000 explanatory variables into an IML matrix without typing every name? That's a good question. You need to be able to perform two sub-tasks:
Do you want to know what will happen in the future? To gain true predictive insight, skip the tea leaves and look toward your data. SAS instructor Jeff Thompson is a high-energy data mining expert who will be demonstrating how to gain predictive insight from your data in his new
It’s me again!! We're at the halfway point of meeting our Strata + Hadoop World dream team. So far, you’ve met machine learning guru Patrick Hall; data management expert Clark Bradley; and advanced analytics specialist Rachel Hawley. Next up … Dan Zaratsian! I met Dan a few years back while preparing for Analytics 2013
Last weekend I realized I wanted something very specific: a great-looking lawn by mid-October that would require minimal effort from me. This meant that the output I needed to produce this desired outcome was the target date on which I needed to have aeration and over-seeding done in my yard.
Even though the first papers in machine learning were in the 1950s, one could argue it goes back further to the work of Alan Turing and other early computer scientists. So why has this way of modeling seemingly become so popular now? Because data has become a commodity. Large amounts of many different
I read an article recently discussing how runners inevitably slow down with age, particularly after 50. Data from the New York Marathon and Boston Marathon back this up with generally flat average finishing times for ages 20-49 followed by a steady, almost exponential, increase after 50. I haven’t reached the
I just returned home from an expedition/adventure boat trip to Cuba, and Talk Like a Pirate Day is coming up this Saturday - what a combination for an interesting blog! I hope you enjoy a few pictures, and a bit of data analysis on these topics! A couple of weeks ago,
In the UK, technology trends move a little slower than for our US counterparts. It was about 5 years ago when I first met a data leader at a conference on this side of the pond who was actively engaging in large scale big data projects. This wasn’t a presenter
A core SAS team of thought leaders, developers and executives will be in New York City on September 29 at Strata + Hadoop World, mixing and mingling with people like you who live and breathe analytics. We’d love to be a part of your Strata + Hadoop World agenda. Last week, I introduced
Among the tightly held cards, piles of chips and bright lights, there have been stories that have unfolded in Las Vegas that have been forever preserved in time, never seeing the light of day. But what if what happened in Vegas…could be shared with excitement with your friends and family?
Last week I discussed ordinary least squares (OLS) regression models and showed how to illustrate the assumptions about the conditional distribution of the response variable. For a single continuous explanatory variable, the illustration is a scatter plot with a regression line and several normal probability distributions along the line. The
Data integration, on any project, can be very complex – and it requires a tremendous amount of detail. The person I would pick for my data integration team would have the following skills and characteristics: Has an enterprise perspective of data integration, data quality and extraction, transformation and load (ETL): Understands
I've previously written about how to generate a sequence of evenly spaced points in an interval. Evenly spaced data is useful for scoring a regression model on an interval. In the previous articles the endpoints of the interval were hard-coded. However, it is common to want to evaluate a function
In today’s information economy, the ability to engage and develop meaningful digital relationships is fundamental to any business. A growing number of organisations, including small-to-medium sized enterprises, are investing in easy-to-use analytical software and services to extract insights from data about their business. As a result, we're now experiencing the
with Natalie Osborn, Senior Industry Consultant, Hospitality and Gaming Practice, SAS This week, we continue our fall “back to the basics” refresher series on analytics for hoteliers. Last week, in part one, Natalie and I reviewed the analytic methods that can be utilized by hoteliers. This week we will explore
Meet Clark Bradley: SAS technical architect by day and comedian by night. When he’s not demoing SAS Data Loader for Hadoop, he’s blogging about it on The Data Roundtable. Clark and a core SAS team of thought leaders, developers and executives will be in New York City on September 29 at Strata
The benefits of big data often depend on taming unstructured data. However, in international contexts, customer comments, employee notes, external websites, and the social media labyrinth are not exclusively written in English, or any single language for that matter. The Tower of Babel lives and it is in your unstructured
I had the pleasure of speaking at the inaugural “Accounting IS Big Data” conference this past week in New York City, a meeting organized by the American Accounting Association. In addition to giving several talks, I participated in breakout sessions in which attendees discussed how analytics is used to monitor
A friend who teaches courses about statistical regression asked me how to create a graph in SAS that illustrates an important concept: the conditional distribution of the response variable. The basic idea is to draw a scatter plot with a regression line, then overlay several probability distributions along the line,
The epicenter of big data moves to New York City on September 29 at Strata + Hadoop World. It’s a great chance to mix and mingle with people who live and breathe analytics, including a core SAS team of thought leaders, developers and executives. We’d love to be a part
There are four key areas that require continuous investment in order to become demand-driven: people, process, analytics, and technology. However, the intent of your demand forecasting process, along with business interdependencies, needs to be horizontally aligned in order to gain sustainable adoption. Adoption alone doesn't necessarily mean it will be sustainable. As