In the first installment of this series on Hadoop, I shared a little of Hadoop's genesis, framing it within four phases of connectivity that we are moving through. I also stated my belief that Hadoop has already arrived in the mainstream, and that we are currently moving from phase three, connecting people, to phase four
In response to my recent post about how to use the PALETTE function in SAS/IML to generate color ramps, a reader wrote the following: The PALETTE function returns an array of hexadecimal values such as CXF03B20. For those of us who think about colors as RGB values, is there an
One of the hottest trends today in the business intelligence and analytics spaces is “self-service.” The word self-service is thrown around lightly in many situations and often carries different expectations for different people and organizations. Before we go into the details of self-service analytics it is important to have a
Report design includes several phases. Granted, these phases aren’t official; they’re more a reflection of my own thought processes and how my report designs typically unfold: the initial “get the data on the screen to see what we have” phase; the addition of filters and prompts to assist with guided
Is the type of car you drive more likely, or less likely, to get a speeding ticket? Let's analyze some data to find out! Do red cars attract more attention from the police, and get more tickets? How about cars with a 'racing stripe'? Or cars with a big chromed motor,
Double, double toil and trouble; Fire burn, and caldron bubble. Macbeth, Act IV, Scene I For the cryptanalyst or recreational puzzle solver, "double double" does not lead to toil or trouble. Just the opposite: The occurrence of a double-letter bigram in an enciphered word puzzle is quite fortunate. Certain double
Have you heard the old saying that "Banks only loan money to people who don't need it"? Let's analyze the data and see if that is true!... I'm very much a car-guy, and I love learning about all the new vehicles, and love the new-car feel ... and even the smell. It's hard to not like a
So, you've heard the Hadoop hype and you are looking – or have already invested – into Hadoop. Maybe you have also realized some benefits from the Hadoop ecosystem. But now you want to maximize those benefits by using advanced analytics, or you might have heard about algorithms or machine learning libraries available
Shades of Pink - Honoring Breast Cancer Awareness Month was written by Celeste Cooper-Peel. Some things you never forget. I remember my mom’s breast cancer diagnosis like it was yesterday. She was only 42. Who would have thought? My mom found the lump herself and although it was cancer, she was
So, with the simple introduction in Understanding Hadoop security, configuring Kerberos with Hadoop alone looks relatively straightforward. Your Hadoop environment sits in isolation within a separate, independent Kerberos realm with its own Kerberos Key Distribution Center. End users can happily type commands as they log into a machine hosting the
Have you ever looked at a statistical graph that uses bright, garish colors and thought, "Why in the world did that guy choose those awful colors?" Don't be "that guy"! Your choice of colors for a graph can make a huge difference in how well your visualization is perceived by
Would you build a house without a proper foundation? Most of us wouldn’t dare, but that’s exactly what many retail businesses are doing today. When building a house, if you don’t get the foundation right, paint, wallpaper and fixtures won’t matter much. It’s no different in the retail industry. Success
Last year, after 15 years of benefiting from the SAS community, I thought it was time to give a little something back. So I decided to write a paper on two technologies I have a healthy interest in: SAS and Hadoop. My paper SASReduce: an implementation of MapReduce using BASE/SAS
In a previous article I introduced the HEATMAPCONT subroutine in SAS/IML 13.1, which makes it easy to visualize matrices by using heat maps with continuous color ramps. This article introduces a companion subroutine. The HEATMAPDISC subroutine, which also requires SAS/IML 13.1, is designed to visualize matrices that have a small
A HighLow plot is very popular in the financial industry, often used to track the periodic movement of a stock or some instrument or commodity. The CandleStick Chart is one specific type of high low plot, purportedly originating in Japan for tracking of financial instruments in the rice trade. Creating a
If you live in an English-speaking country you are used to a relatively unadorned alphabet. Take a look at the French and Spanish languages, where vowels are decorated with accents like “acción” in Spanish, and the circumflex, or the hat used in “pâte” in French. Look at the gorgeous
People often talk about the customer experience and the engagement model. This is an easier task when a business has regular interactions with its customers, as banks and retailers do. However, for insurers, this is a challenge. First of all, insurers have infrequent interactions with their customers. When there is interaction,
It’s a great time to be a sports fan – and an even better time to be a sports and data fan as these two worlds continue to meld together. For the last couple of years nearly every conversation about sports, analytics or both had to have at least one
A student brought in this coding problem after her manager had been struggling with the issue for a while. They played guessing games, but to no avail. Here’s what happened when they submitted DATA step and PROC SQL code using a WHERE clause with an INPUT function: data aileen; length hcn
In last week's article about the distribution of letters in an English corpus, I presented research results by Peter Norvig who used Google's digitized library and tabulated the frequency of each letter. Norvig also tabulated the frequency of bigrams, which are pairs of letters that appear consecutively within a word.
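As a rough illustration of what tabulating bigrams means, here is a minimal Python sketch. It is not Norvig's method or code; the function name `bigram_counts` and the sample text are hypothetical, and it assumes plain ASCII letters. Pairs are counted only within a word, never across the space between two words.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count pairs of consecutive letters within each word of `text`.

    Hypothetical helper for illustration only: words are runs of the
    letters a-z after lowercasing, so digits and punctuation split words.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        # zip(word, word[1:]) pairs each letter with the one after it
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


# Hypothetical sample text, not data from the article
counts = bigram_counts("the three trees")
double_letters = {pair: n for pair, n in counts.items() if pair[0] == pair[1]}
```

In this sample, `double_letters` holds only the double-letter bigram "ee", the kind of pair that helps when solving an enciphered word puzzle.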
As you can tell from my recent posts (see here and here), I've been working with SAS and Microsoft Excel files quite a bit. I'm really enjoying the ability to import an XLSX file in my 64-bit SAS for Windows without any additional setup. After one long afternoon of back-and-forth
The first time I used the Internet it blew my mind. As a diplomat brat, at any point in time everyone I knew was everywhere but where I was. Thanks to the miracles of Gopher, Veronica, IRC and email, the tyranny of distance didn’t seem so oppressive any more. When I
A challenge for you – do a Google search for “Hadoop Security” and see what types of results you get. You’ll find a number of vendor-specific pages talking about a range of projects and products attempting to address the issue of Hadoop security. What you’ll soon learn is that security
Do you crave sugar? For me, the answer is "Yes"! I was born with a sweet tooth. I call it "The Beast". What I have learned about The Beast over the years is the more sugar I feed it, the more it wants. I used to think, "Hmm, that’s interesting.
While at JSM 2014 in Boston, a statistician asked me whether it was possible to create a "customized bin plot" in SAS. When I asked for more information, she told me that she has a large data set. She wants to visualize the data, but a scatter plot is not
The rumors, flying for many moons now, have turned out to be true. Followed by U2's new album release, Apple announced the launch of the Apple Watch for early 2015. Apple has finally unveiled its first foray into wearable technology. The Apple Watch (yep, not the iWatch), is an Apple
If you're a SAS user, chances are you're a bit of a science/technology/engineering/math nerd -- and also a fan of The Big Bang Theory. Therefore this SAS analysis on The Big Bang Theory should be right up your alley! Yesterday (September 22) was the start of the 8th season for the TV series,
Let's get one thing straight: I'm no wuss. Well, at least *I* don't think so. But on September 3-5, 2014, I gladly joined ranks with over 400 WUSSes descending on the Fairmont Hotel in downtown San Jose for the Western Users of SAS Software (WUSS) Educational Forum and Conference. It
We live in a world of acronyms, or rather TLAs, and SAS user group names are renowned for them. Last week I received a comment about one of the Australian user group names, and it got me thinking: how did these names come about? What is their history? and to share
The skewness of a distribution indicates whether a distribution is symmetric or not. A distribution that is symmetric about its mean has zero skewness. In contrast, if the right tail of a unimodal distribution has more mass than the left tail, then the distribution is said to be "right skewed"
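To make the idea concrete, here is a minimal Python sketch of the simple moment-based skewness, the third central moment divided by the 1.5 power of the second central moment. This is an assumption for illustration, not necessarily the formula the article goes on to use: statistical software often reports a bias-adjusted variant instead. The function name `skewness` and both sample lists are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5.

    Illustrative sketch only; uses the unadjusted (population) moments,
    so results can differ from bias-adjusted estimates in other software.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples, not data from the article
symmetric = skewness([1, 2, 3, 4, 5])      # symmetric about its mean
right_skewed = skewness([1, 1, 1, 2, 10])  # one large value in the right tail
```

The symmetric sample gives a skewness of zero, while the sample with a long right tail gives a positive value.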