All Posts
Ole Man Winter has taken his sweet time getting out of here this year, challenging even the most complacent weather watchers. But last night I finally heard it - the beautiful trill of the Spring Peeper frogs. If you’re not from North Carolina or parts nearby, you may be unfamiliar
Sometimes I get contacted by SAS/IML programmers who discover that the SAS/IML language does not provide built-in support for multiplication of matrices that have missing values. (SAS/IML does support elementwise operations with missing values.) I usually respond by asking what they are trying to accomplish, because mathematically matrix multiplication with
The SG Procedures do not support creating a 3D scatter plot. GTL has some support for 3D graphs, including a 3D Bi-variate Histogram and a 3D Surface, but still no 3D point cloud. The lack of such a feature is not due to any difficulty in doing this as
Today in manufacturing there has been a lot of investment in automation, supervisory controls, quality, and execution systems. The amount of data produced and now being captured is staggering. The data captured in industry will re-define what is “big” in big data. Yet, for all this investment: Equipment still fails. Scrap
In my last two posts, we concluded two things. First, because of the need for broadcasting data across the internal network to enable the complete execution of a JOIN query in Hadoop, there is a potential for performance degradation for JOINs on top of files distributed using HDFS. Second, there are
“When it comes to the Internet of Things, the future clearly belongs to the Things”. I made this brash statement in a previous post (“Cloud encounters of the Fifth Kind”) referring to machine-to-machine (M2M) being the fastest growing component of non-human traffic on the Web. I say “brash” because that
In my previous post, I talked about how a bank realized that data quality was central to some very basic elements of its initiatives, such as know your customer (KYC), customer on-boarding and others. In this blog, let’s explore what this organization did to foster an environment of data quality
I often blog about the usefulness of vectorization in the SAS/IML language. A one-sentence summary of vectorization is "execute a small number of statements that each analyze a lot of data." In general, for matrix languages (SAS/IML, MATLAB, R, ...) vectorization is more efficient than the alternative, which is to
For Parents’ Weekend this year, I needed to choose a restaurant for dinner in my son’s college town. Our extended family was attending the college football game and spending the weekend with our son. Before making my decision, I searched the internet for all the restaurants located within a reasonable
You may be intrigued to know how the average person compares to a gold-medal-winning Olympic athlete when it comes to things like height, body mass, resting heart rate, arm span, body fat, etc. Or, perhaps more frightening, how you measure up? I know this will resonate with my
Public educators have increasingly been told to produce the “workforce of the future.” States are striving for alignment between what students learn and the jobs that ultimately will be available to them. This alignment is critical for students so they have the right skills and knowledge to excel at college
When I saw Robert Kosoro's cool ZIPScribble map, I knew I had to create a SAS version - and of course I had to add a few enhancements along the way.... I was perusing some of the examples on dadaviz.com, and Kosoro's ZIPScribble map caught my attention. It wasn't a particularly useful
One of the common traps I see data quality analysts falling into is measuring data quality in a uniform way across the entire data landscape. For example, you may have a transactional dataset that has hundreds of records with missing values or badly entered formats. In contrast, you may have
You might be surprised at how many movies and TV shows are made in North Carolina - especially within the last few years. This blog provides a SAS graph that will make the list of films even easier to read! A recent story by the Tar Heel Traveler, and an exhibit
You have surely already heard about Hadoop and all its powerful capabilities. If not, this system is simply an open-source software framework that makes it possible to store and process large volumes of data in a distributed fashion across a large number of commodity hardware machines. In