In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed
Helping students to reason statistically is challenging enough without also having to provide in-class software instruction. “Practical Data Analysis with JMP, Second Edition” walks students through the process of analysis with JMP at their own speed at home, allowing faculty to devote class time to crucial or subtle statistical concepts
Under the motto "Big Data - Experiences in Use, Decision Processes, Effects", 580 big data practitioners and interested parties from all industries met yesterday in Hanau to exchange experiences and take away ideas for big data initiatives in their own companies. I would like to pass on my findings in this blog and, in particular, relate them to
Financial institutions are mired in large pools of historic data across multiple lines of business and systems. However, much of the recent data is being produced externally and is isolated from the decision-making and operational banking processes. The limitations of existing banking systems combined with inward-looking and confined data practices
Small data is akin to algebra; big data is like calculus.
Last week I received a message from SAS Technical Support saying that a customer's IML program was running slowly. Could I look at it to see whether it could be improved? What I discovered is a good reminder about the importance of vectorizing user-defined modules. The program in this blog
From the pressures of a highly competitive marketplace to changing economic conditions, to the evolution of the distribution network - the challenges facing the hospitality industry are many and varied. In this video, SAS asked a panel of experts to share their views on the issues that will challenge the hospitality
I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data
BI triggers heated discussions. It always has, and it will for a long time to come. Right now, two camps are in the ring. But the two are not debating whether the glass of water is half full or half empty. No, they are talking about how heavy each variant is. And that depends,
The electoral battlespace for the upcoming general election in the United Kingdom is starting to take shape. Campaigners are busily debating the political landscape. They want to own the high ground that dominates areas that matter most to voters – the NHS and the economy. With an ageing population and
In the movie Big, a 12-year-old boy, after being embarrassed in front of an older girl he was trying to impress by being told he was too short for a carnival ride, puts a coin into an antique arcade fortune teller machine called Zoltar Speaks, makes a wish to be big,
Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis
Despite an increase in the availability of data in the federal government over the past few years, data and analytics could be doing even more for federal agencies. A strategic approach to managing and analyzing the data is needed. And, like many technology challenges – that’s a people problem. A
"You show me a successful complex system, and I will show you a system that has evolved through trial and error." ~ Tim Harford TED Talk link: http://www.ted.com/talks/tim_harford Karl Marx died thinking that the first communist revolution would occur in Great Britain, driven by the long hours and unsafe
Data science, along with the demand for the corresponding experts, has picked up tremendous momentum. But on closer inspection, it turns out that there are almost as many different interpretations of the term as there are open positions to fill. This is shown, among other things, by the personality test that we developed together with a team of English psychologists. We therefore invite
This week’s author tip is from Robert Virgile and his book SAS Macro Language Magic: Discovering Advanced Techniques. Virgile chose this tip because even good programmers make errors. We hope you find this tip useful. You can also read an excerpt from Virgile’s book. Even good programmers make errors. In
Mobile World Congress is quickly approaching. Attendees and exhibitors are feverishly scheduling meetings, doing research, and determining their areas of focus to maximize their experience of the event. If you're hoping to learn more about big data analytics at the conference, here are some helpful insights and resources to help you
Sports provide us with many familiar clichés about playing defense, such as: Defense wins championships. The best defense is a good offense. Or my favorite: The best defense is the one that ranks first statistically in overall defensive performance, after controlling for the quality of the offenses it has faced. Perhaps not
Since the legalization of recreational marijuana use in Colorado in 2012, marijuana has been a much more frequent news topic than before - even from a data analysis perspective... I was recently looking for 'interesting' data to analyze with SAS, and I noticed some articles about the increasing potency of marijuana in
SAS procedures usually handle missing values automatically. Univariate procedures such as PROC MEANS automatically delete missing values when computing basic descriptive statistics. Many multivariate procedures such as PROC REG delete an entire observation if any variable in the analysis has a missing value. This is called listwise deletion or using
Smallpox was declared eradicated in 1979, after an extensive vaccination campaign in the 19th and 20th centuries. This blog post contains a visual analysis of the final years of this disease in the US ... In my previous blog post, I imitated and improved infectious disease graphs from a recent Wall
It's that time of year: Awards season. While we on the SAS Social Media Team will be happily following along this Sunday for the 87th Annual Academy Awards (via Twitter, naturally), we thought it only appropriate to use this as a time to celebrate our customers in social from 2014. From
Many states are starting to crack down on the serious abuses of government programs, cutting down on outright fraud as well as reducing abuses and errors. I wanted to highlight one of those, now that they've been on this path for a few years. North Carolina, where SAS is headquartered,
As the point person for SAS joining the new Open Data Platform (ODP) initiative, I want to make it clear why SAS is involved with ODP, and why we think it’s important to our customers, and the Hadoop and big data ecosystem as a whole. SAS is not in it to
Hadoop is increasingly being adopted as the go-to platform for large-scale data analytics. However, it is still not necessarily clear that Hadoop is always the optimal choice for traditional data warehousing for reporting and analysis, especially in its “out of the box” configuration. That is because Hadoop itself is not
The Institute of Business Forecasting's FVA blog series continued in January, with my interview of Shaun Snapp, founder and editor of SCM Focus. Some of Shaun's answers surprised me, for example, that he doesn't compare performance to a naïve model (which I see as the most fundamental FVA comparison). But he went
How many of us have heard or even said the phrase, "If it's not broke, don't fix it"? While on rare occasions this may be the correct approach, it is a statement that stops innovation and creativity in its tracks. You might as well say, "Because we've always done it that
The role of insurance is to bring some predictability, manageability and stability to a chaotic and uncertain world. In essence, it is a risk mitigation tool. The role of the Chief Risk Officer (CRO) is to manage the overall risk strategy for the insurance company. They are responsible for defining
Charlie Chase is considered an expert in sales forecasting, market response modeling, econometrics and supply chain management. Now he's sharing some of his expertise in his Business Knowledge Series (BKS) course, Best Practices in Demand-Driven Forecasting. I had the chance to ask him some questions about his course and the
Tucked in the SAS Enterprise Guide Query Builder there is a text box unhelpfully labelled 'Options'. To find it select Options -> Options for this query -> General, and it is about halfway down the screen. I am going to show you how to use that text box to make your tables smaller, and how