Blend, cleanse and prepare data for analytics, reporting or data modernization efforts

.@philsimon says that it's never too early to think about the IoT and data management.
There’s no such thing as a free app. “What?” I hear you say, “but I download free apps all the time!” So then why do organisations spend considerable time and effort creating free apps? Often their goal is to collect data and turn it into money. Consider this example. There’s
Small causes can have large effects; or how a discovery in the Barnett Shale can spike some interest in the rest of the world and change the face of the industry. This article is co-written by Sylvie Jacquet-Faucillon, Senior Analytics Presales Consultant, SAS France; and David Dozoul, Senior Adviser
It's a common problem in any industry: getting a large number of similar requests for information. But with limited resources and an already overburdened staff, how do you handle it? At El Paso Community College, analysts from the Institutional Research (IR) team enlisted the help of IT to create a data
In my last post we started to look at two different Internet of Things (IoT) paradigms. The first only involved streaming automatically generated data from machines (such as sensor data). The second combined human-generated and machine-generated data, such as social media updates that are automatically augmented with geo-tag data by
Thesis: to manage data, you need software. If you have more data, you use more software… "More of the same!", that cultural technique from the "a lot helps a lot" school of thought. Is that right? This tricky topic can be examined from several angles; what follows are a few more or less serious approaches:
Over the years I have written many blogs about insurance fraud including those on anti-money laundering, data quality in fraud, anti-fraud technology, life insurance fraud and even ghost broking. It’s clear that insurance fraud comes in many shapes and sizes and as losses continue to grow, detecting and preventing fraud
.@philsimon on what to do when the data breaks bad.
When something comes between you and your customers – like not having product in-stock or providing offers that aren't relevant to the customer – it causes delays and makes it harder to complete transactions in a satisfying way. But, an inexpensive sensor, beacon or Radio Frequency Identification (RFID) tag placed
The concept of the internet of things (IoT) is used broadly to cover any organization of communication devices and methods, messages streaming from the device pool, data collected at a centralized point, and analysis used to exploit the combined data for business value. But this description hides the richness of
As we look at the last 40 years of innovation using analytics, it can be both humbling and inspiring. I mean, who would have anticipated 40 years ago that SAS® would be used to analyze genomic data and help develop specialized medications as a result? Who would have guessed that
.@philsimon on whether big data and analytics offer true guarantees.
Some people think motorcycle racing is a sport for thrill seekers and adrenaline junkies. So you might be surprised to hear that after many years of racing, I’ve found it to be a contemplative activity more related to precision and prediction than reckless abandon. In motorcycle racing, the driver has
Throughout my long career of building and implementing data quality processes, I've consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, deduplicated, restructured
A soccer fairy tale: Imagine it's Soccer Saturday. You've got 10 kids and 10 loads of laundry – along with buried soccer jerseys – that you need to clean before the games begin. Oh, and you have two hours to do this. Fear not! You are a member of an advanced HOA
As I explained in Part 1 of this series, spelling my name wrong does bother me! However, life changes quickly at health insurance, healthcare and pharmaceutical companies. That said, taking unintegrated or uncleansed data and propagating it to Hadoop may only help one issue. That would be the issue of getting the data
While it’s obvious that chickens hatch from eggs that were laid by other chickens, what’s less obvious is which came first – the chicken or the egg? This classic conundrum has long puzzled non-scientists and scientists alike. There are almost as many people on Team Chicken as there are on Team
.@philsimon on the specific risks to data quality posed by cloud computing.
While the growth of big data is an issue that preoccupies both the public and private sector, it’s also high on the UK government’s legislative agenda. All eyes are currently on the authorities, as we wait with bated breath to see what’s next in the quest to manage data escalation.
Does it upset you when you log onto your healthcare insurance portal and find that they spelled your name wrong, have your dependents listed incorrectly or your address is not correct? Well, it's definitely not a warm fuzzy feeling for me! After working for many years in the healthcare, pharmaceutical and
I'm frequently asked: "What causes poor data quality?" There are, of course, many culprits: lack of a data culture, poor management attitude, insufficient training and an incorrect reward structure. But there is one reason that is common to all organizations – poor data architecture.
Analysts and big data experts around the world agree on the importance of strengthening human capital and developing better-prepared professionals. Every person has an aptitude for various activities such as swimming, horse riding or tennis, or even for standing out professionally by delivering better results in particular tasks. Now
Many data quality issues are a result of the distance separating data from the real-world object or entity it attempts to describe. This is the case with master data, which describes parties, products, locations and assets. Customer (one of the roles within party) master data quality issues are rife with examples, especially
If a picture is worth a thousand words, then visualizing data in Hadoop would be like a billion. Over the last few years, organizations have rushed to leverage the low-cost distributed computing and storage power of Hadoop clusters. As Hadoop environments mature and move away from their initial focus of
Even the most casual observers of the IT space over the last few years are bound to have heard about Hadoop and the advantages it brings. Consider its ability to store data in virtually any format and process it in parallel. Hadoop distributors, such as Hortonworks, can also provide enterprise-level
.@philsimon on what we can learn about data quality from Jeff Bezos's behemoth.
As the big data era continues to evolve, Hadoop remains the workhorse for distributed computing environments. MapReduce has been the dominant workload in Hadoop, but Spark -- due to its superior in-memory performance -- is seeing rapid acceptance and growing adoption. As the Hadoop ecosystem matures, users need the flexibility to use either traditional MapReduce
At a recent TDWI conference, I was strolling the exhibition floor when I noticed an interesting phenomenon. A surprising percentage of the exhibiting vendors fell into one of two product categories. One group was selling cloud-based or hosted data warehousing and/or analytics services. The other group was selling data integration products. Of
Streaming analytics is a red hot topic in many industries. As the Internet of Things continues to grow, the ability to process and analyze data from new sources like sensors, mobile phones, and web clickstreams will set you apart from your competition. Event stream processing is a popular way to
In the past, we've always protected our data to create an integrated environment for reporting and analytics. And we tried to protect people from themselves when using and accessing data, which sometimes could have been considered a bottleneck in the process. We instituted guidelines and procedures around: Certification of the data