Hadoop is not Beetlejuice

In the 1988 film Beetlejuice, the title character, hilariously portrayed by Michael Keaton, is a bio-exorcist (a ghost capable of scaring the living) hired by a recently deceased couple in an attempt to scare off the new owners of their house. Beetlejuice is summoned by saying his name three times. (Beetlejuice. Beetlejuice. Beetlejuice.) Nowadays […]

Hadoop and big data management: How does it fit in the enterprise?

The other day, I was looking at an enterprise architecture diagram, and it actually showed a connection between the marketing database, the Hadoop server and the data warehouse. My response can be summed up in two ways. First, I was amazed! Second, I was very interested in how this customer uses […]

EMC and SAS redefine big data analytics with the data lake

Adoption of Hadoop, a low-cost open source platform used for processing and storing massive amounts of data, has exploded by almost 60 percent in the last two years alone, according to Gartner. One primary use case for Hadoop is as a data lake – a vast store of raw, minimally processed data. But, in many ways, because […]

Provisioning data for advanced analytics in Hadoop

The data lake is a great place to take a swim, but is the water clean? My colleague, Matthew Magne, compared big data to the Fire Swamp from The Princess Bride, and it can seem that foreboding. The questions we need to ask are: How was the data transformed and […]

Using Hadoop: Emerging options for improved query performance

In my last two posts, we concluded two things. First, because of the need for broadcasting data across the internal network to enable the complete execution of a JOIN query in Hadoop, there is a potential for performance degradation for JOINs on top of files distributed using HDFS. Second, there are […]

Using Hadoop: Query optimization

In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed […]

Using Hadoop: Impacts of data organization on access latency

Hadoop is increasingly being adopted as the go-to platform for large-scale data analytics. However, it is still not necessarily clear that Hadoop is always the optimal choice for traditional data warehousing for reporting and analysis, especially in its “out of the box” configuration. That is because Hadoop itself is not […]

What do Hadoop superheroes do now that Hallows' Eve has come and gone?

Great works of fiction are filled with dynamic duos. Sherlock Holmes and Dr. Watson. Rosencrantz and Guildenstern. And, of course, superheroes like Batman and Robin. On Thursday, Nov. 5 at 1 p.m. ET, two real-world Hadoop superheroes – Arun C. Murthy, co-founder of Hortonworks, and Paul Kent, vice president of big […]

Big data, Hadoop, and the Internet of Things walk into a conference

The panel moderator looks out over the audience. It’s a large crowd. For the first time ever, Big Data, Hadoop, and the Internet of Things are appearing on stage together. The conversation has just begun, so let’s listen in for a minute. Big Data: “…and people have been trying to […]

SAS high-performance capabilities with Hadoop YARN

For Hadoop to be successful as part of the modern data architecture, it needs to integrate with existing tools. This integration allows you to reuse existing resources (licenses and personnel) and is typically 60% of the evaluation criteria for integration of Hadoop into the data center. One of the most […]
