Overcoming the IT-Business Divide in an era of big data, Part 3

In the first part of this series, I recapped several key recent trends as they relate to the IT-business divide. In short, thanks to the rise of cloud computing, big data and BYOD, IT is (generally speaking) less equipped than ever to act as the gatekeeper of enterprise data. In part two, I described how big data means that IT needs to act as a facilitator.

So where does that leave us – and how do we put a nail in the divide's coffin once and for all?

There's no simple way to bridge the divide in an era of big data, but let me suggest three not-so-dangerous ideas.

Apache YARN to become “the operating system for your data”

It’s been an amazing journey with Hadoop.

As we discussed in an earlier blog, Hadoop is forming the basis of a comprehensive enterprise data platform that can power an ecosystem of analytic applications to uncover rich insights on large sets of data.

With YARN (Yet Another Resource Negotiator) as its architectural center, this data platform now enables multi-workload data processing across an array of methods, from batch through interactive to real time. And it’s supported by the key capabilities enterprise data platforms need: governance, security and operations.

Overcoming the IT-Business Divide in an era of big data, Part 2

In the first part of this series, I described the new challenges that IT departments face today. Collectively, they make it unreasonable for IT to act as the traditional gatekeeper of enterprise information. That's not to say, though, that IT should just sit back and ignore the very data that employees use to make business decisions.

Far from it.

In this post, I'll describe how, more than ever, IT (or some equivalent entity) needs to act as a technology and data facilitator.


Overcoming the IT-Business Divide in an era of big data, Part 1

The IT-Business Divide is lamentably alive and well in many organizations.

You know what I'm talking about: that exhausting and inimical internal bickering between IT and everyone else about who's responsible for what. I would wager that thousands of intelligent articles, blog posts, studies and white papers have been written about bridging the traditional IT-business divide. (Thomas Redman penned a particularly good post for HBR a few years back.)

In the first of this three-part series, I'll examine this well-trodden issue against the backdrop of recent trends, particularly the rise of big data.

By way of background, I've seen the traditional IT-business divide first-hand on dozens of IT projects throughout my consulting career. Today, in many mature companies, that divide resembles a growing chasm.

Big data integration - A good starting point for data governance?

In the UK, technology trends move a little slower than they do for our US counterparts. It was only about five years ago that I first met a data leader at a conference on this side of the pond who was actively engaged in large-scale big data projects.

This wasn’t a presenter or big-name draw to the event. My "Big Data Scoop" was uncovered during a break-out coffee-and-danish session – fertile ground for me to uncover new stories for Data Quality Pro.

Data integration – Job skills required for success

Data integration, on any project, can be very complex – and it demands tremendous attention to detail. The person I would pick for my data integration team would have the following skills and characteristics:

  1. Has an enterprise perspective of data integration, data quality and extraction, transformation and load (ETL):
    1. Understands data quality, data profiling and ETL tools.
  2. Understands the need for enterprise data management:
    1. Including data modeling for the enterprise and each data integration project.
  3. Understands database performance for load and retrieval of data:
    1. This should include indexing, partitioning and views.
    2. Reporting environment implementation.
    3. Propagating data to other systems (if necessary).
  4. Possesses the ability to write highly optimized SQL and/or consult with developers to achieve results:
    1. Once in a while we have to “roll our sleeves up” and help out.
    2. Code reviews and testing will be required.
  5. Participates in gathering and prioritizing the requirements:
    1. Collaborates in writing the scope, requirements and detailed technical document.
  6. Is a MASTER at spreadsheets:
    1. Mapping from one or more sources to a target requires documentation on the process, quality of the data, anticipated values and any other technical notes required for the data integration project.
  7. Works well with others in a complex, intense development environment.
  8. Possesses leadership skills:
    1. Working and delegating to other team members.
    2. Reporting progress to project managers and upper management (when required).
      1. PowerPoint is a MUST.

Sounds like a SUPER person, doesn’t it? Actually, data integration requires a super-person who understands the business needs and can translate that understanding into technical documents. It's not an easy job to fill, and it may take multiple people to accomplish these tasks. If you find someone with these qualities, hang onto this person. They are worth their weight in gold.
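The "master spreadsheet" skill in item 6 is essentially a source-to-target mapping document. A minimal sketch in Python of what such a mapping captures – the source systems, column names, transformation rules and quality notes below are all hypothetical:

```python
import csv
import io

# Hypothetical source-to-target mapping rows: each documents where a target
# column comes from, the transformation rule applied, and any data quality notes.
mapping = [
    {"source": "crm.cust_nm",  "target": "dw.customer_name",
     "rule": "TRIM; title-case",      "quality_note": "3% of records blank"},
    {"source": "crm.cust_dob", "target": "dw.customer_birth_date",
     "rule": "parse DD/MM/YYYY",      "quality_note": "dates before 1900 flagged"},
    {"source": "erp.acct_bal", "target": "dw.account_balance",
     "rule": "cast to DECIMAL(12,2)", "quality_note": "nulls default to 0.00"},
]

def write_mapping_csv(rows):
    """Render the mapping as CSV text, as it would appear in the spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["source", "target", "rule", "quality_note"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(write_mapping_csv(mapping))
```

In practice this lives in a spreadsheet, but keeping the mapping machine-readable makes it easy to generate, diff and review alongside the ETL code itself.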

SAS is a leader in Gartner Magic Quadrant for data integration tools for the fifth consecutive year.

Big data model convergence: Combining metadata and data virtualization as a collaboration tool - Part 2

I am currently cycling through a schema-on-read data modeling process on a specific task for one of my clients. I have been presented with a data set and have been asked to consider how that data can be best analyzed using a graph-based data management system. My process is to load the data, examine whether I have created the right graph representation, execute a few queries, and then revise the model. I think I am almost done with this process – except that, as I continue to manipulate the model for analysis, I notice yet one more thing about the data that I need to tweak before I can really start the analysis.
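The load–examine–query–revise loop described above can be sketched with a toy graph. The records, node names and relationship labels below are hypothetical; the point is that the graph structure is imposed at load time and can be cheaply revised:

```python
from collections import defaultdict

# Hypothetical raw records: (subject, relation, object) triples as they
# might arrive in a data set handed over for analysis.
records = [
    ("alice", "manages", "bob"),
    ("alice", "manages", "carol"),
    ("bob",   "mentors", "dave"),
]

def load_graph(triples):
    """Impose a graph representation on the raw triples (schema-on-read):
    an adjacency list keyed by source node, with labeled edges."""
    graph = defaultdict(list)
    for src, rel, dst in triples:
        graph[src].append((rel, dst))
    return graph

def neighbors(graph, node, rel):
    """A trial query: which nodes does `node` reach via edges labeled `rel`?"""
    return [dst for r, dst in graph.get(node, []) if r == rel]

g = load_graph(records)
print(neighbors(g, "alice", "manages"))  # prints ['bob', 'carol']
```

If a trial query reveals that the representation is wrong (say, the relation should really be an attribute, or edges should be reversed), only `load_graph` changes and the data is reloaded – which is exactly the iterate-and-revise cycle the post describes.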

Big data modeling: An iterative approach - Part 1

In my prior two posts, I explored some of the issues associated with data integration for big data – particularly the concept of a data lake, in which source data sets are accumulated and stored, awaiting access from interested data consumers. One of the distinctive features of this approach is the transition from schema-on-write (in which ingested data is stored in a predefined representation) to schema-on-read (where the data consumer imposes the structure and semantics on the data as it is accessed).
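The schema-on-write/schema-on-read distinction can be illustrated with a short sketch, assuming hypothetical record fields and defaults: raw records land in the lake as-is, and the consumer imposes types and defaults only at read time:

```python
import json

# Raw events land in the "data lake" as-is -- no schema enforced at write time.
# Note the second record is missing a field entirely.
raw_lines = [
    '{"ts": "2015-08-11T10:00:00", "user": "u1", "amount": "19.99"}',
    '{"ts": "2015-08-11T10:05:00", "user": "u2"}',
]

def read_with_schema(lines):
    """Schema-on-read: the consumer imposes structure (types, defaults)
    only when the data is accessed."""
    rows = []
    for line in lines:
        rec = json.loads(line)
        rows.append({
            "ts": rec["ts"],
            "user": rec["user"],
            # The consumer -- not the ingest pipeline -- decides the type
            # and the default for missing values.
            "amount": float(rec.get("amount", 0.0)),
        })
    return rows

rows = read_with_schema(raw_lines)
print(rows[1]["amount"])  # prints 0.0
```

A different consumer could read the same raw lines with a different schema (say, treating a missing `amount` as an error rather than a default), which is precisely what schema-on-write precludes.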
