Considerations and tips for big data integration


I like many things about writing for the Data Roundtable. Near the top of my list is the fact that I can actually see the number of page views for each of my articles. Make no mistake: this isn't always the case. For instance, I scribe for The Huffington Post but can't tell you my page-view counts. I couldn't tell you why the site doesn't provide this valuable information to its writers, but that's neither here nor there.

Getting back to the Data Roundtable, one of my most popular posts concerns big data integration. With more than 6,000 page views as of this writing, I know it has reached more people than most of my other blog posts. As for why, I'm only guessing, but I suspect that more and more readers are thinking about how to combine their traditional data sources with newer ones.

Against this backdrop, here are some considerations for how to integrate an increasing array of data sources.

Are you relying exclusively on ETL? Is that sufficient?

Don't get me wrong: extract, transform, load (ETL) is not going anywhere anytime soon. Far too many key business processes rely upon it for essential business functions. This is particularly true for large, mature organizations.

(See the full ETL diagram and more detailed explanation on Wikipedia.)


Still, by itself, ETL may no longer be sufficient for success in today's world. As Steve Putman writes, "streaming data, or data in motion, implies data that is not 'at rest' like data that has been captured and stored in a database."

Tip: Start looking at the viability of streaming data sooner rather than later. It isn't going away. In fact, its import will only increase in the coming years.
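
To make the contrast concrete, here is a minimal Python sketch of the two styles. It is an illustration only, not a production pipeline: the sample records, the `transform` step, and the function names `batch_etl` and `stream_totals` are all hypothetical, and a plain list stands in for a real database or message queue.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical sample records; a real pipeline would read from a source system.
RAW_ORDERS = [
    {"order_id": 1, "amount": "19.99"},
    {"order_id": 2, "amount": "5.00"},
    {"order_id": 3, "amount": "42.50"},
]


def transform(record: dict) -> dict:
    """Convert the amount from a string to a float."""
    return {"order_id": record["order_id"], "amount": float(record["amount"])}


def batch_etl(source: list[dict]) -> list[dict]:
    """Batch ETL: extract everything, transform it, then load it in one pass."""
    extracted = list(source)                         # extract
    transformed = [transform(r) for r in extracted]  # transform
    warehouse: list[dict] = []                       # stand-in for a database table
    warehouse.extend(transformed)                    # load
    return warehouse


def stream_totals(events: Iterable[dict]) -> Iterator[float]:
    """Streaming: update a running total as each record arrives."""
    total = 0.0
    for event in events:
        total += transform(event)["amount"]
        yield total


# Data at rest: the whole batch is processed and stored before anyone looks at it.
warehouse = batch_etl(RAW_ORDERS)

# Data in motion: a result is available after every single event.
running = list(stream_totals(iter(RAW_ORDERS)))
```

The batch function produces nothing until the whole set has been processed, while the streaming function yields an updated answer after each record. That difference in timing is the practical point of "data in motion."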

Are your dashboards only pulling internal, structured data?

Odds are that the answer here is yes. This has caused some in the business-intelligence community to openly question the value of traditional dashboards and reports. As Marius Moscovici writes in "Is the business intelligence dashboard dead?":

A sales team could be congratulating itself on a successful week not knowing that a product recall just hit the news, causing their biggest client to pull out of a deal. A customer support team could attribute a sudden increase in ticket volume to a news event, not knowing that a celebrity just scorned their product on TV.

I won't proclaim the death of the dashboard just yet, but it's downright silly to think that the world hasn't changed.

Tip: Just because a tool, report or dashboard answered the right questions in 1998 doesn't mean it still does today. Don't be afraid to explore new tools and "blow up" or retire existing ones that no longer serve their original purposes.
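
As a rough sketch of what pulling in external data might look like, the Python below attaches news headlines to internal sales rows before they reach a dashboard. Everything here is assumed for illustration: the client names, the figures, the headline, and the `enrich_with_news` helper are hypothetical, and a real feed would need far more careful matching than an exact client name.

```python
# Hypothetical internal, structured data: weekly sales by client.
INTERNAL_SALES = [
    {"client": "Acme", "weekly_sales": 120000},
    {"client": "Globex", "weekly_sales": 80000},
]

# Hypothetical external data: headlines already tagged with a client name.
EXTERNAL_NEWS = [
    {"client": "Acme", "headline": "Product recall announced"},
]


def enrich_with_news(sales, news):
    """Attach any external headlines to the matching internal sales row."""
    headlines = {}
    for item in news:
        headlines.setdefault(item["client"], []).append(item["headline"])
    # Clients with no matching news get an empty list rather than a missing key.
    return [{**row, "news": headlines.get(row["client"], [])} for row in sales]


dashboard_rows = enrich_with_news(INTERNAL_SALES, EXTERNAL_NEWS)

for row in dashboard_rows:
    flag = "CHECK NEWS" if row["news"] else "ok"
    print(row["client"], row["weekly_sales"], flag)
```

In this toy version, the sales team looking at a strong week for one client would at least see a flag prompting them to read the headline.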

Are your employees exploring new data sets and sources?

There's a natural tendency to think that every employee on social media is wasting his or her time – and your company's money. Of course this happens, but there's increasing business value to be gleaned from Twitter, LinkedIn, Pinterest, Facebook and other social media sites.

Tip: As I write in The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions, organizations ought to encourage a mind-set of data discovery.


What say you?

Learn more in this paper: How Streaming Data Analytics Enables Real-Time Decisions


About Author

Phil Simon

Author, Speaker, and Professor

Phil Simon is a keynote speaker and recognized technology expert. He is the award-winning author of eight management books, most recently Analytics: The Agile Way. His ninth will be Slack For Dummies (April 2020, Wiley). He consults with organizations on matters related to strategy, data, analytics, and technology. His contributions have appeared in The Harvard Business Review, CNN, Wired, The New York Times, and many other sites. He teaches information systems and analytics at Arizona State University's W. P. Carey School of Business.
