What does it cost to tackle big data? Do you need to make a major investment to build a high-performance architecture? The short answer is no. For a longer explanation, I’m going to start with a question from a comment on one of my earlier posts.
On the post, "What kind of big data problem do you have?" a reader asks:
You mention that the four types in the quadrant each do different things and they each require different architectures. Could you please elaborate that a little? And what should be the best way for a company to move from lower right quadrant to upper left quadrant considering the costs? Would appreciate your insight.
The first part of the answer is to think about what you're trying to accomplish with your decision making, and what challenges you're trying to overcome with your infrastructure. Ask yourself what is driving your move from one stage to the next:
- Is it that information isn't accessible?
- Are you moving from analyzing past activity to predicting future activity?
- Or are you just taking too long to process data and finding it hard to get results in a timely fashion?
If you are satisfied with your BI architecture but realize that your data is growing and you need quicker response times, then you should look at architectures to support quicker extraction of data. Believe it or not, upgrading your architecture to increase throughput and improve speed to decision making is no longer an expensive proposition.
The architectures in the lower right, upper left and upper right quadrants are actually similar. The lower left is a more traditional BI architecture: a database with a query and reporting tool sitting on top of it. You may get reasonable response times there, but as your data grows and you want faster answers, you move into the lower right quadrant. Cutting response times at that scale means looking at significant changes to your architecture.
Today, when we talk about upgrading your architecture for these purposes, we're not talking about expanding mainframe capacity, building more databases or adding large UNIX servers. Instead, we're talking about a blade architecture based on commodity hardware that is relatively inexpensive. With that change, a 1,000-fold performance increase is not unusual.
If you want to start small but see tremendous gains in performance, there's an affordable entry point in the lower right and upper left quadrants: the single-box approach based on a symmetric multiprocessing environment. You can support 50 users on a $10,000-$20,000 box. It's not a matter of jumping in all the way and spending a tremendous amount of money.
To decide between those two quadrants, look again at what your business is trying to accomplish. Do you want to get a jump on the competition by quickly predicting future aspects of the business? That's where big analytics and big data analytics can provide the most benefit: they can offer predictive results in seconds instead of hours. When you're looking for these types of answers, it's not just a hardware or software issue. You also have to look at whether the business is capable of supporting that activity. Modelers and data scientists are needed. You can do it yourself, or you can host it in the cloud and partner with a vendor to develop and maintain the models. It's the same blade architecture in the cloud, and the vendor bears the burden of the operational activity, as opposed to you buying and maintaining the hardware and models yourself.
The bottom line is that today's architectures make it easy to start doing big analytics relatively inexpensively. You can start small, prove value and then spread the results quickly throughout the organization. From there, you can demonstrate what's possible and build the groundswell for applying big data and big analytics more broadly across the organization.