"Traditional vendors to fare poorly with big data", not so for SAS...

David Linthicum wrote a blog post entitled "3 winners, 3 losers in the move to big data" on InfoWorld and notes that traditional vendors "did not see this coming" (big data, that is). Since David made some interesting points, some of which I agree with and some of which I don't, I felt it worthwhile to provide my perspective. Here is my response to his blog post...

Hello David –

It’s been a while since we connected – our discussions about SOA, data federation, and database connectivity seem like such a distant memory! I’m glad that you are addressing big data and analytics topics in the context of cloud – I find your comments to be interesting, if not somewhat provocative.

Although I agree that the big data phenomenon will be disruptive to some traditional vendors… at SAS, we are seeing a huge benefit based on the interest in big data.

  • The hype has helped communicate the fact that analytics is not just about BI and reporting. In addition to historical reports and dashboards, analytics can be used to view things predictively and can be used to optimize operations based on those predictions.
  • We are also seeing a better understanding of the benefits of integrating operational and analytic systems. Since big data is being driven by additional transaction data as well as contextual data (such as social media), people are starting to think about them together. We talk about the entire data-to-decision lifecycle where analytics are embedded directly in operational or transactional systems. For example, analyzing every single credit card swipe with rich analytics, or leveraging analytics to determine the best call script to drive a CRM-based interaction with salesforce.com.
  • Companies that are already leveraging advanced forms of analytics such as predictive analytics and optimization are now able to leverage new forms of data and build more robust analytical models. They are able to develop and run these models in a fraction of the time that it used to take, which means they can run additional “what if” models, factoring in additional variables while analyzing entire datasets. We now have retailers that are able to optimize pricing at the individual SKU and store level with high frequency vs. applying costly mark-downs at the category and region level. We have banks that can better optimize their risk portfolios because they can analyze all of their customer data, and all of their transaction data, along with social media data – without the need to sample the data. We are supporting intelligence and law enforcement efforts with an innovative “stream it, score it, store it” approach that leverages rich analytics to identify, up front, the 1% of relevant data that streams through their organization. Analytics are applied at the front end of the information continuum vs. storing and then analyzing.
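
To make the “stream it, score it, store it” idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the event fields and the scoring rule are all hypothetical and do not represent any SAS product API. The point is simply that scoring happens as each event arrives, and only the events that pass are handed on for storage.

```python
from typing import Callable, Iterable, Iterator


def stream_score_store(
    events: Iterable[dict],
    score: Callable[[dict], float],
    threshold: float,
) -> Iterator[dict]:
    """Score each event as it arrives and yield only the relevant ones."""
    for event in events:
        relevance = score(event)
        if relevance >= threshold:
            # Only events that pass the scoring step are handed on for storage.
            yield {**event, "relevance": relevance}


def toy_score(event: dict) -> float:
    # Hypothetical rule: large transfers to a new payee look relevant.
    amount_part = min(event["amount"] / 10_000, 1.0)
    return amount_part if event["new_payee"] else amount_part * 0.1


incoming = [
    {"id": 1, "amount": 40, "new_payee": False},
    {"id": 2, "amount": 9_500, "new_payee": True},
    {"id": 3, "amount": 12_000, "new_payee": False},
]

# In a real deployment "stored" would be a write to persistent storage.
stored = list(stream_score_store(incoming, toy_score, threshold=0.5))
print([event["id"] for event in stored])
```

Because the function is a generator, nothing is held in memory beyond the event being scored, which is the property that matters when most of the stream is irrelevant.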

From a technical perspective, we have long taken the approach of leveraging any database technology that is available, regardless of the license type. We have taken a very aggressive stance in terms of developing new technologies – big data is simply not a new concept for us – we have leveraged distributed technologies such as grid, we’ve long since moved the processing to the data with our in-database approach, and our latest advancement is leveraging in-memory. Our in-memory approach is different from database vendors since we leverage an in-memory analytical engine – this in-memory approach is built specifically for analytics vs. data storage, and can be leveraged by the same infrastructure that supports the databases, including EMC Greenplum, Teradata, and Hadoop. In addition, our high performance analytics capabilities support multiple architecture patterns – from visualization capabilities that support a large number of users to high-end analytic modeling work that not only accommodates big data, but allows for the efficient management of many complex analytical models.

Since you touched on Hadoop, and since it is all the rage, our high performance story also leverages and supports Hadoop. As with other databases, we support the ability to 1) leverage Hadoop data in any of our analytical products and 2) manage data that is in Hadoop using our data management solutions, which include data integration, data quality, MDM and data governance capabilities. We support the ability to author Hadoop code in HDFS, Hive, Pig, and MapReduce in our graphical development environment, and it is possible to create job flows that mix processing capabilities from SAS as well as Hadoop. In addition, we use Hadoop as the persistent storage mechanism in our Visual Analytics product – this enables fast loading of data into memory which is used by a visual tool to instantaneously present millions of rows of data. This capability also supports mobile display through devices such as the iPad.

Our cloud business is leading the way in terms of our revenue growth – again, this is nothing new. SAS OnDemand has provided a cloud-based option for customers for many years. This includes the ability to leverage SAS solutions as well as our business analytics capabilities that span information management, business intelligence and analytics. SAS customers have the flexibility to turn the entire operation over to SAS, or leverage the SAS infrastructure while still being involved in analytical modeling, data preparation, etc.

You may consider SAS a traditional vendor, but we actually see the big data trend as industry hype that is catching up to what we have done all along. We look forward to leveraging Hadoop and other emerging technologies as vehicles that will help us improve our clients’ ability to make decisions that drive competitive advantage.

Thanks,

Mark Troester
IT/CIO Thought Leader & Strategist, SAS

Twitter: @mtroester

Alignment enables analytic success

Analytics Infrastructure: Vision & Strategy Consideration #1 (Part 1 of 15 considerations for Analytics Infrastructure)

In a perfect world, the entire organization would be aligned, and the analytics vision would be driven by top-down executive leadership. Since we aren’t living in a perfect world, it often takes work to drive alignment and convince people that the investment of time, effort and money will pay dividends. The good thing is that there is a perfect storm relating to analytics – you could argue that big data hype is overblown, but the positive aspect of big data is that it is driving interest in analytics. Whether it’s the oft-cited McKinsey report, the Hollywood hit Moneyball, federal funding of big data projects, or the desire to cash in on social media, the impact at the business level is undeniable. It’s not just technical Websites or publications; business journals and Web properties are also focused on the impact of business analytics and big data. That hype is providing executive level impetus that in some cases is leading to top-down motivation and alignment. If that is not happening in your organization, that doesn’t mean that you are doomed to failure. There are many organizations that are driving success from the bottom up or success that is initiated at the project level. This can include a single project success that leads to executive level exposure – executives see the results and realize that analytics can be replicated across multiple business disciplines.

Analytic Success Requires Alignment

So what can IT do to help drive alignment?

Analytics Infrastructure: 15 Considerations

I recently presented with Jessica Dunn from Bank of America at the SAS Global Forum Executive Conference. Our presentation addressed the considerations necessary to build and manage an effective analytics infrastructure. Although we both worked on our presentations separately before we had a chance to discuss the session, we built a similar story. That story is that even though technology is important, infrastructure is much more than technology. It’s a combination of people, process and technology, with data at the center.

Analytics Infrastructure = People, Process & Technology

Hung LeHong, Gartner Research VP set the stage as the keynote presenter at the SGF Executive session. Hung spoke about the future role of analytics and highlighted the fact that IT needs to play a critical role.

To build on this introduction, my presentation focused on providing a set of design considerations for the analytic infrastructure. I provided 15 considerations that span strategy & vision, people, process, technology and data/information. In the coming days, I plan to blog about each of these considerations.

Analytics Infrastructure Considerations

 

Vision & Strategy

People

  • It’s not just “Business” and “IT”
  • Get the right people: Don’t skimp on training
  • Leverage Center of Excellence (COE) principles

Process

  • Understand the analytics lifecycle
  • Ultimately it’s about improving the business process
  • Strike the right balance between control and user flexibility

Technology

  • Leverage Enterprise Architecture principles to ensure proper design
  • Think big! Big data & big analytics: High Performance Analytics is key
  • Integrate SAS into the overall IT infrastructure

Data / Information

  • Design data strategy that results in information as a strategic asset
  • Leverage comprehensive Information Management approach
  • Step up to data preparation: Free up the scarce analytic resources

Hadoop's Potential to Rewrite Data Management

Well, it's certainly a provocative title, and hopefully it will be a thought provoking conversation. I am participating in a panel discussion along with Philip Russom of TDWI, David Menninger of EMC, and James Markarian of Informatica. The discussion will be hosted by DM Radio hosts Eric Kavanagh and Jim Ericson.

The interview occurs this Thursday, May 3rd at 3 PM EST.

You can register here: http://www.information-management.com/dmradio/-10022392-1.html

If you can't make it for the live event, the session will be recorded.

If you have questions, you can send them via Twitter with the hashtag #DMRadio.

Hope you can join!

Thanks, Mark.

Big data quality: Think outside the box

In my last post I set the stage for data quality considerations for big data. Today, I’ll cover the following big data and data quality considerations:

  • Data quality efforts should be "fit for purpose"
  • Extend data quality by thinking “outside the box”

Data quality efforts should be "fit for purpose"

Your data quality approach should be designed with several factors in mind – it doesn’t make sense to apply one data quality approach for all data or information related projects. You should consider where the data came from, how the data will be used, how the data will be consumed, who will use the data, and perhaps most importantly, what decisions will be made with the data. Here are several considerations relative to big data:

Consider the type of data:  The data quality requirements for different forms of data will vary and your approach should match the needs of the data. For example:

  • Big data projects that relate to traditional forms of data like transaction data related to key entities like customers, products, etc., can leverage existing data quality support as long as it scales to meet the needs of massive volume.
  • Big data relating to machine or sensor data (e.g., RFID tags, manufacturing sensor data, telco, utilities, etc.) is not prone to the input errors that affect data entered by humans, but as additional sensors come online, some of them may emit invalid data. Even if you trust your machine or sensor data, data quality capabilities related to discovery, linking data with other systems, and enriching data may still be extremely important.
  • Social media data such as Twitter, Facebook, etc., is similar to machine data in that the data quality issues resulting from user input, overlapping systems, etc., will not be the primary issue. It’s also important to note that there is a structured component to this information – structure around a Tweet stream relative to meta-data description along with the text string that contains the content of the tweet. So, this will involve a combination of entity matching, monitoring to ensure that the tweet stream is not interrupted along with the ability to analyze the text, which will bring in data quality considerations related to text data.

Not all analysis requires exactness: If you are attempting to identify a general pattern and you have a lot of data, the extraneous data is not likely to impact the overall conclusion. For example, if you have a massive amount of clickstream data and you are looking for patterns (where people leave a site, which path is more likely to result in purchase or conversion, etc.), the outliers will not impact the overall conclusion. In this case, it’s more of an analytics process vs. a data quality process – data quality will not be in question, but relevance will. For example, if someone accidentally ends up on your website, they aren’t really part of the population that you are concerned with (unless you are analyzing why they are there in the first place). The same goes for bots vs. actual users: bot traffic is not likely to be erroneous, but it is possible to extend your data quality efforts to include relevance as a quality. The same goes for types of users, such as actual customer behavior vs. competitor traffic; it’s a segmentation topic, not a data quality topic.
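
As a rough illustration of treating relevance as a segmentation step rather than a cleansing step, the following Python sketch tags each clickstream hit instead of deleting anything. The field names, the user-agent hints and the thresholds are assumptions made up for this example; a real implementation would use whatever signals your web analytics data actually provides.

```python
from collections import Counter

# Hypothetical user-agent hints; real bot detection is far more involved.
BOT_MARKERS = ("bot", "crawler", "spider")


def segment(hit: dict) -> str:
    """Assign a relevance segment to a hit without discarding it."""
    agent = hit["user_agent"].lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return "bot"
    # Assumed rule of thumb: a single page viewed for a few seconds
    # suggests the visitor landed on the site by accident.
    if hit["pages_viewed"] <= 1 and hit["seconds_on_site"] < 5:
        return "accidental"
    return "visitor"


hits = [
    {"user_agent": "Mozilla/5.0", "pages_viewed": 6, "seconds_on_site": 240},
    {"user_agent": "ExampleBot/2.1", "pages_viewed": 40, "seconds_on_site": 12},
    {"user_agent": "Mozilla/5.0", "pages_viewed": 1, "seconds_on_site": 2},
]

segments = Counter(segment(hit) for hit in hits)
relevant = [hit for hit in hits if segment(hit) == "visitor"]
print(segments)
```

The bot and accidental hits are still available for a different analysis (for example, why people arrive by accident); they are simply left out of the population used for path analysis.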

Don’t cleanse away analytical value:  Outliers may actually indicate a risk or breach. Unusual transactions should not be cleansed away just because they fall outside of the norm; they may represent fraud. Instead of using anomaly detection to determine data quality issues, use anomaly detection to identify meter problems, potential fraud, etc.
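
A small Python sketch of the difference: the outlier is flagged for review rather than removed. The detection rule used here (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff) is my choice for the example, not something prescribed above, and the transaction amounts are invented.

```python
from statistics import median


def flag_outliers(values: list[float], cutoff: float = 3.5) -> list[bool]:
    """Return one flag per value; True marks a value far from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


amounts = [42.0, 38.5, 40.0, 41.2, 39.9, 950.0]
flags = flag_outliers(amounts)

# Route flagged transactions to review (fraud, meter fault, etc.)
# and leave the original data untouched.
review_queue = [a for a, flagged in zip(amounts, flags) if flagged]
print(review_queue)
```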

Design the data quality process to map to the various stages of data usage: Processes up front in the analytical lifecycle like data discovery, data exploration, opportunity identification, data relationship research, etc., are better performed on the data prior to any cleansing taking place. For example, assessing the value of the various attributes by analyzing access frequency, detecting outliers or discovering correlations between attributes may form the initial stages in understanding data distribution. Once the questions that you are driving towards and the type of analytics that will be leveraged are clear, you can make the proper determination about data quality. You may even leverage a gradual cleansing process as part of your strategy.


Extend data quality by thinking “outside the box”

The data quality discipline has matured rapidly over the last several years. Even with these advancements there are opportunities to leverage data quality principles in new ways. And it is possible to leverage analytics to make subjective decisions based on content that cannot be supported by a limited view of data quality. So thinking outside of the typical approach, here are some initial considerations relative to big data:

Extend data quality or monitoring capabilities to the analytical modeling process: Use data quality mechanisms to determine the impact of missing attributes on analytic algorithms. And, use data quality rules and monitoring capabilities to assess the accuracy of the model over time. For example, assess the potential degradation of analytics performance by measuring and alerting based on analytical model drift.
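
One simple way to picture “measuring and alerting based on model drift” is a monitoring rule that compares recent accuracy with the accuracy recorded when the model was deployed. The Python sketch below assumes you can pair each prediction with the actual outcome; the function names, labels and 5-point tolerance are hypothetical.

```python
def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Share of (predicted, actual) pairs where the prediction was right."""
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)


def drift_alert(
    baseline_accuracy: float,
    recent_pairs: list[tuple[str, str]],
    tolerance: float = 0.05,
) -> dict:
    """Raise an alert when accuracy has dropped by more than the tolerance."""
    recent = accuracy(recent_pairs)
    drop = baseline_accuracy - recent
    return {"recent_accuracy": recent, "drop": drop, "alert": drop > tolerance}


# Ten recent predictions paired with what actually happened (invented data).
recent_pairs = [
    ("churn", "churn"), ("stay", "stay"), ("stay", "churn"), ("churn", "churn"),
    ("stay", "stay"), ("churn", "stay"), ("stay", "stay"), ("stay", "stay"),
    ("churn", "churn"), ("stay", "churn"),
]

report = drift_alert(baseline_accuracy=0.90, recent_pairs=recent_pairs)
print(report["alert"])
```

Accuracy is used here only because it is easy to read; the same rule works with whatever measure the model was originally validated on.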

Using analytics to assess quality:  With contextual data, mechanically based data quality is not sufficient. For example, organizations should be looking to extend their quality efforts to assess social data that has been self-reported. Information that people self-report about medication taken, time spent studying, etc., is often misrepresented by the user (they intentionally fabricate the amount of meds taken, time spent studying, etc.). In this case, traditional data quality approaches will be insufficient, but analytics can be used to provide some level of value assessment. The same applies to sentiment data. Consider transactional data and sentiment data relating to purchase behavior: if sentiment is negative and purchase behavior is positive, this could indicate a data quality problem, or it could relate to the customer being locked in without additional choices. Either way it will take further analysis that will not be addressed by mechanical data quality efforts.
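
The sentiment example can be sketched as a cross-check between two data sources. In this hypothetical Python snippet, sentiment is assumed to be a score between -1 and 1 and the thresholds are arbitrary; the output is a list of customers for further analysis, not a verdict about whether the data is wrong.

```python
def needs_review(customer: dict) -> bool:
    """Flag customers whose stated sentiment contradicts their behavior."""
    negative_sentiment = customer["avg_sentiment"] < -0.3
    active_buyer = customer["purchases_last_90d"] >= 3
    return negative_sentiment and active_buyer


customers = [
    {"id": "A", "avg_sentiment": 0.6, "purchases_last_90d": 5},
    {"id": "B", "avg_sentiment": -0.7, "purchases_last_90d": 4},
    {"id": "C", "avg_sentiment": -0.8, "purchases_last_90d": 0},
]

# Customer B complains but keeps buying: a data problem or a locked-in customer?
to_investigate = [c["id"] for c in customers if needs_review(c)]
print(to_investigate)
```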

Use data quality capabilities to assess collection level of machine data: Is your data collection process reliable? Does the data represent the proper time frame? Is there something about the data that signals that the collection is missing data? For example, if data is missing from a device, it could represent a problem with the data trail, or it could show that the device was off-line and not generating data. Consider extending your data quality approach to help determine whether the reported data indicates a problem with the sensor infrastructure.
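
A collection-level check can be as simple as looking for holes in the timestamps a device reports. The Python sketch below assumes a device that is expected to report at a fixed interval; the function name, the one-minute interval and the slack factor are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_gaps(
    timestamps: list[datetime],
    expected_interval: timedelta,
    slack: float = 1.5,
) -> list[tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where readings are missing."""
    ordered = sorted(timestamps)
    limit = expected_interval * slack  # tolerate small timing jitter
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > limit
    ]


start = datetime(2012, 5, 1, 12, 0)
# Minutes 3 to 6 are missing from this device's readings.
readings = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]

gaps = find_gaps(readings, expected_interval=timedelta(minutes=1))
for last_seen, next_seen in gaps:
    print(f"no data between {last_seen:%H:%M} and {next_seen:%H:%M}")
```

The check only tells you that a gap exists. Whether it reflects a broken data trail or a device that was legitimately off-line still has to be decided with other information, such as maintenance logs.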

Use data quality to ensure summarization efforts are valid: If the system leverages a summarization technique as a mechanism for dealing with extreme volume, consider applying data quality approaches to the summarized data as a means to validate the summary. This could be used in situations where a device or distributed component summarizes the data that is being returned, or for data processing functions like MapReduce that summarize data for downstream processing or analysis.
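
As a sketch of what validating a summary might look like, the Python example below applies two basic consistency rules to per-device summaries: the mean implied by the summary must fall between the reported minimum and maximum, and the record counts must add up to the number of raw records expected. The summary layout, device names and rules are assumptions chosen for illustration.

```python
def summarize(readings: list[float]) -> dict:
    """Build the kind of summary a device or reduce step might return."""
    return {
        "count": len(readings),
        "total": sum(readings),
        "min": min(readings),
        "max": max(readings),
    }


def validate_summaries(summaries: dict[str, dict], expected_records: int) -> list[str]:
    """Return a list of human-readable problems; empty means no rule failed."""
    problems = []
    for name, s in summaries.items():
        if s["count"] <= 0:
            problems.append(f"{name}: non-positive count")
            continue
        mean = s["total"] / s["count"]
        if not s["min"] <= mean <= s["max"]:
            problems.append(f"{name}: mean {mean:.1f} outside min/max range")
    counted = sum(s["count"] for s in summaries.values())
    if counted != expected_records:
        problems.append(f"record count {counted} != expected {expected_records}")
    return problems


summaries = {
    "device-a": summarize([20.1, 20.4, 19.8]),
    # A corrupted summary: the total cannot be right for two readings.
    "device-b": {"count": 2, "total": 500.0, "min": 19.0, "max": 21.0},
}

problems = validate_summaries(summaries, expected_records=6)
print(problems)
```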


Check back tomorrow for my final post on big data and data quality. I’ll cover the following topics:

Big data quality - don't tell me, another buzzword!

Marketing is a big part of my job so I should be supportive of efforts to capitalize on the trend of the day. But given my background in R&D, I am dubious of marketing efforts that are not backed up by real product or solution capabilities. So, I’m a bit of a skeptic about vendors that have inserted big data in all of their marketing messages. I see a lot of talk about what organizations should be doing, what is now possible given the opportunity of big data, etc. But I see very little in terms of guidance – how things should be done, or information relating to best practices that can guide the average organization.

So, I’ll step up to the challenge and provide some initial thoughts relating to data governance and big data. My initial posts will address data quality aspects of big data, and while I’m not going to provide a complete set of best practices, I’ll provide a list of considerations that should be factored into your big data plans.

I’ll start with my main takeaways, and then provide a set of recommendations related to data quality and big data. After considerable thought and discussion with product experts, implementation consultants and analytics practitioners, my key takeaways include:

  • Big data is all the more reason to leverage a comprehensive information management approach, one that is not just focused on data quality and data integration but spans data, analytics and decision management.
  • When it comes to big data, it’s not just about volume. As evident with data quality, many of the considerations are specific to the types of data that are being processed, or dependent on the source of data or the business use case. You don’t have to have massive volumes of data to leverage these data quality considerations.
  • As with other information initiatives, data quality should be considered as part of your overall data strategy including data governance and MDM, but I’ll take these topics one at a time, starting with quality.

I wish I could say it is as simple as extending your existing data quality approach to big data. Certainly, if you have solid data quality and data governance processes and technologies in place, you are at a great starting point. And simply extending what you are doing to include big data will provide some benefits. But to be truly successful, you need to consider aspects of big data that may require a different perspective. On Monday and Tuesday, I'll cover these data quality considerations, summarized by the following statements:

SAS Hadoop - A peek at the technology

Thanks for returning to learn more about this critical technology. Following yesterday’s overview post on the new SAS Hadoop support, we’ll dig a little deeper today and consider the following:

  • Under the Hood: A Peek at the Technology
  • SAS Hadoop Value Summary
  • A Note About the Future

Under the Hood: A Peek at the Technology

Bring the power of SAS® Analytics to Hadoop

The SAS/ACCESS Interface to Hadoop offers seamless and transparent data access to Hadoop via HiveDB. SAS users access Hive tables as if they were native SAS data sets. Analytic or data processes can be performed using SAS tools while optimizing run-time execution using the appropriate Hadoop or SAS environment.

The SAS/ACCESS Interface to Hadoop enables Hadoop users to tap into the power of SAS by extending support for the complete analytics life cycle to Hadoop, including discovery, data preparation, modeling and deployment. Of particular importance to many organizations is the ability to:

  • Visually analyze or explore data in Hadoop as the precursor to more in-depth analytics via SAS Visual Analytics Explorer capabilities.
  • Leverage text mining and analytics capability based on data stored in Hadoop.
  • Use SAS Metadata Server to create and manage metadata relating to data that is stored in Hadoop.

Technical Details

  • LIBNAME statement makes Hive tables look like SAS data sets.
  • PROC SQL provides the ability to execute explicit HiveQL commands in Hadoop.
  • SAS procedures (including PROC FREQ, PROC RANK, PROC REPORT, PROC SORT, PROC SUMMARY, PROC MEANS and PROC TABULATE) are supported.

Leverage Hadoop’s Distributed Processing Capability

SAS Hadoop support allows execution of Hadoop functionality, enabling MapReduce programming, scripting support and the execution of HDFS commands from within the SAS environment. This complements SAS/ACCESS capabilities provided for Hive by extending support for Pig, MapReduce and HDFS commands.

More Technical Details

  • PROC HADOOP support allows you to submit MapReduce, scripting and HDFS commands from the SAS execution environment. This includes support for Pig, MapReduce and HDFS commands.
  • External file references are supported, which provides the ability for Hadoop files to be referenced from any SAS component. Parameters necessary to process the file, such as delimiters, are externalized, which makes it convenient to work with a Hadoop file.

Augment Hadoop using SAS® Information Management

One of the issues plaguing Hadoop is the lack or relative immaturity of tools that can be used to develop and manage Hadoop deployments. SAS data management and analytics management offerings can help organizations quickly derive value from Hadoop using fewer resources. Some examples of this include an intuitive graphical user interface to develop Hadoop capability, the ability to create data management and analytic code and deploy it within Hadoop, or the ability to register and manage Hadoop files via the SAS Management Console. This makes it easy to work with Hadoop within SAS, and extends SAS metadata, data lineage, impact analysis and security capability to Hadoop environments.

Still More Technical Details

SAS® Data Integration Studio

  • SAS Data Integration Studio includes a set of standard transforms and a job flow builder that can be used with Hadoop data. The transforms support common functionality, such as the ability to load, unload, extract, reformat, read/write multiple files, reference external files, etc.
  • SAS Data Integration Studio provides the ability to integrate Hadoop code, including Pig, MapReduce and HDFS commands in-line with a data job flow.
  • SAS Data Integration Studio provides an editor for Pig and Hive, which provides visual editing capability, including a syntax checker, for developing Pig and Hive.
  • SAS Data Integration Studio provides the ability to submit HiveQL via PROC SQL capability that can also be surfaced through Base SAS and other SAS components.
  • Since Hadoop is treated as a SAS data source, data quality capabilities that are provided by SAS and DataFlux can be leveraged to process data that is coming in or out of Hadoop.

Hadoop Function Support

  • SAS provides the ability to create UDFs that can be deployed within HDFS. This includes the ability to use SAS Enterprise Miner to take analytical scoring code and produce a UDF that can be deployed within HDFS. These UDFs can then be accessed by Hive, Pig or MapReduce code.

Metadata, Lineage & Security

  • Using Hadoop within SAS provides the benefit of data lineage (including impact analysis) and additional security. All SAS processing that is done with Hadoop is tracked, and the existing data lineage functionality can be used to better manage Hadoop usage.
  • Ability to register Hive Server using SAS Management Console so that any SAS capability can easily reference Hadoop (via a FILENAME statement, leverage parameters to better interact with Hadoop, identify delimiters so files can be parsed on the fly, etc.). This makes it possible for the entire SAS stack (BI, DI, SAS/STAT, etc.) to work with Hadoop data. It provides the ability to track what tables are in Hadoop, and provides the basis for lineage.
  • SAS honors the underlying security provided by Hadoop. For instance, SAS will not bypass Hadoop security and allow a user to read data without the proper Hadoop permissions. In addition to the underlying security provided by Hadoop, SAS will allow you to further restrict access to Hadoop based on the standard SAS security capabilities.
  • The SAS Metadata Server, a component of Base SAS software, provides the ability to generate metadata based on data that is stored in Hadoop. SAS provides flexible parsing support that is not restricted to a preset data definition, allowing support for any custom definition. Once defined, the metadata can be used to optimize interaction with the data stored in Hadoop.

Environment Support

  • Support for popular Hadoop distributions such as Cloudera, Hortonworks, EMC Greenplum, etc.

 

SAS® Hadoop Value Summary

The SAS approach marries the power of world-class analytics with Hadoop’s ability to leverage commodity-based storage and Hadoop’s ability to perform distributed processing.

The SAS Hadoop integration provides the following value to organizations looking to get the most from their big data assets:

  • SAS both simplifies and augments Hadoop. An ability to abstract the complexity of Hadoop by making it function as another data source brings the power of SAS and its well-established community to Hadoop implementations. This is critical, given the skills shortage and the complexity involved with Hadoop. In addition, boosting Hadoop with world-class analytics, along with metadata, security and lineage capabilities, helps ensure that Hadoop will be ready for enterprise expectations.
  • SAS provides total Hadoop leverage. Because SAS support for Hadoop spans the entire information management life cycle, SAS management supports metadata, lineage, monitoring, federation and security augmentation. These areas are pervasive through the entire data-to-decision life cycle.

How do enterprises benefit from the distinctive SAS Analytics and SAS Data Integration offerings?

  • SAS provides a robust, comprehensive, information management life cycle approach to Hadoop that includes data management and analytics management support. This is a huge advantage over other products that focus primarily on moving data in and out of Hadoop.
  • SAS delivers optimal solutions for each organization’s specific mix of technologies. SAS Data Integration supports Hadoop alongside other data storage and processing technologies. This offers greater flexibility than other vendor-specific products that only use Hadoop as a vehicle for landing more information on certain database or hardware platforms.

A Note About the Future

The exciting news is that this is just the start – we'll be discussing additional topics, such as data governance and Hadoop, MDM and Hadoop, SAS embedded processing on Hadoop nodes and other topics of interest to the SAS community. Please check back to hear more about how to best build your information assets.

Let the SAS Hadoop hype continue!

SAS: Big play for Hadoop

Hadoop – it’s not just hype! The community has shown tremendous interest in our plans for Hadoop – what will be supported, when it will be available, and so on. We’ve been blogging about big data and provided early plans for Hadoop, including SAS/ACCESS support for Hadoop. Well, it's official: SAS support for Hadoop is now available, and it's generating a lot of excitement.

In addition to customer interest, we’ve also received positive feedback from numerous analysts that were briefed on our product plans. At our recent analyst conference in Steamboat, CO, the following tweets from Forrester’s James Kobielus sum up the reception there:

What is the basis for Forrester saying that SAS is a big player in the Hadoop market? In this blog post and another to follow tomorrow, I will show how SAS Hadoop plans offer value to enterprises looking to use big data to produce valuable insights. I'll provide an overview of our plans for Hadoop, including thoughts on how the SAS approach is different from what other vendors are providing. This post and the follow-up tomorrow are lengthy (can you tell I'm excited?). Here are the topics that I'll cover in each:

This post:

  • SAS Big Data Analytics (It’s Not Just Hadoop)
  • Introduction to SAS and Hadoop
  • SAS Support for Hadoop – Available Now

Tomorrow:

  • Under the Hood: A Peek at the Technology
  • SAS Hadoop Value Summary
  • A Note About the Future

 

SAS Big Data Analytics (it’s not just Hadoop)

Long before the term big data was coined, SAS customers applied complex analytical processes to large data volumes. SAS was designed from the beginning to scale and perform well in any environment, and was also designed to take advantage of new complementary technologies. Over the years, the SAS user community has evolved with new technologies relating to performance and scalability, including grid, in-database and in-memory technologies. The users benefited from great performance regardless of the platform, including optimization for SMP desktop implementations, MPP and grid deployments. A rich variety of SAS technologies, including high-performance analytics, event processing and others, are available and appropriate for those enterprises with big data environments.

And organizations rely on SAS to do much more than manage data, looking for SAS to provide a complete analytics life cycle that can be applied to big data. This extends their ability to work with a massive number of models, massive complexity in terms of the types of models, and a massive number of variables. They have come to depend on SAS to support a complete data scenario, and the ability to identify the most relevant data for processing or analysis based on a stream, score and store model.

Introduction to SAS & Hadoop

Customers can pair their SAS technologies with products from a number of leading database and data warehouse vendors so that world-leading analytics can be used with these popular data store technologies. The same approach is now extended to the Hadoop environment, an open-source distributed file system that leverages commodity hardware. Hadoop is capable of storing and processing massive amounts of data. Support for Hadoop extends customers' big data abilities and complements their existing data strategies.

Although there are a lot of technical details involving the various Apache subprojects and Hadoop-based capabilities, SAS support for Hadoop can be boiled down to a simple statement:

Hadoop data can now be leveraged using SAS. The power of SAS Analytics has now been extended to Hadoop.

Just like other data sources, data can be consumed across the SAS stack in a transparent fashion. This means that analytic tools already in place (such as SAS Enterprise Miner), tools relating to data management (such as SAS Data Integration Studio), and foundation tools (such as Base SAS) can be used to work with Hadoop data. What you have now becomes even more valuable.

 

SAS Support for Hadoop - Available Now!

SAS Hadoop support provides the following summary benefits:

Please check back tomorrow when we'll take a look at some of the technology details in the SAS Hadoop support, summarize the value and consider the future.

Privacy bargain and big data security

I recently presented a session on big data at the 13th Annual Privacy and Security Conference hosted by the Province of British Columbia and held in Victoria. There were a number of interesting discussions and presentations that relate to privacy and security ramifications of big data. The discussion was timely given the recent news about privacy and big data implementations, including coverage by the New York Times. Although I understand the desire to leverage a hot topic like big data to drive interest, I feel most of the security and privacy considerations are more appropriately driven by the business discipline that is being utilized. For example, privacy and security policies and approaches related to “one to one” marketing or personalization are not a big data issue; they are a customer management issue. In this context, striking the correct balance between leveraging personal information to provide better service vs. being too invasive needs to be hammered out by marketing, legal, IT, etc. While the approach needs to be extended to accommodate big data, it’s not a big data security issue per se.
