What is blockchain and how can you analyze data in a blockchain? This article will discuss various forms of blockchain analytics from a tactical or heuristic perspective. I’ll explain how SAS® technologies can provide advanced analytics for operational, value/asset and regulatory viewpoints in the diverse world of open source blockchain technologies.

Blockchain landscape

Let’s start with a few basic viewpoints to lay the groundwork for our discussion.

Blockchain definition

A simple take on a blockchain is to think of it as a linked list of linked lists. As clients generate transactions, a consensus process collects them into a list (a block) and appends that block to a data store that is itself a linked list of immutable blocks. The security and integrity of the blockchain are enforced through built-in protocols and cryptographic algorithms.
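
To make the "linked list of linked lists" picture concrete, here is a minimal Python sketch. It is an illustration only, not how any particular blockchain is implemented: the `Block` and `Chain` names are hypothetical, and the consensus step is reduced to a single `seal_block` call. Each block holds a list of transactions and points to its predecessor by hash, so changing an old block breaks the link that follows it.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One link in the outer list; holds an inner list of transactions."""
    index: int
    prev_hash: str
    transactions: List[dict] = field(default_factory=list)

    def hash(self) -> str:
        # Hash a canonical JSON form of the block contents.
        payload = json.dumps(
            {
                "index": self.index,
                "prev_hash": self.prev_hash,
                "transactions": self.transactions,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Chain:
    """Toy chain: no networking, no real consensus (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.blocks: List[Block] = [Block(index=0, prev_hash="0" * 64)]
        self.pending: List[dict] = []

    def submit(self, tx: dict) -> None:
        # Clients generate transactions; they wait here until a block is sealed.
        self.pending.append(tx)

    def seal_block(self) -> Block:
        # Stand-in for the consensus process: bundle pending transactions
        # into a new block that references the hash of the previous block.
        prev = self.blocks[-1]
        block = Block(len(self.blocks), prev.hash(), list(self.pending))
        self.blocks.append(block)
        self.pending.clear()
        return block

    def is_valid(self) -> bool:
        # Every block must reference the current hash of its predecessor.
        return all(
            cur.prev_hash == prev.hash()
            for prev, cur in zip(self.blocks, self.blocks[1:])
        )
```

Editing a transaction in an earlier block changes that block's hash, so `is_valid()` returns `False` until every later block is rebuilt. That is the property the word "immutable" refers to here.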

Blockchains are growing in popularity because they offer a way to conduct transactions without the need for a trusted third party. Transferring money, tracking goods and sharing legal documents are common uses of blockchain technologies.

Types of blockchains

For the purpose of this discussion, blockchains will be viewed as either public or permissioned/private:

  • Public blockchains like Bitcoin are primarily found in the cryptocurrency world and offer anonymous or pseudonymous identity.
  • Permissioned blockchains for the most part are implemented behind company firewalls, are enterprise-ready and typically have known identity. Many proof-of-concept projects use permissioned blockchains. Examples include R3 Corda, Chain, BigchainDB and Hyperledger, among many others.

Blockchain structure

Structure defines the operational components of a blockchain and mainly centers on a blockchain's data store. With the profusion of open source blockchain implementations, there are almost as many types of blockchain data structures. Many of the blockchain data stores are derivatives of other blockchain technologies. For example, Litecoin, Zcash and Prova are based on various implementations of Bitcoin. Permissioned blockchains lean toward the use of a key/value data store such as LevelDB, RocksDB and MongoDB.

Accessing the blockchain

From our discussion so far we can derive two categories of data for all blockchains.

  1. The first category is data at rest, or data that already exists in a blockchain's immutable data store. In the case of Bitcoin all transactions from the beginning of Bitcoin are stored in its blockchain. There are many ways to access the immutable data store of a blockchain. For example, Python scripts and Base SAS have been used to export the entire Bitcoin blockchain into SAS data sets, offering a wide range of both regulatory and operational analytics. Transactions of interest may be considered for anti-money laundering (AML), know your customer (KYC) or fraud detection.
  2. The second and most interesting category is data in movement. Here the collection point moves into the processes of the blockchain itself, which emit data in event form. By adding event generation at various points in the client, miner/consensus and protocol processes of a blockchain, it is possible to provide stream-based, real-time analytics of any blockchain activity or blockchain content. This approach may also be more helpful in the case of fully encrypted blockchains.
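
As a rough illustration of the second category, the sketch below adds event generation to two points in a blockchain's processes: when a client creates a transaction and when a block is committed. Everything in it is an assumption made for illustration. `EventBus`, `submit_transaction`, `commit_block` and the event field names are hypothetical, and the in-process bus merely stands in for a real pub/sub connection to a streaming engine.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]


class EventBus:
    """Tiny in-process publish/subscribe hub (a stand-in, not a product API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, **fields: object) -> Event:
        # Stamp every event with its topic and a wall-clock timestamp.
        event: Event = {"topic": topic, "ts": time.time(), **fields}
        for handler in self._subscribers[topic]:
            handler(event)
        return event


def submit_transaction(bus: EventBus, tx_id: str, amount: float) -> None:
    # Client-side hook: emit an event as the transaction is created.
    bus.publish("tx_created", tx_id=tx_id, amount=amount)


def commit_block(bus: EventBus, block_id: str, tx_ids: List[str]) -> None:
    # Miner/consensus hook: emit an event as the block is written.
    bus.publish("block_committed", block_id=block_id, tx_ids=list(tx_ids))
```

A subscriber can be as simple as `bus.subscribe("tx_created", print)`. In a real deployment the handler would forward each event to the streaming analytics engine instead.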

Analyzing blockchain data on the fly

To demonstrate the power of capturing data in movement, better described as a streaming approach, we developed a blockchain simulator using SAS Event Stream Processing. The simulator generates client requests that feed a miner process governed by a consensus process. Both the simulator and consensus processes use the pub/sub APIs to connect to the SAS Event Stream Processing model that manages blockchain updates.

Here is a workflow view of the implemented SAS Event Stream Processing model:

A blockchain miner/consensus process using SAS Event Stream Processing.

Operational blockchain analytics

The first streaming analytics produced using this method were operational in nature and included transactions per second, block updates per second, and total transaction times from creation to block update.
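
For readers who want to see what these three metrics look like in code, here is a small batch-style Python sketch. It is not the SAS Event Stream Processing model. It assumes a hypothetical event format: dictionaries with a `topic` of either `tx_created` or `block_committed`, a `ts` timestamp in seconds, a `tx_id` on transaction events and a `tx_ids` list on block events.

```python
from statistics import mean
from typing import Dict, Iterable, List


def operational_metrics(events: Iterable[dict]) -> Dict[str, float]:
    """Compute throughput and latency figures from a sequence of events."""
    created: Dict[str, float] = {}   # tx_id -> creation timestamp
    latencies: List[float] = []      # creation-to-block-update times
    timestamps: List[float] = []
    tx_count = 0
    block_count = 0

    for event in events:
        timestamps.append(event["ts"])
        if event["topic"] == "tx_created":
            created[event["tx_id"]] = event["ts"]
            tx_count += 1
        elif event["topic"] == "block_committed":
            block_count += 1
            for tx_id in event["tx_ids"]:
                if tx_id in created:
                    latencies.append(event["ts"] - created.pop(tx_id))

    # Observation window: first event to last event.
    span = max(timestamps) - min(timestamps) if len(timestamps) > 1 else 0.0
    return {
        "tx_per_sec": tx_count / span if span > 0 else 0.0,
        "blocks_per_sec": block_count / span if span > 0 else 0.0,
        "mean_tx_latency_sec": mean(latencies) if latencies else 0.0,
    }
```

A streaming engine would compute the same figures incrementally over sliding windows rather than over a completed list, but the definitions are the same.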

Adding a configuration window to the model provided a method to start, stop, pause and mute miners and dynamically change the blockchain update rate. Future enhancements will add deep learning at the miner and consensus process levels to automatically manage blockchain metrics such as block size and elapsed time. The SAS Event Stream Processing engine easily handled 30 miners with blockchain updates every 850 milliseconds. That level of performance makes it well suited to analyzing IoT projects.

What about analyzing data in a real, open source blockchain such as R3 Corda, Hyperledger or Chain? Well, once the processes for any blockchain are modified to generate the desired events, a SAS Event Stream Processing model similar (minus the consensus and configuration windows) to this simulator could be applied.

As blockchain technologies mature and IoT use cases become the bellwether for blockchain implementations, the need for higher speed block updates, processes and communications will trend toward stream-based composition. The demand for stream-based blockchain analytics technology, such as SAS Event Stream Processing, will prove instrumental to the overall success of blockchains.

Regulatory requirements and blockchain investigations

Public blockchains in the cryptocurrency space are under significant pressure to address topics such as AML, KYC and fraud. With the advent of initial coin offerings and surging market value of cryptocurrencies, regulatory pressures are increasing all over the world.

SAS Visual Investigator addresses these concerns by supporting a variety of intelligence analysis and investigation management needs. It can reveal suspicious activity while performing fraud, security and compliance investigations. One of its key features is the ability to import various forms of data, then define relationships and user interfaces specific to the imported data.

For example, what if your money laundering investigation included someone with a known Bitcoin address? As a blockchain-based exercise, we built an investigation case in SAS Visual Investigator. Using blockchain.info APIs and Python scripting, we extracted all the transactions for the Bitcoin address, along with three levels of input transactions. Using each transaction date, we pulled the Bitcoin price for that date from another web API to get the dollar value at the time the transaction was created.
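
The valuation step can be sketched in a few lines of Python. This is not the script used in the exercise. It assumes the transactions have already been downloaded and look like the records I understand blockchain.info to return (a `hash`, a Unix `time` and an `out` list whose `value` fields are in satoshis), so verify those field names against the API documentation. The price table is a plain dictionary keyed by ISO date, and the prices in the usage note are made up.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

SATOSHI_PER_BTC = 100_000_000


def value_transactions(
    txs: List[dict], usd_price_by_date: Dict[str, float]
) -> List[dict]:
    """Attach a dollar value to each transaction using that day's price."""
    rows = []
    for tx in txs:
        # Convert the Unix timestamp to a UTC calendar date (YYYY-MM-DD).
        day = datetime.fromtimestamp(tx["time"], tz=timezone.utc).date().isoformat()
        btc = sum(output["value"] for output in tx["out"]) / SATOSHI_PER_BTC
        price: Optional[float] = usd_price_by_date.get(day)
        rows.append(
            {
                "hash": tx["hash"],
                "date": day,
                "btc": btc,
                # Leave the value empty when no price is known for the date.
                "usd": None if price is None else round(btc * price, 2),
            }
        )
    return rows
```

For instance, a transaction with outputs totaling 200,000,000 satoshis on a day whose (hypothetical) price is 100.0 is valued at 2.0 BTC, or 200.0 dollars.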

Interestingly, that first extract included a "relayed by" IP address. Using an IP location finder, we identified a latitude and longitude for the given IP address. The data was aggregated, combined and imported into SAS Visual Investigator. By simply dragging and dropping the data into our case, we were able to show a network diagram of transactions and users, as well as a geographical map of the activity.
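
To give a feel for the aggregation behind a network diagram like this, the Python sketch below turns transaction records into a list of edges. It is an assumption-laden illustration rather than the code from the exercise. The field names (`inputs`, `prev_out`, `out`, `addr`, `value`) follow my understanding of the blockchain.info transaction format, and the `addr:` and `tx:` node prefixes are invented here. Because a Bitcoin transaction does not say which input funds which output, each transaction is kept as its own node between the sending and receiving addresses.

```python
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (source node, target node, value in satoshis)


def build_edges(txs: List[dict]) -> List[Edge]:
    """Build address -> transaction -> address edges for a network diagram."""
    edges: List[Edge] = []
    for tx in txs:
        tx_node = "tx:" + tx["hash"]
        # Sending side: each input spends a previous output owned by an address.
        for tx_input in tx.get("inputs", []):
            prev_out = tx_input.get("prev_out") or {}
            addr = prev_out.get("addr")
            if addr:
                edges.append(("addr:" + addr, tx_node, prev_out.get("value", 0)))
        # Receiving side: each output pays an address.
        for output in tx.get("out", []):
            addr = output.get("addr")
            if addr:
                edges.append((tx_node, "addr:" + addr, output.get("value", 0)))
    return edges
```

The resulting edge list can be written to a CSV file and imported alongside the transaction and location data.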

Extracting and aggregating Bitcoin input transactions was the challenging part of this case. Due to the anonymity of Bitcoin addresses, other than the known address, only patterns, amounts and possibly the location information added value to the investigation. But with a little work, it’s possible to access and include a variety of data from various blockchain technologies using SAS Visual Investigator.


Future of blockchain analytics

Blockchain-based technologies will continue to expand into many industries and areas. The secure, decentralized essence of blockchain will make it a popular technology option for any system where security is important. From managing smart contracts to validating money transfers, expect to see many common uses of the technology.

As blockchain use increases, more organizations will need to access and analyze the data, even as it grows in complexity and volume. Moving forward, there may also be the need to offer analytics across multiple blockchain variants. I’m excited to work for a company that has anticipated the interest in blockchain technology and is already applying advanced analytics techniques in this evolving space.


About Author

Sam Penfield

Advisory Solutions Architect

With 20+ years’ experience as a consultant and 30+ years developing software, Sam has a true passion for programming languages and problem solving. Over the past two years, Sam has worked in an emerging technologies group, helping to integrate Spark with SAS and prototyping SAS product enhancements. Over the last year, Sam has focused on blockchain technologies and the integration of SAS from a data management, streaming and analytics perspective.


  1. Christian Giraud

    Great write-up Sam! I am really impressed with the use cases showcased for combining Blockchain and analytics. I can only imagine this will expand in both known and unknown ways, much like the early days of the internet when we didn't know its potential.

  2. Stephen Sparano

    Great article on the possible use cases for analytics with ESP and Blockchain together. We need this kind of focus on practical applications that are valuable for our customers, and areas like Fraud and security are great examples of where SAS can have an impact.

  3. Very timely article Sam. As concerns that cryptocurrencies are being used for money laundering, governments like UK, Russia, and the US are going to have to figure out a way to regulate and monitor cryptocurrency transactions. It will be interesting to see what 2018 brings on the regulatory front!

  4. Thank you for explaining this topic. Gave a very simple and detailed picture how AI and Analytics can be used in Blockchain. I would love to explore more. if you have a blog or so. I will be happy to read that.

  5. Sander Huysmans

    Hi Sam, great summary. You are able to make a complex subject understandable for the masses.

    Next step: A VI demo on bitcoin transactions from Wikileaks? 🙂


  6. Sam, very nice article, thanks. What are the key competitive advantages of SAS ESP-based blockchain analytics potentials comparing with using AWS Step functions, Lambda, Kinesis-suite of services, and Sagemaker DL services? Another question: What do you think about BaaS (Blockchain as a Service) available on Microsoft Azure?

    • Sam Penfield

      SAS ESP has all the SAS Analytics it needs built in. By adding event generation code in clients/wallets, miners, etc., you can provide real-time analytics. It may be possible to use assorted AWS services to accomplish a similar approach, but that seems more complicated. SAS ESP feeds SAS Viya and, to me, seems like a more complete solution. Both AWS and Azure offer BaaS. Many companies are modifying open source blockchains to implement their own use cases. They may host on AWS or Azure, but once they modify the code it becomes their intellectual property. What the approach in my blog attempts to do is offer a way to integrate analytics into their blockchain implementations.

  7. Informative blog.I really enjoyed reading the key statistics you bring to our attention. Thank you for sharing this blog with us.
    good information and keep on sharing.

  8. Manuel Rodriguez

    Very appreciated you work Sam, I will forward this blog to my local colleagues as I found it not only interesting but really useful. As a suggestion, in my opinion the exercise you created combining ESP+VI on blockchain info for the use case you mention could be great to document and package as a demo.

  9. Great article on the possible use cases for analytics with ESP and Blockchain together. I really enjoyed reading the key statistics you bring to our attention. Thank you for sharing this blog with us.

  10. Justin jeosep

    Awesome blog! The practical approach of blockchain analytics was explained very nicely. It was very Interesting and Knowledgeable. Keep sharing, such a nice Information.
