In my last post I posed a question about machine-to-machine (M2M) generated data and data validation. In the scenario I described, one of hundreds, if not thousands, of working sensors reports a value that falls outside of defined expectations. That can mean one of a number of things. Here are three that quickly come to mind:
- The reported value is correct and indicates flawed behavior of what is being measured. This is likely a business problem, such as an impending power outage, a health event requiring immediate medical attention, a leak in a gas pipeline, etc.
- The reported value is incorrect and indicates flawed behavior of the sensor. A faulty sensor or monitor may intermittently or continuously generate invalid data. As long as no other failures exist in the system, this might be easy to isolate and remediate.
- The reported value is incorrect and indicates flawed behavior of the system. In this case, the sensors are working correctly, but the network used to report and propagate their data may be introducing errors. This is a far more insidious problem, and one that is difficult to isolate and eliminate.
The problem: how can you automatically tell which of these three situations is occurring? Answering that question implies that data values require validity and compliance monitoring across the data flow network, not just at the point of creation and the point of delivery! The upshot is that any time a value deviates from expectation, there must be a process for determining the root cause, namely whether there is a business problem or a system problem, and that in its own right is a challenge. The challenge becomes even more acute once we recognize the potentially massive data volumes involved: the root cause must be determined rapidly while actively monitoring huge data streams. This suggests the need for a scalable yet elastic means of continuously monitoring M2M data.
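To make the idea concrete, here is a minimal sketch of the kind of triage such a monitor might perform on a single out-of-range reading. Everything below (the names, the thresholds, the checksum scheme) is my own illustrative assumption rather than a reference implementation. The intuition is simply that an end-to-end integrity check attached at the point of creation can separate transport corruption (the system problem) from genuine excursions, and agreement with neighboring or redundant sensors can separate a real-world event (the business problem) from a faulty sensor.

```python
import hashlib
from dataclasses import dataclass
from statistics import median
from typing import List

# Hypothetical reading: a value plus a checksum computed by the sensor itself
# at the point of creation, so corruption introduced anywhere along the data
# flow network is detectable at any hop, not just at the point of delivery.
@dataclass
class Reading:
    sensor_id: str
    value: float
    checksum: str  # hex digest the sensor computed over its own payload

def payload_digest(sensor_id: str, value: float) -> str:
    """Recompute the digest the sensor is assumed to have attached."""
    return hashlib.sha256(f"{sensor_id}:{value!r}".encode()).hexdigest()

def triage(reading: Reading, neighbors: List[float],
           lo: float, hi: float, tolerance: float = 3.0) -> str:
    """Classify an out-of-range reading into one of the three cases above.

    `neighbors` holds recent values from nearby or redundant sensors;
    `lo`/`hi` are the defined expectations for this measurement. The
    default tolerance is an illustrative assumption, not a tuned value.
    """
    if lo <= reading.value <= hi:
        return "ok"  # within defined expectations; nothing to triage

    # Case 3: a checksum mismatch means the value was altered in transit,
    # i.e. the reporting/propagation network is introducing errors.
    if reading.checksum != payload_digest(reading.sensor_id, reading.value):
        return "system problem"

    # Case 1: if nearby sensors see the same excursion, the value is likely
    # correct and the measured phenomenon itself is misbehaving.
    if neighbors and abs(reading.value - median(neighbors)) <= tolerance:
        return "business problem"

    # Case 2: the value arrived intact but disagrees with its neighbors,
    # so suspect a faulty sensor and flag it for isolation and remediation.
    return "sensor problem"
```

A pressure reading of 412.0 against an expected range of 100.0 to 300.0, with neighbors reporting 409.5, 414.2, and 411.8, would come back as a business problem: the payload is intact and the neighbors agree, so the spike is probably real. In practice, of course, each of these checks would itself have to run continuously over the data stream rather than on one reading at a time. More on this in future posts!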