In the 19th century, the harnessing of electricity brought about the means to transmit signals via electrical telegraph. The term STOP was used in telegrams to mark the end of a sentence because punctuation cost extra. Therefore, a telegram requesting an end to poor data quality would literally have been sent as “Stop Poor Data Quality STOP” — and if you think data quality wasn’t an issue in telegraphy, stop, and think again.
In his book The Information: A History, a Theory, a Flood, James Gleick recounted the 1887 story of Philadelphia wool dealer Frank Primrose, who telegraphed his agent in Kansas to say he had bought 500,000 pounds of wool. But when the message was received, a key word was misinterpreted, and his agent thought he was being instructed to buy wool. Before long the error cost Primrose $20,000, according to the lawsuit he filed against the Western Union Telegraph Company.
The legal battle dragged on for six years, until finally the Supreme Court upheld the fine print on the back of the telegraph blank, which spelled out a procedure for protecting against errors:
To guard against mistakes, the sender of a message should order it REPEATED; that is telegraphed back to the originating office for comparison . . . Said company shall not be liable for mistakes in . . . any UNREPEATED message . . . nor in any case for errors or obscure messages.
This brings to mind a more contemporary data quality issue: where should the procedure for protecting against errors reside in our complex enterprise data management ecosystems? In other words, who is responsible for verifying data quality: the receiver or the sender? In the telegraph example, the sender is responsible, albeit only for verifying that the message was received as sent.
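As a rough illustration of the REPEATED procedure, here is a minimal Python sketch in which the sender transmits a message, receives the echoed copy, and compares the two. The function names (`send_repeated`, `clean_line`, `noisy_line`) and the sample message are hypothetical, invented for this example rather than taken from the Primrose case.

```python
from typing import Callable


def send_repeated(message: str, transmit: Callable[[str], str]) -> bool:
    """Send a message and compare it with the copy telegraphed back.

    `transmit` stands in for the line plus the receiving office: it returns
    whatever text the receiver says it got.
    """
    echoed = transmit(message)
    return echoed == message


def clean_line(message: str) -> str:
    # Hypothetical channel that delivers the message unchanged.
    return message


def noisy_line(message: str) -> str:
    # Hypothetical channel that garbles one key word in transit.
    return message.replace("BOUGHT", "BUY")


if __name__ == "__main__":
    text = "HAVE BOUGHT 500000 POUNDS WOOL STOP"
    print(send_repeated(text, clean_line))
    print(send_repeated(text, noisy_line))
```

Note that the comparison only confirms the message arrived as sent. It says nothing about whether the sender's original content was correct, which is the same limitation the telegraph procedure had.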
However, it’s often not feasible to push defect prevention back to the data source because doing so would disrupt operational systems, so data cleansing is performed downstream of the source. This usually happens without passing reusable data quality rules upstream to prevent (or at least minimize) similar issues, and often without even notifying anyone that the data may have been altered after it was received.
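One way to picture the alternative is a downstream cleansing step that keeps its rules as named, reusable objects and records every alteration it makes, so the changes can be reported back to the source. The Python sketch below is only an illustration under those assumptions: `Rule`, `cleanse`, and the sample record are hypothetical names and data, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named, reusable data quality rule for a single field."""
    name: str
    field_name: str
    fix: Callable[[str], str]


def cleanse(record: dict, rules: list) -> tuple:
    """Apply each rule and return (cleaned_record, changes).

    `changes` lists (rule name, field, before, after) for every value
    that was actually altered, so nothing is changed silently.
    """
    cleaned = dict(record)  # leave the received record untouched
    changes = []
    for rule in rules:
        if rule.field_name not in cleaned:
            continue
        before = cleaned[rule.field_name]
        after = rule.fix(before)
        if after != before:
            cleaned[rule.field_name] = after
            changes.append((rule.name, rule.field_name, before, after))
    return cleaned, changes


if __name__ == "__main__":
    rules = [
        Rule("trim_whitespace", "city", str.strip),
        Rule("uppercase_state", "state", str.upper),
    ]
    cleaned, changes = cleanse({"city": " Topeka ", "state": "ks"}, rules)
    for change in changes:
        # In practice this is where a notification would be sent upstream.
        print(change)
```

The `changes` list is the piece that is usually missing: it is both the notification that data was altered and a pointer to the rule that the source system could adopt.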
Where Do You Stop Poor Data Quality?
Within your enterprise data management ecosystem, where is data quality addressed? And how are notifications about data quality issues, and their resolutions, transmitted throughout your organization?