A recent trend in data democratisation, where an organisation makes data accessible to all employees and educates them on working with data, has revealed some significant gaps.
Many employees use data through a reporting or analytics front end such as Power BI or Tableau. By the time the data arrives there, it has travelled a long way from the systems where it was created. Along the way, it has been cleansed and transformed according to several technical and business rules. These rules may exist on several levels and layers of the data architecture, from source systems to consuming applications.
An end user looking at a report or dashboard in a consuming application can only see the result after the data has passed through all these rules and layers. The usual questions arise: “Where has this data come from?”, “How has it been calculated?” or “What does this data represent?”. And if the figures differ from what is expected, finding the root cause is extremely difficult, or even impossible, without consulting the service provider of the IT system.
This is the black box syndrome: no one knows what happens under the hood. It causes a lot of confusion and manual work when troubleshooting issues. It can also lead to safety issues in AI development: if data scientists or “citizen” developers do not understand the data well, they may introduce bias into the decisions built on it.
Attempts to create transparency usually take the form of technical documentation provided by the system integrator. This documentation may not be accessible to all business users, or it may not be in a format they can understand.
Transparency into the data
Good practices for bringing transparency into the data include, for example:
Data Journey Map
The data journey map depicts the journey at two levels: a high-level view of the data flow, and detailed field-to-field mappings that include the business rules for each data transformation.
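To make the detailed level more concrete, here is a minimal sketch of how field-to-field mappings could be recorded and queried. Everything in it is an assumption made for illustration: the `FieldMapping` structure, the system and field names, and the business rules are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One detailed-level entry of a data journey map (hypothetical structure)."""
    source_system: str
    source_field: str
    target_field: str
    business_rule: str  # the rule in plain business language


# Hypothetical entries; a real map would be maintained alongside the solution.
JOURNEY_MAP = [
    FieldMapping("ERP", "order_amount", "net_sales_eur",
                 "Converted to EUR at the month-end rate"),
    FieldMapping("ERP", "customer_no", "customer_id",
                 "Copied as-is; leading zeros removed"),
    FieldMapping("CRM", "segment_code", "customer_segment",
                 "Code translated to a segment name"),
]


def trace(target_field, journey_map=JOURNEY_MAP):
    """Return every mapping that feeds the given report field."""
    return [m for m in journey_map if m.target_field == target_field]


def describe(target_field):
    """Answer 'where has this data come from?' for one report field."""
    lines = [
        f"{m.target_field} <- {m.source_system}.{m.source_field}: {m.business_rule}"
        for m in trace(target_field)
    ]
    return "\n".join(lines) or f"No mapping recorded for {target_field}"


print(describe("customer_id"))
```

Keeping the business rule as plain text next to each mapping lets a business user read the answer without opening the technical documentation.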
Data Sampling Sites
Data sampling sites act as peepholes into the business and data processes. They provide views of the data at points along its journey, from its source to the consuming applications.
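One simple way to picture a sampling site is a pipeline that keeps a few rows after every step. The sketch below assumes a pipeline made of plain Python functions; the helper `run_with_sampling`, the stage names, and the sample rows are all hypothetical.

```python
def run_with_sampling(rows, stages, sample_size=2):
    """Run rows through each stage, keeping a small sample after every stage."""
    samples = {"source": rows[:sample_size]}
    for name, stage in stages:
        rows = stage(rows)
        samples[name] = rows[:sample_size]
    return rows, samples


# Hypothetical transformation stages.
def cleanse(rows):
    """Drop rows that have no amount."""
    return [r for r in rows if r["amount"] is not None]


def to_thousands(rows):
    """Express amounts in thousands."""
    return [{**r, "amount": r["amount"] / 1000} for r in rows]


source_rows = [
    {"id": 1, "amount": 2500},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 4000},
]

result, samples = run_with_sampling(
    source_rows,
    [("cleansed", cleanse), ("converted", to_thousands)],
)

# Each entry is a peephole into the data at one point of the journey.
for site, sample in samples.items():
    print(site, sample)
```

Comparing the samples side by side shows where a row disappeared or a figure changed, which is often the first thing needed when a report does not match expectations.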
Metadata
Metadata carries valuable information, for example, about the origin of data and how the data has been altered.
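As a small illustration, metadata about origin and alterations can travel together with the value itself. This is only a sketch under assumptions: the `TrackedValue` class, the origin string, and the transformations are hypothetical, and real platforms typically store this information in a separate metadata layer.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedValue:
    """A value that carries its origin and a history of alterations (hypothetical)."""
    value: float
    origin: str
    history: list = field(default_factory=list)

    def apply(self, description, func):
        """Return a new TrackedValue with the change appended to its history."""
        return TrackedValue(func(self.value), self.origin, self.history + [description])


# Hypothetical origin and transformations.
raw = TrackedValue(120049, origin="erp.orders.amount_cents")
reported = (
    raw.apply("converted from cents to euros", lambda v: v / 100)
       .apply("rounded to whole euros", round)
)

print(reported.value)    # the figure shown in the report
print(reported.origin)   # where it came from
print(reported.history)  # how it was altered along the way
```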
Documentation
Documentation in business language can be written with familiar tools such as Word, PowerPoint or Excel, or with the help of data catalogs, which automate much of the manual work and provide even more valuable information about the data assets.
More detailed information about techniques and practices will follow in the upcoming blogs.