The recent trend of data democratisation, in which an organisation makes data accessible to all employees and educates them on working with it, has revealed some significant gaps.
Many employees consume data through a reporting or analytics front-end application, e.g., Power BI or Tableau, into which the data has travelled a long way from the systems where it was created. Along the way, the data was cleansed and transformed according to several technical and business rules. These rules may live at several levels and layers of the data architecture, from source systems to consuming applications.
The end user of a consuming application, such as a report or dashboard, sees only the result after the data has passed through all these rules and layers. The usual questions arise: “Where did this data come from?”, “How was it calculated?” or “What does this data represent?” And if the figures differ from what was expected, finding the root cause of the error is extremely difficult, or even impossible without consulting the service provider of the IT system.
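The lineage questions above become answerable if every transformation step records what it did. The following is a minimal sketch of that idea in Python; the class name, the rules, and the figures are all hypothetical, chosen only to illustrate how a value could carry its own history from source system to dashboard:

```python
from dataclasses import dataclass, field

@dataclass
class TracedValue:
    """A value together with the lineage of how it was produced."""
    value: float
    lineage: list = field(default_factory=list)

    def apply(self, rule_name, func):
        """Apply a transformation rule and append it to the lineage."""
        new_value = func(self.value)
        step = f"{rule_name}: {self.value} -> {new_value}"
        return TracedValue(new_value, self.lineage + [step])

# Hypothetical journey of one figure through two transformation rules.
raw = TracedValue(1000.0, ["source: ERP system, table sales_orders"])
net = raw.apply("deduct VAT (24%)", lambda v: round(v / 1.24, 2))
eur = net.apply("convert USD -> EUR (rate 0.92)", lambda v: round(v * 0.92, 2))

print(eur.value)          # the figure the dashboard user sees
for step in eur.lineage:  # the answer to "how was this calculated?"
    print(step)
```

In a real architecture this bookkeeping would live in the pipeline or a metadata platform rather than in the values themselves, but the principle is the same: the answer to “where did this come from?” is captured at the moment each rule runs, not reconstructed afterwards.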
This is black box syndrome: no one knows what happens under the hood. It causes a lot of confusion and manual work when troubleshooting issues. It can also lead to safety issues in AI development, because data that is not well understood by data scientists or “citizen” developers can introduce bias into decisions.
Attempts to create transparency usually take the form of technical documentation provided by the system integrator. This documentation may not be accessible to all business users, or it may not be in a format they can understand.
Transparency into the data
Some good practices for bringing transparency into the data in these solutions are, for example:
More detailed information about techniques and practices will follow in the upcoming blogs.