For many years, data governance was an acknowledged need in financial institutions—but a need that had very few strong sponsors or buyers. As a result, two problems emerged:
- Enterprise-level data governance (DG) was made the responsibility of relatively weak functions that had little pull compared to profit centers and their program-level needs.
- Data management (DM) was often seen as a solution in search of a problem: a collection of tools and practices that nobody in the organization felt they needed to concretely apply.
This situation led to almost a decade in which DM was often seen as an ivory-tower function whose aims were laudable but whose roadmap, tooling, and funding were vague. Then the Basel Committee on Banking Supervision issued BCBS239, its principles for risk data aggregation and risk reporting, and European banks, at least, were forced to invest in DM and DG. Still, the result was often a one-off documentation exercise that was very expensive in staff time while yielding few benefits beyond a veneer of regulatory compliance.
In the last four or five years, however, use cases for DG have begun to mature and solidify in many organizations—driven by a mixture of business, regulatory, and technical factors. This trend has included the emergence of better-defined buyer personas for DG—organizational roles who can be relied on to realize that they need DM and (often) to fund it. The result is a gradual transition from DM as an aspiration to DM as an increasingly well-defined set of use cases. And once we have well-defined use cases, it becomes much more useful to look at cost/benefit, process, and tooling.
Buyer Personas and Their Use Cases
Particular buyer personas within the organization have emerged as key drivers of DG use cases and thus DG uplift. While these personas have always existed, the clarity and importance of their DG needs have increased significantly in recent years. We’ll review some of the most interesting personas here. To demonstrate that DM has well-defined uses far beyond rules such as BCBS239 and the European Union’s General Data Protection Regulation (GDPR), we won’t even touch on external regulatory compliance needs!
Product Manager
The Product Manager is completely on the business side and has no interest in technology or data per se. The Product Manager is a business leader looking to offer a new financial service—a new fund, for example—and for that they need the right data quickly. Ten years ago, the Product Manager would have reached out to myriad siloed contacts within the organization, likely drawing on their personal network; the expectation now is that there will be a data catalog in which they can at least find the right points of contact. The Product Manager needs information about specific classes of data—“Who owns our Treasury yield curves? Where are sanctions lists kept?”—and they need to find this using business vocabulary, not by searching among column and file names.
The Product Manager primarily drives data discovery use cases—including not only the cataloging and searching capabilities, but also the development of taxonomies to describe the types of data they are interested in.
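As a minimal sketch of what discovery by business vocabulary could look like, the Python below searches a toy catalog by business term rather than by physical name, expanding the query through a small taxonomy. The `CatalogEntry` fields, the `TAXONOMY` contents, the dataset names, and the contact addresses are all hypothetical and chosen only for illustration; a real data catalog would expose its own model and search interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One catalogued dataset. All field names here are hypothetical."""
    physical_name: str                       # technical name of a table or file
    owner: str                               # point of contact for the data
    business_terms: set = field(default_factory=set)


# Hypothetical taxonomy: each business term maps to its narrower terms.
# The sketch assumes the taxonomy contains no cycles.
TAXONOMY = {
    "market data": {"yield curve", "fx rate"},
    "yield curve": {"treasury yield curve"},
    "compliance data": {"sanctions list"},
}


def expand(term):
    """Return the term plus every narrower term beneath it."""
    found = {term}
    for child in TAXONOMY.get(term, set()):
        found |= expand(child)
    return found


def find_owners(catalog, term):
    """Map physical names to owners for entries tagged with the term
    or with any narrower term in the taxonomy."""
    wanted = expand(term.lower())
    return {e.physical_name: e.owner
            for e in catalog
            if e.business_terms & wanted}


CATALOG = [
    CatalogEntry("mkt.ust_crv_eod", "rates-desk-data@example.com",
                 {"treasury yield curve"}),
    CatalogEntry("cmp.sanc_lst_v2", "compliance-data@example.com",
                 {"sanctions list"}),
    CatalogEntry("mkt.fx_spot_rt", "fx-desk-data@example.com",
                 {"fx rate"}),
]

# A business user asks in business language, not in column names.
print(find_owners(CATALOG, "yield curve"))
```

The point of the design is that the query term never has to match a physical name such as `mkt.ust_crv_eod`; the taxonomy carries the translation from business language to catalogued assets.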
Market Surveyor
The Market Surveyor adds value to the enterprise by understanding customer, competitor, and regulator behavior across markets. For example, the Market Surveyor may be involved in real-time regulatory disclosure or in real-time customer surveillance; in either case, the Market Surveyor needs not only to make decisions based on data, but to take into account that those decisions will be visible to the marketplace and will drive customer and regulator decisions.
The Market Surveyor often drives streaming and real-time data use cases, which the rest of the DG community does not always remember to prioritize. Issues of data timeliness, completeness, and latency (as opposed to just quality) will also be important to them.
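To make the distinction between these measures concrete, here is a rough sketch that computes completeness, latency, and timeliness for a batch of streamed events. The function name `stream_health`, the metric definitions, and the sample timestamps are assumptions made for this illustration; each organization will define these measures according to its own service levels.

```python
from datetime import datetime, timedelta
from statistics import median


def stream_health(events, expected_count, max_lag_seconds):
    """Summarize a batch of (occurred_at, arrived_at) timestamp pairs.

    completeness: share of the expected events that actually arrived
    median_latency_s: median delay between occurrence and arrival
    timeliness: share of arrived events within the allowed delay
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    lags = [(arrived - occurred).total_seconds()
            for occurred, arrived in events]
    if not lags:
        return {"completeness": 0.0,
                "median_latency_s": None,
                "timeliness": None}
    on_time = sum(1 for lag in lags if lag <= max_lag_seconds)
    return {
        "completeness": len(lags) / expected_count,
        "median_latency_s": median(lags),
        "timeliness": on_time / len(lags),
    }


# Hypothetical sample: four events were expected, three arrived.
T0 = datetime(2024, 1, 1, 9, 0, 0)
EVENTS = [
    (T0, T0 + timedelta(milliseconds=200)),
    (T0 + timedelta(seconds=1), T0 + timedelta(milliseconds=1500)),
    (T0 + timedelta(seconds=2), T0 + timedelta(seconds=5)),
]

REPORT = stream_health(EVENTS, expected_count=4, max_lag_seconds=1.0)
print(REPORT)
```

A feed can score well on one measure and badly on another: in the sample, every arriving record could be individually correct while a quarter of the expected records are missing and one in three arrives too late to act on.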
Program Data Architect
The Program Data Architect is tasked with creating change at a physical level, and with the successful delivery of the data aspect of change programs. They need not only to find data, but to perform impact analysis (to ensure the changes they are making are safe for the organization) and root cause analysis (as part of testing and productionizing their changes).
The Program Data Architect needs information at a more granular level than the Product Manager, and the need for impact analysis and root cause analysis drives lineage and traceability use cases.
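Both analyses can be viewed as traversals of a lineage graph: impact analysis walks downstream from a dataset that is about to change, and root cause analysis walks upstream from a dataset that looks wrong. The sketch below illustrates this with a breadth-first search. The dataset names and the `LINEAGE` structure are hypothetical, and real lineage tools typically work at column level and with far richer metadata.

```python
from collections import deque

# Hypothetical lineage: each source dataset maps to the datasets derived from it.
LINEAGE = {
    "trades_raw": ["trades_clean"],
    "ref_counterparty": ["trades_clean"],
    "trades_clean": ["risk_exposure", "pnl_daily"],
    "risk_exposure": ["regulatory_report"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Reverse every edge so that traversal runs upstream."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def impact_analysis(dataset):
    """Datasets that could be affected by a change to the given dataset."""
    return reachable(LINEAGE, dataset)


def root_cause_candidates(dataset):
    """Upstream datasets that could explain a problem in the given dataset."""
    return reachable(invert(LINEAGE), dataset)


print(sorted(impact_analysis("trades_clean")))
print(sorted(root_cause_candidates("regulatory_report")))
```

The same graph serves both use cases, which is one reason lineage metadata tends to be requested at a finer grain than discovery metadata.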
Third Line of Defense
It’s increasingly acknowledged—not least in the wake of high-profile fines—that DG is a key element of a financial institution’s risk profile. So far, however, it has not been an easy one to audit; even when program-level DG is done superbly, it is not easy for an oversight function to prove that DG was done well, or to identify material gaps in data management that might still exist. The Third Line of Defense persona, then, is a senior stakeholder responsible either for exercising oversight or for ensuring that the right decision-making information is available to the oversight and audit functions.
The Third Line of Defense stakeholder drives quality, coverage, and maturity use cases, including the completeness of architecture, modeling, and data quality work and the availability of metrics and key risk indicators around that work.
Cloud Migrator
The Cloud Migrator is tasked with achieving the migration of data and processing to the cloud. Large data migrations require large amounts of data management, including DG; the current trend, however, is for cloud migration projects to be net producers, rather than net consumers, of metadata, including models, taxonomies, data ownership, and data quality metrics. The Cloud Migrator often finds that in order to achieve the required migration, they need to do a lot of data management work themselves rather than finding it ready-made, because the signing-off of a migration project is often the point at which staff start looking at the actual shape, quality, and location of the data! As a result, cloud migration projects have a rich interaction with DG programs and often wind up funding new DG work to ensure that their own use cases are met.
The Cloud Migrator drives a wide range of use cases, especially around documenting the physical state, service levels, quality, and semantics of data. The Cloud Migrator often has considerable input into DG tooling, partly because they are often the first to need such tooling and partly because they are the first to take advantage of new tooling options offered by their chosen cloud platform.
Trends in Data Governance
The increasingly effective articulation of use cases (the “why”) has helped drive a number of trends (the “how”) in actual data governance implementation in recent years. Some key examples include:
Hybrid Solutions
At a certain point in the DG journey, many organizations believed that a single-vendor product could act as a turnkey solution to “solve” the challenge of DG for them. This belief has eroded, and there is an increasing recognition that a diverse set of stakeholders and use cases may require multiple tools. Concerns about vendor lock-in and the emergence of powerful best-of-breed niche tools are other factors leading financial institutions to propose a mix of vendors, or a mix of vendor and in-house design, in their DM solutions.
Agile Metadata
Metadata is complex, and the approaches to gathering, storing, and using it are not particularly mature. Maturing use cases mean ever more change in the metadata model. As a result, the metadata management system (MDMS), often seen as the core of a global DG strategy, quickly becomes unwieldy if it relies on detailed yet inflexible relational models. The trend is toward agile, opportunistic gathering of whatever metadata is available, perhaps using document rather than relational formats. It remains to be seen whether this trend will lead to the same problems that “schema-on-read” approaches to the data lake suffered from.
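As an illustration of the document-style approach, the sketch below gathers metadata fragments of differing shapes and merges them per asset without a fixed schema, then runs a light check for fields that downstream users depend on. The record contents, the field names, and the choice of `owner` as a required field are hypothetical; the check is one possible guard against the schema-on-read concern, not an established practice drawn from the text.

```python
import json

# Hypothetical metadata fragments from different tools, each a JSON document
# with whatever fields that tool happened to know about.
RECORDS = [
    '{"asset": "trades_clean", "owner": "markets-data", "quality": {"null_rate": 0.01}}',
    '{"asset": "fx_spot_rt", "refresh": "streaming"}',
    '{"asset": "trades_clean", "steward": "j.doe", "quality": {"row_count": 120000}}',
]


def merge(base, extra):
    """Recursively merge extra into base; later values win on conflict."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        else:
            base[key] = value
    return base


def build_store(raw_docs):
    """Fold all fragments into one document per asset."""
    store = {}
    for raw in raw_docs:
        doc = json.loads(raw)
        merge(store.setdefault(doc["asset"], {}), doc)
    return store


def missing_fields(store, required=("owner",)):
    """Report assets that lack fields consumers rely on."""
    gaps = {}
    for asset, doc in store.items():
        absent = [name for name in required if name not in doc]
        if absent:
            gaps[asset] = absent
    return gaps


STORE = build_store(RECORDS)
print(STORE["trades_clean"])
print(missing_fields(STORE))
```

Nothing here rejects a fragment for having an unexpected shape, which is what makes the gathering opportunistic; the trade-off is that gaps surface only when someone checks for them.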
Multiple Stakeholder Groups
It is becoming accepted that DG has multiple stakeholder groups with very different perspectives and priorities; the assumption some organizations once made, that regulatory compliance is the main stakeholder, now seems shaky. As a result, DG is increasingly seen as a central (or at any rate standardized) function with an unprecedented number of stakeholders and sources of change, and it is generally accepted that stakeholders will need to be prioritized to ensure that critical objectives are met.
Data Mesh
If DG is seen as a function based around a handful of very important artifacts (such as the data catalog), it’s clear that these artifacts have a tremendous number of change drivers and are also single points of failure for the whole DM initiative (where “failure” may mean simple delivery failure, vendor lock-in, or many other failure modes). As a result, organizations are increasingly looking at approaches such as the data mesh, which aims to provide logically unified DG while allowing for physically diverse metadata stores and formats. This approach is logical and healthy, but it requires strong leadership and a clear vision of what the data mesh is and how the business will interact with it.
Automation
A critically important trend in DG as a whole has been the search for automation. It is generally recognized that lineage metadata in particular is extremely labor-intensive to gather manually and very difficult to keep up to date manually. The market is now rich in automation tools for metadata management and lineage extraction. Yet none of these tools is a turnkey solution, and making automation work for a particular set of use cases in a particular data estate requires understanding and skill.
Industry Initiatives
The financial services technology community is not slow to offer solutions when new needs surface, and new solutions are emerging from various types of organizations:
- Vendors are offering increasingly powerful and diverse data catalog, MDMS, and automated lineage systems.
- Thought leaders are beginning to define reference models and reference requirements for DG and data cataloging.
- Industry groups such as the EDM Council are developing and applying new methodologies to standardize data management and DG.
The Next 5 Years
DG has come a long way over the last decade. From a function that was often hard to cost-justify and indeed hard to define, based on a handful of fairly immature tools, DG has emerged as an important set of capabilities that impact an organization’s bottom line in many ways, improving revenue, reducing costs, and mitigating risk.
However, there are still vast challenges, and compared to other elements of information technology in an enterprise—such as software development or data storage—DG is still rapidly changing and finding its footing. As a result, it is an area that offers significant rewards to enterprises that can most effectively identify their key DG stakeholders, define their needs and perspectives, and match them to a DG delivery strategy.