Data governance refers to a set of principles, practices, and processes that a company puts in place to ensure that data is available, usable, accurate, and secure. There’s a lot packed into that one definition, and many different angles from which to tackle it.
Rather than going into the nitty-gritty and trying to boil the ocean, let's focus on one question: why does this matter? In analytics, it comes down to the fact that data is only valuable when you can trust it, which means you have to be sure that it’s up to date and accurate. Data governance is what you put in place to earn that trust. Only then can you confidently analyze it, report on it, and make data-driven decisions based on it.
As with any large-scale, important goal, this is a long-term game that should ideally be prioritized from the start. It involves getting the right people and technology on board early in order to build a scalable data foundation. In our view, to be successful, data governance should be treated as a program, with a framework or playbook that is communicated to all relevant internal stakeholders and agreed to by all of them.
In a world where self-service analytics gives more people across the company access to data, data governance has never been more important. While there are countless things you can do to ensure strong governance, here are some of our top best practices:
Data governance is all about controlling who can do what with your raw, granular data. As a first step, divide your employees into four groups:
This way, you’ll always have control over who is responsible for which actions on data, and who has the authority to do what.
With clear boundaries around each group's role, you can establish clear lines of governance and safeguard data quality and accuracy by controlling each group's permissions. Because each team's responsibilities are clearly defined, data consumers won’t find themselves trying to model data or worrying about what’s in the warehouse. These boundaries put you in a good position to avoid messy or broken data in the future and to save time going from data to decisions. They also make troubleshooting data issues easier, since it is clear which group owns each step.
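To make these boundaries concrete, here is a minimal Python sketch of role-based permissions. The four role names and the list of actions are hypothetical placeholders, not the groups this article has in mind and not any tool's API; the idea it illustrates is that each group gets an explicit allow-list and anything not listed is denied by default.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical actions a person can perform on data."""
    MANAGE_WAREHOUSE = auto()
    EDIT_MODELS = auto()
    BUILD_DASHBOARDS = auto()
    VIEW_DASHBOARDS = auto()


# Hypothetical role names standing in for the four groups.
# Each role maps to an explicit allow-list of actions.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "warehouse_admin": frozenset(Action),  # every action
    "modeler": frozenset({Action.EDIT_MODELS, Action.BUILD_DASHBOARDS, Action.VIEW_DASHBOARDS}),
    "dashboard_builder": frozenset({Action.BUILD_DASHBOARDS, Action.VIEW_DASHBOARDS}),
    "consumer": frozenset({Action.VIEW_DASHBOARDS}),
}


def can(role: str, action: Action) -> bool:
    """Return True only if the role exists and explicitly allows the action (deny by default)."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(can("consumer", Action.VIEW_DASHBOARDS))  # True
    print(can("consumer", Action.EDIT_MODELS))      # False: consumers don't model data
```

Denying by default means that adding a new role or action never silently widens access; someone has to grant it on purpose.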
Your core data team should put in place a process that certifies each member of the groups above by access type, requiring them to ‘master’ their specific role and responsibilities and then own them. For example, if you grant someone the ability to create dashboards, that person must first be certified for that role internally through a validated training session.
This reinforces the idea that they are the ‘owners’ of their area and encourages them to stay up to date with the latest changes in it. Set up newsletters and quizzes to communicate changes and updates so that each group stays on top of them. At some point, this can even become part of a larger Data Literacy program.
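The certification rule can be expressed as a simple gate: access is granted only to someone who holds the matching certification. The Python sketch below is a hypothetical model (the `Member` class, the `certify` and `grant` helpers, and the access-type strings are all invented for illustration), not a description of any specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """An employee, the access types they are certified for, and what they've been granted (hypothetical model)."""
    name: str
    certified_for: set[str] = field(default_factory=set)
    granted: set[str] = field(default_factory=set)


def certify(member: Member, access_type: str, passed_training: bool) -> None:
    """Record a certification only after the member passed a validated training session."""
    if passed_training:
        member.certified_for.add(access_type)


def grant(member: Member, access_type: str) -> bool:
    """Grant an access type only if the member holds the matching certification."""
    if access_type not in member.certified_for:
        return False
    member.granted.add(access_type)
    return True


if __name__ == "__main__":
    alice = Member("alice")
    print(grant(alice, "create_dashboard"))  # False: not certified yet
    certify(alice, "create_dashboard", passed_training=True)
    print(grant(alice, "create_dashboard"))  # True
```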
When a question or an issue pops up, what do you do? If there's no process in place, people will start going rogue. They'll likely keep working on the issue themselves, or ping their managers or teammates, which means the information doesn't reach everyone who needs to know about it quickly enough.
Establish a dedicated channel for reporting issues, along with a formal resolution process. Everyone should follow this standardized process every time, so there’s never any doubt about how to report or escalate an issue, or what happens next.
Delegate ownership as much as you can across the four groups, so that everyone down the line is empowered to escalate questions and issues based on what they see in their area. Turn them into "champions" of their own area, and let them investigate and dig into the data when it falls within their field. Questions and issues should be escalated through the same hierarchy as the four groups defined above. Remember: with data, any delay in escalation or resolution undermines the reliability of your data, which can hamper company decision-making and growth.
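One way to picture this escalation path is as an ordered chain: the group that spots an issue owns it first, and each escalation hands it one level up. The Python sketch below assumes a hypothetical four-level chain and invented names (`ESCALATION_CHAIN`, `Issue`, `escalate`); your own groups and their order may differ.

```python
from dataclasses import dataclass, field

# Hypothetical escalation chain, from the group closest to the issue up to the core data team.
ESCALATION_CHAIN = ["consumer", "dashboard_builder", "modeler", "warehouse_admin"]


@dataclass
class Issue:
    """An issue reported in the dedicated channel, with the history of who owned it."""
    title: str
    reported_by: str
    history: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.reported_by not in ESCALATION_CHAIN:
            raise ValueError(f"unknown role: {self.reported_by}")
        # The reporting group owns the issue first, as the "champion" of its own area.
        self.history.append(self.reported_by)

    @property
    def owner(self) -> str:
        return self.history[-1]


def escalate(issue: Issue) -> str:
    """Hand the issue one level up the chain; the top of the chain keeps it."""
    idx = ESCALATION_CHAIN.index(issue.owner)
    if idx + 1 < len(ESCALATION_CHAIN):
        issue.history.append(ESCALATION_CHAIN[idx + 1])
    return issue.owner


if __name__ == "__main__":
    issue = Issue("Revenue tile shows nulls", reported_by="consumer")
    print(escalate(issue))  # dashboard_builder
    print(escalate(issue))  # modeler
```

Keeping the full ownership history on the issue also documents who looked at it, which helps when troubleshooting the same data later.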
Make sure you truly understand and can articulate your objectives. Why are you starting this data project, and what do you want to achieve with it? This is a no-brainer that should be thought through regardless, but it's also crucial for data governance, since it demonstrates transparency and gets people on board.
Are you trying to streamline operations? Is this about revenue recognition? Customer happiness, or sales? Identify KPIs that will effectively measure your project’s success. Then communicate outcomes to all stakeholders periodically to show that things are moving in the right direction and that the effort is worth it. When stakeholders understand why this data project is happening and how it benefits the company, everyone will be more invested in making it work.
When evaluating data vendors, ask about their governance features and what they do to ensure strong governance of your data. Do they value governance enough to build it into the core of their tool or platform?
Whaly, for example, ensures governance through two bespoke tools: the Workbench for data teams, and the Exploration layer for business teams (data consumers). The two tools are tightly connected to one another. Because each one is tailored to the responsibilities and preferences of its team, business teams can play with the data safely in their own environment, one better suited to non-technical users, while data teams can rest assured that the data they model in the Workbench won’t be affected by the Explorations that business teams perform.
Some tools have a proprietary language that must be learned to perform certain actions, which makes them inherently inaccessible to some groups of users. This is one form of governance, but it also imposes a steep learning curve, even on your data experts. Now that self-service is gaining traction, you have to be careful about blurry access lines: there should still be clear-cut definitions of who can do what with data in the tools you use every day.
As an added bonus, if the tool integrates with dbt, you can test your data in advance and always know when it’s stale. This kind of visibility is also key to governance.
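In dbt, staleness is typically checked with `dbt source freshness`, which compares the most recent value of a source's `loaded_at_field` against configured `warn_after` and `error_after` thresholds. The Python sketch below only imitates that classification to show the idea; it does not call dbt, and the function name `freshness_status` and its exact boundary handling are assumptions rather than dbt's implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_status(
    last_loaded_at: datetime,
    warn_after: timedelta,
    error_after: timedelta,
    now: Optional[datetime] = None,
) -> str:
    """Classify data as 'pass', 'warn', or 'error' based on how long ago it was last loaded."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    if age > error_after:
        return "error"  # too stale to trust: block or flag downstream reports
    if age > warn_after:
        return "warn"   # getting stale: notify the owning team
    return "pass"


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
    for hours in (3, 13, 25):
        status = freshness_status(
            now - timedelta(hours=hours),
            warn_after=timedelta(hours=12),
            error_after=timedelta(hours=24),
            now=now,
        )
        print(hours, status)  # 3 pass, 13 warn, 25 error
```

Surfacing a status like this next to each dataset tells consumers whether a number is current before they act on it.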
In summary, start with a scalable foundation for data governance, so that even as your data volume skyrockets or your teams triple in size, you’ll always have a solid framework and process for ensuring data quality and accuracy. As soon as your data becomes unreliable, you risk steering your company in the wrong direction. Although putting this in place from the beginning is recommended, it’s never too late to start!
If you'd like to learn more about governance from our data experts, or understand how Whaly ensures high governance, get in touch!