Bill Wilson draws on NTT DATA’s primary research, published reports and 20 years’ working in central government to distil the critical success factors for sharing data across departments.
In the aftermath of the Civil Service Data Survey, many departments have reached out to NTT DATA wanting a greater understanding of the challenges of data sharing in government. One part of the answer is to read the report – and if necessary, the data can be downloaded in WebCSV format for further analysis.
However, now the dust has settled and we’ve taken stock of the feedback from the report, I’d like to offer more detail on the challenges of data sharing across government. In parallel, NTT DATA are sponsoring the Civil Service Data Challenge, where Civil Servants have been encouraged to suggest their innovative data ideas as part of a ‘Dragons’ Den’ style contest. We’ve been overwhelmed by ideas, and almost half the entries either were exclusively about data sharing or relied on it as a prerequisite.
As the National Data Strategy points out, the untapped potential for data-enabled government is huge, if only we can share data; the 2017 Digital Strategy said something similar. That potential cuts across three types of data use: individual casework; business intelligence around operational effectiveness; and driving policymaking. In our survey, data sharing was the third priority for realising the potential of data (after application of data to policy and operational questions). Civil servants are frustrated: there are plenty of efficiency and counter-fraud reasons to share data – and sometimes moral imperatives – but even sharing within the same department can be fraught.
Below, I propose 13 critical success factors, many of which are inter-related. Those old enough to remember the gameshow Blockbusters will notice an accidental resemblance to the famous board.
Most of these feature in the National Data Strategy, which describes the issues as “well understood”. That raises the question of why so many civil servants are keen to know more. Moreover, fourteen months earlier, the National Audit Office report, Challenges in using data across government, called for better understanding of the issues. Somewhere we’ve missed a step.
In addition to the NAO report, a number of insightful reports exist on gov.uk, including:
- Mid-point review of Digital Economy Act’s powers to enable data sharing, Feb 2020
- Motivations for and barriers to data sharing, DCMS report, April 2020
- Centre for Data Ethics and Innovation’s report Addressing trust in public sector data use, July 2020
These reports were all, in large part, written prior to COVID-19. I saw for myself that the pandemic broke down some of the natural reticence around data sharing within government. One of the early successes was correlating government datasets to identify vulnerable people who might be isolated at the start of lockdown. At the same time, however, COVID-19 and Brexit diverted staff away from more focused work on these challenges, putting us behind. As we recover from COVID-19, there are lessons to be learned. Now is a good opportunity to take stock with the help of these reports, NTT DATA’s primary research and my own experience working in government.
Let’s look briefly at each in turn.
The skills to make data sharing work are expensive and teams typically lack capacity. Funding becomes a particular challenge given the benefits imbalance between publishers of data and consumers. The research tells us data projects can be de-prioritised, creating a start-stop effect. Also, ‘plumbing’ projects often struggle to make their business case. Solutions include cross-charging or a centralised funding model, making business cases around not collecting the same data twice and bottom-line savings through more effective counter-fraud. Departments are encouraged to include data governance costs in spending targets and business cases.
Policy & legal gateways
This was the third priority for central action in our survey. New legislation is slow to arrive, and although the Digital Economy Act has had some success, poor understanding and risk aversion remain persistent obstacles to progress. New Objectives (under the 7 powers) take time to establish. Moreover, legal understanding needs to match on both sides of a data share; since GDPR, it’s sometimes assumed that consent is needed when public task is the correct lawful basis for processing personal data.
Below is my own analysis of the Digital Economy Act data sharing agreements, extracted from the open data here. This analysis excludes four multi-party agreements which can’t be easily illustrated using the grid and also data sharing for external research purposes (which has been comparatively successful). The colour coding relates to the reason for the share, obtained by reading the descriptions rather than referring to the Power used. We can see that data sharing agreements are dominated by DWP’s sharing with energy companies as part of fuel poverty measures and also HMRC’s sharing of PAYE data with local authorities to pursue Council Tax debtors. Excluding these, the net result is 26 data sharing agreements in four years. Looking at the transport protocols pie chart, we also have some way to go with modernisation.
Suggested mitigations include centralised legal resources, better education about current legislation and policy, and making a distinction between time-limited pilots and operational platforms.
Standards and master sources
In our survey, this was the fourth priority for central action. Of course, the problem is not usually the lack of a standard; it’s the number of competing standards. The National Audit Office report found more than twenty ways to identify and link individuals across government. The difficulty in linking people cuts across policy questions for front-line services. The Data Standards Authority is starting to make inroads here. The potential is huge considering the rapid growth in industry that follows the adoption of common standards (see Paul Wilson’s excellent blog on ‘ecosystemtic thinking’).
The asymmetric benefits of data sharing can be exacerbated by inter-departmental mistrust and lack of agreed standards around security policy. I have no easy answers here except departmental and leadership objectives and the combination of other measures in this blog.
Of the areas addressed by our Civil Service Data Survey, governance was the most mature, although it still lacks skills and investment. It’s also often dropped when there are other priorities; not every department has a data governance board, for example. We hope the new Central Digital and Data Office will redress the balance, shifting the emphasis from digital alone towards data and data governance, which are crucial to making the digital transformation of government lasting and valuable.
Information sharing agreements
The point here is about re-use so each information sharing agreement does not start from scratch. Progress here is slow – although Microsoft published some data sharing agreement examples as part of its Open Data Campaign.
Leadership and culture
This was the highest priority area to address from our survey. Change needs to come from the top and we need the right priorities for new projects. I wrote more about this in a previous blog. We need a sharing-by-default culture – easy to write, hard to achieve. This can be supported by a champions network (some of this exists), and we are starting to see change at ministerial and prime ministerial level as Prime Minister Boris Johnson himself commits to ‘following the data’.
Perhaps surprisingly, technology isn’t typically the greatest issue. Examples of the knottier problems include: incompatible security protocols; security concerns about the cloud; and the immature use and understanding of de-identification technologies. Some of the latter have been explored in government (e.g. the ONS) but best practice on implementing them is hard to come by. In a recent talk, John Mallinder from Microsoft argued that technology capabilities exist to easily share data (e.g. Azure AD B2B and Azure Data Share) – even across cloud providers (e.g. support for SAS tokens). We were both part of a round table discussion on government data sharing and my slide on the capabilities of a cross-government data sharing platform was supported by John’s Harmonised Mesh architecture – both conceived of a centralised data catalogue but federated data access.
It’s likely we’ll see this architecture as part of the Data Strategy’s proposed Integrated Data Platform. This is partly for practical reasons but partly for what The Centre for Data Ethics and Innovation might call ‘necessary friction’ in data sharing to limit the powers of the state, account for different contexts of data collection, and minimise the blast radius for data breaches. John could also have mentioned the work being done in the Microsoft AI Labs on topics such as Differential Privacy and Homomorphic Encryption.
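To make differential privacy a little more concrete, here is a minimal sketch of the Laplace mechanism applied to a simple count, which is the usual textbook starting point for the technique. It is an illustration only: the function name `noisy_count`, the epsilon value and the example figure are my own assumptions, and it does not describe any Microsoft or government implementation.

```python
import random


def noisy_count(true_count, epsilon, rng=None):
    """Return a count with Laplace noise added (hypothetical helper).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # The difference of two independent exponential draws with rate
    # epsilon follows a Laplace distribution with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Illustrative figure only: share roughly how many people matched across
# two datasets without revealing the exact number.
published = round(noisy_count(1200, epsilon=0.5))
print(published)
```

A smaller epsilon adds more noise and so gives stronger privacy at the cost of accuracy; choosing that trade-off is a policy decision as much as a technical one.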
Knowledge of the data landscape
Data cannot be shared if people are unaware of its existence. Our survey showed this was one of the areas with greater data maturity – almost 70% of respondents said their department had a firm plan being delivered. However, catalogues can be missing altogether or, where they exist, not standardised, discoverable or maintained. It’s no wonder we had around 20 ideas suggesting something better as part of the Civil Service Data Challenge. The National Data Strategy contains a commitment to cataloguing – especially key datasets – but keeping these catalogues up to date (ideally automatically with data ‘crawlers’) is crucial. We need not just data standards but also metadata standards.
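As an illustration of why metadata standards matter for upkeep, the sketch below shows a catalogue entry with a handful of fields and the kind of check a ‘crawler’ could run to flag entries that have not been verified recently. The field names, the 90-day threshold and the sample entries are all invented for illustration; none of this is an existing government standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogueEntry:
    """Hypothetical minimum metadata for one dataset."""
    title: str
    owner: str            # team accountable for the dataset
    identifier_used: str  # which identifier links records to people
    last_verified: date   # when the entry was last checked against the source


def stale_entries(entries, today, max_age_days=90):
    """Return entries whose metadata has not been verified recently."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if e.last_verified < cutoff]


# Invented sample entries, not real datasets.
catalogue = [
    CatalogueEntry("Benefit claims", "Team A", "National Insurance number", date(2021, 1, 10)),
    CatalogueEntry("Vehicle records", "Team B", "Driver number", date(2021, 6, 1)),
]

for entry in stale_entries(catalogue, today=date(2021, 6, 30)):
    print(f"Needs review: {entry.title} (owner: {entry.owner})")
```

Because every entry carries the same fields, the check can run across departments without bespoke logic for each catalogue, which is the practical argument for agreeing metadata standards first.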
Poor data quality is tolerated but the extent of the problem is underestimated – until a scandal like Windrush (where the Home Office took decisive action on what turned out to be poor quality data). Poor data quality can arise because secondary purposes for data are either not understood or not considered when data is collected. This creates an imbalance similar to the funding issue: data quality is not a free good, but the publisher must invest to create a level of quality that is not always necessary for the primary purpose.
Data quality is a challenge for many of our customers. Our advice includes: measuring the problem and the impact of remediation (bearing in mind that re-measurement is needed as quality degrades over time); determining and addressing root causes (gradually eating the elephant); and using data sharing as a means to help solve the problem through correlating data sources. Note that cleaning data can occasionally be counter-productive, because raw data is useful in detecting fraud. Acquiring uncleansed data is therefore an important part of what we do in our Intelligent Case Management Centre of Excellence.
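The first piece of advice, measuring the problem and then re-measuring after remediation, can be sketched very simply. The example below computes completeness and validity for a single field. The field name, the simplified validity pattern and the sample records are invented for illustration; real rules would come from the relevant data standard.

```python
import re

# Simplified, assumed pattern: two letters, six digits, one letter A-D.
# The real National Insurance number rules are stricter than this.
NINO_PATTERN = re.compile(r"^[A-Z]{2}\d{6}[A-D]$")


def quality_metrics(records, field, pattern):
    """Return completeness and validity rates (0.0 to 1.0) for one field."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0}
    # A value counts as present if it is neither missing nor empty.
    present = [r[field] for r in records if r.get(field)]
    valid = [v for v in present if pattern.match(v)]
    return {
        "completeness": len(present) / total,
        "validity": len(valid) / total,
    }


# Invented records: one well-formed value, one malformed, one missing, one blank.
records = [
    {"nino": "QQ123456C"},
    {"nino": "12345"},
    {"nino": None},
    {"nino": ""},
]

print(quality_metrics(records, "nino", NINO_PATTERN))
```

Running the same measurement on a schedule gives the trend needed to show the impact of remediation and to spot quality degrading over time.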
This was the top priority for central action from our survey of the Civil Service. In our work outside government, we find it’s not just a question of training a handful of specialists – data literacy cuts across the organisation. NTT DATA’s Vicki Chauhan was in conversation with Joanna Davinson at Civil Service Live recently and this was one of the themes Joanna talked about.
As the sixth priority for central action, public trust is a large part of what drives a risk-averse culture. A mistake in one area of data sharing sets everyone back – breaches affect data controllers and data processors. Interestingly, there’s a difference between trust in competency and trust in intentions, but both are low (40-60% of citizens do not believe the government uses data in their interest). In mitigation, the Centre for Data Ethics and Innovation suggests a new social contract around data. This would complement the more general social contract between the citizen and the state, most famously set out in Rousseau’s 1762 book of the same name. Greater transparency is often suggested as an answer here – for example, not only publishing established data sharing agreements (see above) but also those that were rejected. As within government, data literacy is important for citizens to make informed choices. NTT DATA not only provides data literacy programmes to our clients, but we are also involved in supporting wider public data literacy (look out for more to come).
I started writing and speaking about Data Ethics in 2018 but in light of the huge amount of research that took off through 2019 (not least the founding of the government’s Centre for Data Ethics and Innovation), others were making the running more effectively. From the research, two issues are relevant here. The first is linking datasets about individuals who may not have provided the data if they had known it would be linked in this way. The second is about how to represent the interests of data subjects in ethical discussions. As to the first, data wallets (the UK government has used the arguably overloaded term Smart Data) can help where the data subject has the necessary data literacy. The ODI’s work on data trusts has focused on representing data subjects. Its Data Ethics Canvas provides a simple but comprehensive starting point for any project with potential ethical issues.
Where to start
I’ve highlighted the areas where action is most needed, starting with Leadership and Culture (which is discussed further in my blog on the National Data Strategy).
We know from helping a wide variety of organisations with their data strategies that data standards, governance and data knowledge are all foundational. In the case of UK government, though, ‘P’ for Public trust will be crucial for bringing citizens on that journey.