Q We have heard from witnesses today about a lot of the negatives and potential pitfalls of data sharing across Government. I have nothing against the Government’s intentions here, but do you share the concerns of previous witnesses about the lack of safeguards for privacy in part 5 of the Bill?
Professor Sir Charles Bean:
You will have to excuse me; since I was not here for your earlier discussions, I am obviously not aware of what earlier witnesses have said and what their reservations are. My interest obviously is in the use of the information for statistical purposes. It is important that there is a clear and well understood framework that governs that, and there clearly need to be limitations around it.
I have to say that I think the current version of the Bill strikes a reasonably sensible balance, but there are bits that will clearly need to be filled in. The Office for National Statistics will need to spell out a set of principles that govern the way it will access administrative data, and so forth.
Q You said you are satisfied that it strikes the right balance. Do you believe there is any framework in terms of the principles for data sharing in part 5?
Hetan Shah:
May I come in and build on this? Privacy is absolutely critical to maintaining public trust, and in a sense we think the Bill has missed a trick here. On the research side, the framework is embedded on the face of the Bill. In our view, the ONS has a very good track record—it has maintained 200 years of census data, it has the best transparency, it publishes all the usage of the data and it has already criminalised the proceedings of misuse of data—but that has not been put on the face of the Bill. A tremendous amount could be done to reassure by taking what is already good practice and putting it on the face of the Bill, and I think that will answer the issue for the statistics and research purposes.
Q My full question was not, “Do you believe in transparency?” It was going to be: do you believe in transparency in terms of how citizens’ data will be shared with the Government and between Government agencies? That principle, as you say, is not only not on the face of the Bill but not anywhere in the Bill. We have been asked by the Government to rely on codes of practice that have not even been drafted yet.
Q As you say, Mr Shah, for Government data sharing to work requires public trust, and digital government and the use of your statistics absolutely requires trust that the Government will handle data with due purpose and cause.
Hetan Shah:
Another thing is that the UK Statistics Authority is directly accountable to Parliament, not the Government. That actually makes the statistics and research strand more accountable compared with other parts of the Bill. I remind you of that, which is very important.
Q I would be interested if you could explain and put on the record some of the consequences you see of having this Bill and the underlying secondary legislation on the statute book. What impact will that have on the areas in which you are experts?
Professor Sir Charles Bean:
The key thing is that it greatly improves the gateways that enable the Office for National Statistics to use administrative data—tax data and the like—in the construction of official economic statistics. We are well off the pace compared with many other countries. Scandinavian countries, Canada, the Irish and the Dutch place very heavy reliance on administrative data and only use surveys to fill in the gaps. Here, the Office for National Statistics is essentially an organisation that turns the handle, sending out 1.5 million paper forms a year and processing those. Essentially, you are acquiring the same information again that you have already got in some other part of the public sector, where the information is being collected for other purposes.
The key gains here I see as twofold. First, because you access something close to the universe of the sample population rather than just a subset, which would normally be the case with a survey, you potentially get more accurate information. It is potentially also more timely, which for economic policy purposes is important.
The other side of the coin is that by enabling you to cut back on the number of surveys you do, there is a cost gain, which I should say would probably not mainly be a gain to the ONS, because they have to do the processing of the administrative data, but a gain to the businesses and households who are currently spending time filling in forms that they would not need to do if more use was made of administrative data.
Hetan Shah:
I completely agree with Charlie Bean that we are really in danger of being left behind compared with where other countries are on this agenda. The European statistics peer review, which happened last year, said that this was the key weakness in our statistical system. If you look at bodies like New Zealand, Finland and Canada, they all have this ability to access, so we have got to have it. We are spending £500 million on the census and you have got a lot of that data that you could be using through administrative data.
Similarly, on inflation, which is a critical economic indicator, at the moment we send out people with clipboards to take price points of 100,000 items in 140 locations around the country every month, but there is scanner data that tells you the price that people paid. This could really revolutionise. It is not statistics for statistics’ sake; it is to answer the questions that parliamentarians and policy makers have on issues about social mobility and productivity. For all these questions you are asking yourselves, we need the data. And if we are criticising the ONS about not being quick enough, we need to give them the powers to be quicker.
Q In terms of the provisions in the Bill on sharing data for research purposes, could you shed a bit more light on how that will benefit the wider research community? I was also wondering what the immediate priorities will need to be for the UK Statistics Authority as the accrediting body for the infrastructure provided by the research powers in the Bill.
Hetan Shah:
The Bill creates a permissive power and it really streamlines what at the moment is quite a complex legal environment for researchers accessing Government data. This makes it much clearer that if a researcher meets a set of conditions—the research is in the public interest, the researcher is accredited and it will use the research in a safe haven, as it were, and so on—they are able to access that Government data.
We gave some case studies in our evidence of research that is obvious, such as what affects winter mortality and understanding the productivity gap. Those are questions that researchers want to investigate, but they cannot get hold of the data from Government Departments. To be fair to the Government, there is concern from their side about handing over data when the legal framework is not clear enough. I think this process will really streamline that.
One caveat is that it is slightly odd that health data are out of scope. Most of the biggest concerns that researchers have are in trying to build the relationship between survey data and, often, the health outcomes in certain areas. I understand the reasoning behind this: because of care.data there were some concerns. Health is very important. Our view is that the Bill should build in the scope for health data and then allow for future legislation to say how that will be dealt with, in particular once Fiona Caldicott, the national data guardian, has consulted on her framework, which is happening right now.
Professor Sir Charles Bean:
I would endorse a lot of that. I should say that in Canada, where I spent some time talking to Statistics Canada in the course of doing my review, they have exactly this model. There are clearly defined criteria under which researchers can get access, with a sort of prescribed laboratory where they can use it. I think there is something like 30 requests a year to use information, so it is quite heavily used.
Certainly when I was talking to people here during the statistics review, the issue was raised during the consultation process by people such as the Institute for Fiscal Studies, who wanted access to the microdata to be able to study the impact of tax structure on decisions and so forth. The difficulty of getting that microdata inhibited good research. I am sure the demand is there.
Q Several witnesses have expressed various degrees of concern about issues of privacy, whether merited or not. In terms of what is taking place in Canada, have you seen any data leaks or anything that would raise concerns about what we are pursuing?
Professor Sir Charles Bean:
I am certainly not aware of any leaks or anything. They are clearly very concerned about making sure that personal information is not divulged. It is very important that the information made available is not only anonymised but cannot be reverse engineered to find out who the agent concerned might be.
If you are looking at information on companies, there may well be, if you are not very careful, information that might be reverse engineered to find out that the name of the company is probably such and such. It is very important that you have good processes to make sure that the information that is provided to researchers is sufficiently anonymised but, as I say, the Canadian experience suggests that you can do that quite happily.
Q One of the biggest contributing factors for people moving house is having access to a decent broadband signal. Have you done any statistical or economic modelling of population densities and movement away from cities to rural areas? Is that a piece of work that you would be prepared to do to find out the economic benefits to rural areas as part of the USO?
Q You have both talked about other European countries and Canada. Forgive me for not knowing whether this is the correct term, but are we talking here about big data? Is that the term I hear bandied about? Either way, could you tell me a bit more about the benefits and outcomes in terms of policy information? Give us a bit more information about what these other countries are doing better and how their politicians are better equipped as a result.
Professor Sir Charles Bean:
I think most people use the term “administrative data” to refer to large information held within the public sector that accrues as a by-product of whatever the public authority is doing. Tax information is a classic example, and it is something that is obviously potentially of use to the Office for National Statistics in constructing economic statistics. Big data is a wider concept that embraces the vast range of information that is generated by various sorts of private sector organisations, which includes the scanner data that Hetan mentioned. It is the sort of information that is generated by the likes of Google and phone companies. Big data is much broader.
There is a question about the extent to which you can use big data in the construction of official statistics. I think there are two obvious areas that you might want to exploit. One is scanner data for constructing price indices, which Hetan has already mentioned. The other area where I could see private sector big data being of considerable use is on payment information—information from payments processors and payments providers.
Of course, there is a vast amount of other information that is generated by the private sector. Some of that information might be useful for shedding light on new puzzles or new phenomena in the economy. One might want to be a little bit wary about relying on them to build the regular official statistics because you cannot be sure they are always going to be there, whereas you will probably have a reasonable presumption that the payments information and scanner data will continue to be available, and the Office for National Statistics could therefore use them on a regular basis.
Hetan Shah:
I can give a couple of examples or case studies. One is pensions. In this country we have made quite a lot of changes in recent years around pensions policy, but it is very hard to track the impact of that. The Bill will allow for the ONS to bring together the benefits and pensions data, which are held by the DWP, the HMRC data, and also to go out to companies or to either regulatory bodies or federated bodies and get their data and bring those together so that we can see what auto-enrolment has actually meant, in terms of the amount people are putting into their pensions, and you can actually start tracking policy.
Another example is international student migrants, which is clearly a hot topic at the moment. At the moment there are Home Office data in one place, the Higher Education Statistics Agency holding useful data in another place and there are labour market data held in a third place. You could bring all those things together to actually track the impact and the numbers and so on, which at the moment we just do not have a good handle on. Those are the sorts of things that are possible if you give your statistical office access to the aggregate data from other Departments and also some access to private sector data.
Q Mr Shah, you have partly answered my question, so I will turn to Professor Sir Charles Bean first. What kind of Government data would you personally like to get access to; what would you do with it; and how would the public benefit from your having it?
Professor Sir Charles Bean:
First and foremost, I would say the tax data that HMRC holds—value-added tax, income tax and corporation tax. Value-added tax is particularly useful because it tells you something about inputs and outputs of businesses. It is potentially quite good, up-to-date, timely information on activity in the economy. I should say, when I was on the Monetary Policy Committee, we used to get informal briefings each month from the Treasury representative on what they knew about the tax receipts coming in that month, but having more detailed information about what was going on would be potentially very useful. In principle you can envisage building the national income accounts almost entirely on that sort of information if you have access to it, and you can make sure that the income, output and expenditure sides are all balanced. That, as far as I am concerned, is by far and away the most significant thing.
I think it would be quite useful to bring in another dimension here about why administrative data are useful. There is obviously a lot of interest in regional issues. As it is at the moment, most regional information is collected to align with administrative areas of one sort or another, but those are not always the most natural units to be looking at for studying a phenomenon. If you think of Wales, north Wales is not actually trading with south Wales, it is trading across with Manchester and Liverpool, while south Wales is trading across with Bristol and so forth. If you want to think about the regional economics, you need things that allow you to look at those nexuses, rather than the information you might be given on the Welsh economy. If you have administrative data, with regional, locational identifiers, you can in principle aggregate the information in whatever way is best suited to the particular issue that you want to look at.
In terms of thinking about statistics for the 21st century, we need to be thinking about a framework that is actually quite fluid and flexible, rather than one in which everything is pushed into a set of standard definitions for GDP and stuff like that, and standard regional definitions and so forth. When you have access to the underlying micro information, providing you have appropriate identifiers that you can manipulate and link, you have open to you all sorts of possibilities that we do not currently have.
I have just a couple of examples. One is systemic financial risk. Post 2008, I think there was a recognition that we had focused too much on the risk for individual financial institutions and not looked at risk at a systems level. There is a possibility of doing that. The Prime Minister has indicated an interest in how the labour market is changing with the rise of zero-hours contracts and so on. Using a mixture of administrative and private sector data would allow us to start to get a handle on how the economy is changing.
Q Mr Shah, you keep mentioning access to data, but the problem we heard earlier is that the Bill talks not about access to data but about data sharing, which implies duplication. We should really be moving towards data minimisation. Do you think that the language of the Bill should reflect access to data, rather than data sharing?
Q It discusses the transfer of data. It does not talk about your accessing data. It does not mention the technology through which you would do it. There are no codes of practice alongside how it would happen. It is very broad and explicitly talks about data sharing in certain areas.
Hetan Shah:
I think I said this earlier, but in case I was not clear I shall repeat it. For statistical and research purposes, statisticians and researchers are interested only in aggregates; they are not interested in us as individuals. It is a key point that the relevant clauses are quite different from some of the other parts of the Bill. Others have indicated in their evidence that this area should be seen as slightly different.
It is also worth noting that there are safeguards that have been tried and tested over many years. There is the security surrounding the data—the ONS will not even let me into the vault where they hold the data. You need to be accredited and to sign something saying that you will not misuse the data. If you do, you will go to jail. The trick that has been missed has been not saying all that, because it is almost assumed that that is how the ONS works. My suggestion is that if you want to strengthen that part of the Bill, you should just lay out the safeguards that are already common practice in the ONS.
Q Thank you both for setting out some very factual and helpful arguments as to why the provisions are a good thing, particularly when it comes to aggregate statistics. I was struck by a quote in your report published in March, Professor Sir Charles. You mentioned the “cumbersome nature of the present legal framework”, which the Bill will clearly help to solve, and you also said that there was a “cultural reluctance on the part of some departments and officials to data sharing” and, in many ways, to working together, as we know from experience. How do we solve that problem and get Departments to realise how helpful some of these datasets might be?
Professor Sir Charles Bean:
A key thing about the Bill is that it shifts the onus of presumption. There is a presumption of access unless there is a good reason not to (comply or explain, if you like), as opposed to the current arrangement, which is that the data owner has the data and you say, “Can you please let us have a look at it?” There is civil service caution. I was a civil servant very early on in my career, so I am aware of how civil servants think. Inevitably, you are always worried about something going wrong or being misused or whatever. That plays into this, as well.
In the review I said there are really three elements and I think they are mutually reinforcing. There is the current legal framework, which is not as conducive as it could be; there is this innate caution on the part of some civil service Departments, or even perhaps on the part of their Ministers on occasion; and then the ONS has not been as pushy as it might have been. It is partly that if you know it is very difficult to get in—people are not very co-operative at the other end and the legal frameworks are very cumbersome—you are less inclined to put the effort in, and you think, “Oh, well, let’s just use the surveys, as we’ve always done.” So I think you need to act on the three things together, but they are potentially mutually reinforcing if you get the change right.
This is one area where I think the Bill could be strengthened. At the moment, the ONS has the right to request data; similarly, the researchers have the right to request data. The Department can still say, “No”, and in a sense the only comeback is that there is a sort of name-and-shame element of, “Parliament will note this”, as it were. My worry, given the cultural problems that have been seen in the past, is that that may not be enough. So why do we not do what Canada does? It just says, “The ONS requests”, and the Department gives.
Q Professor Bean, in terms of the current legal framework and the problems with it as it exists, am I right in saying that there is an issue with legislation that was passed in the previous Government, under Gordon Brown’s premiership, that caps the use of data and research material, and which needs to be addressed quite urgently?
Professor Sir Charles Bean:
Yes, I think it does need to be addressed. The existing Act was introduced with the intention of trying to improve the ability to share data, but it just has not operated in the way that people maybe hoped it would. In practice, having talked to the ONS and other Departments, it sounds like an extremely cumbersome process. So I think this is a case where the original legislation may have been well intentioned, but—