Fair data use and transparent communication about it are crucial to maintaining the trust of customers. It is not sufficient merely to comply with the existing legal framework; it is also necessary to explain to end users, in a way they can understand, how their data is being used. What to do with data and how to communicate its use is therefore a topic for business, marketing and communication.
One of the most discussed topics around the use of data is data anonymisation. Often perceived as the sole competence of legal experts, it in fact has a significant impact on business decisions, and in this post I will explain why. I will also discuss the problems it poses and some simple guidelines that can be used to orient daily business decisions.
What does data anonymisation mean?
Data anonymisation is a technique whose purpose is to reduce the risk connected to data processing (i.e. activities such as use or storage). Anonymised data are not personal data and thus fall outside the scope of the General Data Protection Regulation (GDPR).
By anonymising data, the processor irreversibly eliminates any personal identifier from the data. This makes it impossible to trace a data record back to the data subject. Data anonymisation should not be confused with a similar technique called data pseudonymisation. The latter entails the substitution of identifying fields within a data record with one or more artificial identifiers, or pseudonyms. A pseudonym can replace a single field or a collection of multiple ones. For example, the user ‘Sebastian Hendrik’ can become user ‘173’. In the case of pseudonymisation it is still possible to re-identify the data subject by using additional information, such as the table that links pseudonyms to names.
The difference between anonymisation and pseudonymisation is hence that the first technique does not allow any form of re-identification of the data subject. For this reason, pseudonymised data remain subject to the GDPR, whereas anonymised data do not.
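To make the distinction concrete, here is a minimal Python sketch. The function names, the field names and the starting ID of 173 are my own assumptions for illustration, not a description of any real system. The point to notice is the lookup table: as long as it exists, the pseudonym can be reversed.

```python
from itertools import count


def pseudonymise(records, key="name", first_id=173):
    """Replace the identifying field with an artificial ID.

    Returns the new records plus the lookup table. Whoever holds
    that table can re-identify the data subject.
    """
    ids = count(start=first_id)
    to_pseudonym = {}  # original value -> pseudonym
    lookup = {}        # pseudonym -> original value
    result = []
    for record in records:
        original = record[key]
        if original not in to_pseudonym:
            pseudonym = str(next(ids))
            to_pseudonym[original] = pseudonym
            lookup[pseudonym] = original
        result.append({**record, key: to_pseudonym[original]})
    return result, lookup


def drop_identifier(records, key="name"):
    """Remove the identifying field entirely; no lookup table is kept."""
    return [{k: v for k, v in record.items() if k != key} for record in records]


# Toy records, invented for this example.
records = [
    {"name": "Sebastian Hendrik", "amount": 42.50},
    {"name": "Sebastian Hendrik", "amount": 12.00},
]

pseudonymised, lookup = pseudonymise(records)
stripped = drop_identifier(records)

print(pseudonymised[0]["name"])  # the pseudonym
print(lookup["173"])             # the table reverses it
print(stripped[0])               # no name field left
```

Note that dropping the obvious identifier is only a first step. Whether the remaining fields are anonymous in the legal sense depends on what else they reveal, which is the subject of the section on challenges further down.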
Why is data anonymisation relevant for your company?
The different legal treatment of data anonymisation and data pseudonymisation has a radical impact on what a company can keep doing with its data.
To mention a case we experienced at Bittiq, imagine the following: a customer’s transaction data are used to feed the algorithm that recognises and categorises banking transactions. If the user revokes consent and the data cannot be anonymised, the data have to be fully discarded. This compromises the recognition system that was based on that data set.
But if the recognition system is fed by data that can be anonymised, deleting personal identification elements does not affect the precision of the recognition system.
Consider the following:
In the case of the graph, it will be sufficient to remove any part that can be traced back to the user (set 1) to be compliant with the regulation. By keeping set 2, we can make sure that the recognition system is not compromised.
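The split between set 1 and set 2 can be sketched in a few lines of Python. Everything named here is hypothetical: the field names, the choice of which fields count as identifying, and the sample transactions are assumptions made for the example, not Bittiq’s actual data model.

```python
# "Set 1": fields assumed (for this sketch) to identify the user directly.
IDENTIFYING_FIELDS = {"user_id", "account_holder", "iban"}


def split_transaction(tx):
    """Split one transaction into identifying fields (set 1) and the rest (set 2)."""
    set_1 = {k: v for k, v in tx.items() if k in IDENTIFYING_FIELDS}
    set_2 = {k: v for k, v in tx.items() if k not in IDENTIFYING_FIELDS}
    return set_1, set_2


def handle_revoked_consent(transactions, user_id):
    """Keep only set 2 for the given user's transactions; leave the others untouched."""
    result = []
    for tx in transactions:
        if tx.get("user_id") == user_id:
            _, set_2 = split_transaction(tx)
            result.append(set_2)
        else:
            result.append(tx)
    return result


# Invented sample data with placeholder account numbers.
transactions = [
    {"user_id": "173", "account_holder": "Sebastian Hendrik",
     "iban": "EXAMPLE-0001", "description": "Supermarket",
     "amount": -23.10, "category": "groceries"},
    {"user_id": "174", "account_holder": "Another User",
     "iban": "EXAMPLE-0002", "description": "Train ticket",
     "amount": -8.00, "category": "transport"},
]

cleaned = handle_revoked_consent(transactions, "173")
print(cleaned[0])  # only description, amount and category remain
```

The recognition system can still learn from the description, amount and category of the first transaction. Whether those remaining fields are really free of personal data is a separate question: a free-text description, for instance, may itself contain a name.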
The same topic becomes relevant in a number of other circumstances: think of a presentation where you need to show aggregated data, or an online campaign that targets a specific kind of user. What happens once you need to remove the data that refer to a part of the audience?
Knowing the different implications of anonymisation and pseudonymisation can save your organisation a lot of time and cost.
The challenges of data anonymisation
The question of when a data set is anonymised is harder to answer than it may sound. Legally, data is considered anonymised when it can no longer be used to identify a natural person by using ‘all the means likely reasonably to be used’ by either the controller or a third party. What does this mean in practice?
Think of the following data entry:
This data set contains many personal identifiers such as name, licence plate and address. Stripping out these obvious personal identifiers, the remaining data entry would contain:
The remaining data seem generic, but let’s consider whether they are sufficiently un-identifiable to be fully anonymous. A first consideration could be: how many people live on Frejgatan in Stockholm? And how many of those people have a red Volkswagen Polo? If the answer is 1, then the data can be directly traced back to Sebastian. If the answer is 100, this is already much harder. This raises the question of where the threshold lies beyond which we can consider the data set fully anonymised.
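The question ‘how many people share these attributes?’ can be counted directly. The sketch below does so in Python on a few invented records. The idea of measuring the smallest group of identical records is commonly known as k-anonymity; I bring it in here as an illustration, and the field names and data are assumptions.

```python
from collections import Counter

# Attributes left over after removing name, licence plate and house number.
REMAINING_FIELDS = ("street", "city", "car")


def group_sizes(records, keys=REMAINING_FIELDS):
    """Count how many records share each combination of the remaining attributes."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def smallest_group(records, keys=REMAINING_FIELDS):
    """Size of the smallest group; a value of 1 means someone is unique."""
    return min(group_sizes(records, keys).values())


# Invented toy data.
records = [
    {"street": "Frejgatan", "city": "Stockholm", "car": "red Volkswagen Polo"},
    {"street": "Frejgatan", "city": "Stockholm", "car": "blue Volvo V60"},
    {"street": "Frejgatan", "city": "Stockholm", "car": "blue Volvo V60"},
]

sizes = group_sizes(records)
k = smallest_group(records)
print(sizes)
print(k)
```

In this toy set the red Volkswagen Polo appears once, so its owner stands out even without a name. The count tells you how exposed a record is; it does not tell you which value is legally ‘enough’.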
Let’s now assume that there is indeed only one person on Frejgatan with a red Volkswagen Polo. To compensate, we replace the car type entry with the more generic value “red car”. This is probably enough to make the owner of the car untraceable on busy Frejgatan, in the centre of Stockholm. But what if the data subject lived in the countryside, on a road where there are only three car owners?
The example shows that there is no single objective criterion for deciding where to draw the line that separates sufficient anonymisation from pseudonymisation. A well-known study from 2000 estimated that 87% of Americans can be uniquely identified by only three pieces of information: ZIP code, gender and date of birth. Although these considerations are remarkable from a speculative point of view, business practice demands certainty. So where do we draw the line?
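The effect of making a value more generic can be sketched as follows. The two small data sets, the street name ‘Country road’ and the helper names are invented for this example; the only thing taken from the text is the idea of turning a specific car type into ‘red car’.

```python
from collections import Counter


def generalise_car(record):
    """Keep only the colour of the car, e.g. 'red Volkswagen Polo' -> 'red car'."""
    colour = record["car"].split()[0]
    return {**record, "car": f"{colour} car"}


def smallest_group(records, keys=("street", "car")):
    """Size of the smallest group of records sharing the same attributes."""
    groups = Counter(tuple(record[k] for k in keys) for record in records)
    return min(groups.values())


# Invented data: a busy city street with several red cars...
city = [
    {"street": "Frejgatan", "car": "red Volkswagen Polo"},
    {"street": "Frejgatan", "car": "red Toyota Yaris"},
    {"street": "Frejgatan", "car": "red Volvo V40"},
]
# ...and a country road with only three car owners.
countryside = [
    {"street": "Country road", "car": "red Volkswagen Polo"},
    {"street": "Country road", "car": "blue Volvo V60"},
    {"street": "Country road", "car": "white Skoda Octavia"},
]

city_before = smallest_group(city)
city_after = smallest_group([generalise_car(r) for r in city])
country_after = smallest_group([generalise_car(r) for r in countryside])

print(city_before, city_after, country_after)
```

On the city street the generic value hides the Polo among the other red cars. On the country road the same step changes nothing, because there is still only one red car. The same rule applied to different data gives a different level of protection.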
Best principles to be used in the daily practice
Unfortunately, there is no definitive interpretation of the regulation. Additionally, the GDPR is quite recent and there is no case law from the European Court of Justice yet that can offer a level of clarity. For this reason I believe that companies, especially those handling potentially sensitive data, should adhere to the best practices specified by authoritative bodies.
At Bittiq, for example, we have defined a series of guidelines based on the Code of Practice of the UK Information Commissioner’s Office. Our guidelines are straightforward and offer guidance in daily business practice. They apply to an activity mostly based on the analysis of banking transactions.
The purpose of our guidelines is to eliminate a large part of the uncertainty that can affect the daily decisions we take. Hence, making sure our data entries are anonymised is beneficial for both our users and ourselves. Equally important, we always aim to maintain a transparent communication with our customers and partners so that they can rely on us to keep their data safe.
Knowing what data anonymisation means is relevant for anyone who is in a position to handle data in business. Companies that operate in finance or digital technology in particular need to be the frontrunners of fair data handling and transparent communication with their customers.
Having a good understanding of data anonymisation and its business implications enables people in non-legal positions to save a lot of time and costs for their organisation while exploiting the full potential of data.
Then there is one more thing. Usually, legal officers are not the people in a company who communicate with the end users. That’s why it is fundamental that those in closer contact with the end user have a good understanding of the new data regulation: it allows them to communicate clearly and transparently why and how companies use data. Doing this will win companies the trust of their customers.