
3 redaction tips to ensure your document is completely anonymized

Redacting documents can be a tricky process, and indirect personal identifiers can leave your document exposed to redaction slip-ups. Follow these 3 simple redaction tips to avoid the most common redaction errors.

Under the GDPR, a document that has been anonymized falls outside the scope of the regulation. You therefore don't need to follow the GDPR rules that otherwise apply when you handle personal data. But how do you know that your document is properly redacted and anonymized?

The GDPR (Recital 26) describes anonymous information as:

"Information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable"

In other words, a document is anonymized when it is no longer possible to identify a person based on the available information. Be aware that "available information" includes information held elsewhere in your organization, not just the document in isolation. A document that looks completely anonymized on its own is therefore not necessarily anonymized according to the GDPR definition. For instance, it is not anonymized if you can use other available information to reverse the redaction and recover the original content. This would be the case if you keep the original version of the redacted document somewhere else. In that case you have merely pseudonymized the document.

But let's assume that you have neither the original version of the redacted document nor any other data that could unlock your redacted content. Consequently, it is not possible to restore the original version. In this case, assuming you have properly redacted the personal information in your document (either manually or with a software redaction tool), your document should be fully anonymized, and the GDPR rules no longer apply to it. Great!

The question that obviously remains unanswered is:

How do I know that my document is fully redacted and anonymized?

You want to be 100% certain that no information in the document can be used to identify a person. Otherwise you risk, unintentionally, failing to comply with GDPR.

To avoid this, make sure you have properly redacted all sensitive information in your document.

Here are 3 redaction situations you should be aware of, each of which can leave personal information exposed in your document.

1: Direct personal identifiers (names, social security numbers, phone numbers, email addresses)

These are usually the easy ones to find. A redaction tool like Cleardox can locate this data with relatively high precision, and with a simple click the information is redacted. If you don't have a redaction tool, you need to redact this information manually.
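To give a feel for what automated detection of direct identifiers can look like, here is a minimal Python sketch. It is not how Cleardox or any other tool works; the two patterns and the function names are assumptions made up for illustration. It only covers email addresses and phone-like digit sequences. Names and social security numbers need locale-specific rules or a trained model and are left out.

```python
import re

# Illustrative patterns only (assumptions, not a complete rule set).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def find_direct_identifiers(text):
    """Return (kind, matched_text, start, end) tuples sorted by position."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


def black_out(text):
    """Replace every detected identifier with a block of the same length."""
    pieces = []
    position = 0
    for _kind, _value, start, end in find_direct_identifiers(text):
        if start < position:
            continue  # skip a hit that overlaps one already blacked out
        pieces.append(text[position:start])
        pieces.append("█" * (end - start))
        position = end
    pieces.append(text[position:])
    return "".join(pieces)
```

Calling black_out("Contact Jane at jane.doe@example.com") would hide the email address but leave "Jane" untouched, which is exactly why a pattern-based pass still needs a manual check.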

When you redact the information, it doesn't matter whether you black it out or replace it with something else (for instance Person 1, Email 1 etc.). A question I often get in this regard is:

If I replace the information with something else, is that the same as pseudonymization?

If it were, the information would still be regulated by the GDPR.

The quick answer is no, provided you keep no original or other data that could reverse the replacement, as an earlier post showed (anonymization vs pseudonymization).

Many organizations, especially law firms, need to replace the information (as opposed to blacking it out) in order to preserve the meaning of the document. This is because law firms normally keep case documents in a knowledge library and share them with other lawyers for future use, as part of their knowledge maintenance strategy.
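Replacing names with placeholders only preserves the meaning if the same person always gets the same placeholder. The following Python sketch shows one way to do that. It assumes the list of names has already been identified (by a caseworker or a tool), and the function name replace_with_placeholders is hypothetical.

```python
import re


def replace_with_placeholders(text, names, label="Person"):
    """Replace each name with a numbered placeholder, reused on every occurrence.

    Placeholders are numbered in order of first appearance in the text.
    """
    if not names:
        return text, {}

    mapping = {}
    # Longest names first, so a full name wins over a shorter name it contains.
    ordered = sorted(set(names), key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(name) + r"\b" for name in ordered)
    )

    def substitute(match):
        name = match.group()
        if name not in mapping:
            mapping[name] = f"{label} {len(mapping) + 1}"
        return mapping[name]

    return pattern.sub(substitute, text), mapping
```

Note that the returned mapping is a key back to the original names. If you keep it next to the redacted document, the document is only pseudonymized, so discard it once the redaction has been checked.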

2: Indirect personal identifiers (license plates, titles, company names, addresses, property numbers)

These data points can be trickier and require more legal judgement when redacting the document. According to the GDPR, this information is personal if it can be used, alone or in combination with other data, to identify a person.

A license plate is almost always considered personal information because you can look up the number in a public registry and see who the car owner is.

Company information, on the other hand, is not considered personal information on its own according to the GDPR. However, you can very often use company information, such as a company name, together with other information in the document to identify a person. Let's illustrate this with an example.

Redaction example 1: company name + title:

“The CEO from the software company, Legalsoft, said he would retire soon”.

In this case, the reader can look up who the CEO of Legalsoft is with a simple Google search, so the document is not properly anonymized.

In this case, you can choose to redact either the title (CEO), the company name (Legalsoft) or both depending on how much information you want to disclose.

However, if the example was stated like this:

“A partner from the global law firm, DLA Piper, said he would retire soon”

It probably does not need to be redacted. With so many partners at DLA Piper, it would be practically impossible to find out which partner the author was referring to.

This is a very common redaction use case that requires a degree of judgement.

Whether to redact the information or not is up to the caseworker. This is where you get little help from a software redaction tool. Normally, the smaller the company, the higher the chance that the information is identifiable and should be redacted.

Redaction example 2: title + location data:

“The CEO from a large pharmaceutical company located in the city of Charlotte in North Carolina”

This is another example of how information that is not identifiable in isolation becomes identifiable in combination with other information. In this case, if there is only one large pharmaceutical company in Charlotte, you can easily figure out who the CEO is. However, if there are a dozen or so such companies in the city, the CEO probably isn't identifiable.

Again, judgement is required to assess whether the information should be redacted or not.
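Software cannot make this judgement for you, but it can point the caseworker to the sentences where the judgement is needed. Here is a rough Python sketch of that idea: it flags any sentence in which at least two different kinds of indirect identifier occur together. The term lists and the function name flag_for_review are invented for illustration; a real review would need far richer lists or a trained model.

```python
import re

# Hypothetical, hand-maintained term lists (assumptions for this sketch).
INDIRECT_TERMS = {
    "title": ["CEO", "CFO", "partner", "director"],
    "company": ["Legalsoft"],
    "location": ["Charlotte", "North Carolina"],
}


def kinds_in(sentence):
    """Return the sorted kinds of indirect identifier found in a sentence."""
    found = []
    for kind, terms in INDIRECT_TERMS.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                found.append(kind)
                break  # one hit per kind is enough
    return sorted(found)


def flag_for_review(text, min_kinds=2):
    """Return (sentence, kinds) pairs that combine several identifier kinds."""
    flagged = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        kinds = kinds_in(sentence)
        if len(kinds) >= min_kinds:
            flagged.append((sentence, kinds))
    return flagged
```

The sketch only says "look here". Whether the combination really identifies someone (one pharmaceutical company in the city, or a dozen) remains a decision for the caseworker.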

3: Fact-based personal identifiers: news, media attention and special events

Does your case attract media attention? Then it could fall into this category, and you should be extra careful when redacting. These are often classified as high-risk cases.

I have created this third category for the context and the story of the case, which can also be used to identify a person. You might argue that this is also an indirect personal identifier and should be placed in the category above. However, whereas that category relates to specific data points such as company name, location etc., this category relates to the story itself. And the story as a whole can be a personal identifier.

This is best illustrated with two examples:

Redaction example 3 (fact-based personal identifiers):

“Person 1 was sentenced to 10 years in prison for being responsible for the largest bank robbery in the country's history”

Even though the name has been replaced with Person 1, the person has clearly been involved in a case that has the media's attention. As such, it should be fairly easy to find out who the person is. Redacting the person's name is therefore not enough to fully anonymize the document.

Think about whether your case has received media attention at some point, or has the potential to do so. Some cases are more obvious than others. But it doesn't need to be media attention. It could also be that the story is so unusual that it's fairly easy to find out who the person is. This brings us to the last redaction example.

Redaction example 4 (fact-based personal identifiers):

“A refugee by the name Person 1 swam across the ocean on a pink flamingo beach toy and was very determined to get asylum”

There might not be a media cover story (yet). But the story is so odd that it's easy to find out who did it.

Here, the facts of the story are themselves personally identifiable.

Be aware of the story in your case. It could be a personal identifier.


Fact-based identifiers and indirect personal identifiers are the most difficult redaction types. It is no surprise that this is also where most redaction slip-ups happen (in contrast to the Manafort case, which was a technical redaction error).

An intelligent software redaction tool like Cleardox can help you identify and redact the direct and indirect identifiers fairly easily. However, fact-based identifiers require human judgement, since they always differ from case to case. As such, it would be almost impossible for a software redaction tool to identify these special cases.

That is also the reason why it is nearly impossible to develop a fully automated redaction tool that does not require any human intervention. However, a redaction tool can greatly improve the redaction process by assisting the caseworker.

If you are interested in seeing how a redaction tool works, sign up for a free demo now.


The Cleardox team
