December 4, 2024
This hidden data is often referred to as metadata, and it is important to be aware of it, since it can contain personal information that should be removed when we redact a document.
But what is metadata? In simple terms, metadata can be defined as data that provides information about other data. In other words, metadata is a shorthand description of the data it refers to - it describes what that data really is. Think about metadata as the keywords you enter into Google when you do a search. The keywords you put in are the metadata. Another way to think about metadata is: data about data.
Try out a free preview of our Cleardox Intelligent and Secure Redaction Software Tool here
Metadata comes in many different forms and is normally grouped into categories, including the following:
Descriptive metadata is descriptive information about a resource. It is used for discovery and identification, and includes elements such as title, abstract, author and keywords. In most cases, descriptive metadata is the category you need to be aware of when you redact documents, because it is the category in which you most often find personal information.
In the example below you can see a list of metadata from a document.
As you can see there are various examples of metadata that directly or indirectly could identify a person:
Well, first of all, a professional redaction software tool like Cleardox will do the trick. Most redaction software tools remove metadata as part of the redaction process. However, this is not guaranteed. By the way, if you are looking for a redaction software tool, we recommend that you read this post on the ten recommended features any redaction software tool should have.
If you don't have a redaction software tool yet and are forced to redact documents manually, another (more old-fashioned) way to make sure that your metadata has been removed is to print the document and scan it. This creates a new document that no longer carries the original metadata. This redaction process, however, takes time (and is not good for the environment either).
An alternative is to remove the metadata with a metadata removal tool, which can be found with a simple Google search. But Microsoft Word also has a feature that allows you to remove metadata. As you can see in the figure below, you can access the metadata in Word by clicking on File followed by Info. This will take you to a screen that shows much of the same metadata as in the figure above.
As you can see, there is a link near the button (Show all properties). This will provide you with a complete list of all the metadata in the document. If you want to remove the metadata, simply click on the Inspect Document button. This will take you to a new screen, where you have the option to remove all the metadata.
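If you want to check the same properties without opening Word, the following is a minimal Python sketch. It relies on the fact that a .docx file is a ZIP package that stores its core properties (author, last editor, keywords and so on) in docProps/core.xml. The function names and the default list of fields to blank are our own assumptions for illustration. The sketch does not touch other places where information can hide (for example docProps/app.xml, custom properties, comments or tracked changes), so it is not a replacement for Inspect Document or a redaction tool.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PATH = "docProps/core.xml"

# Register the usual prefixes so the rewritten XML keeps readable names.
ET.register_namespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties")
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def read_core_properties(docx):
    """Return {property name: value} from a .docx given as a path or file object."""
    with zipfile.ZipFile(docx) as archive:
        if CORE_PATH not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PATH))
    props = {}
    for child in root:
        name = child.tag.split("}")[-1]  # drop the XML namespace part
        props[name] = (child.text or "").strip()
    return props


def blank_core_properties(src, dst, fields=("creator", "lastModifiedBy", "title",
                                            "subject", "keywords", "description")):
    """Copy the .docx from src to dst, emptying the chosen core properties.

    The default field list is an illustrative assumption, not a complete list.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PATH:
                root = ET.fromstring(data)
                for child in root:
                    if child.tag.split("}")[-1] in fields:
                        child.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(item, data)
```

Calling read_core_properties("speech.docx") (a hypothetical file name) returns a dictionary you can scan for names, and blank_core_properties("speech.docx", "speech_clean.docx") writes a copy with those fields emptied. Always re-check the cleaned copy before you release it.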
When former Danish Prime Minister Anders Fogh Rasmussen gave his annual New Year's speech to the public, the speech was shared with the public as a Word file after the ceremony - a standard procedure.
Unfortunately, the document contained metadata that revealed sensitive information, for instance that the speech had been written by someone else, someone from an organization outside the Prime Minister's office. The metadata also showed what corrections the Prime Minister had made to the speech. For instance, it showed that in one of the last iterations the Prime Minister chose to delete a sentence that promised more money to municipalities in Denmark.
Though it is hardly a secret that most Prime Ministers don't compose their own speeches anymore, and even though this case is not as bad as the Manaman case, I'm sure that the Office of the Prime Minister wishes that they had redacted the document properly and stripped it of metadata. The Ministry later changed its procedure and now only shares the speeches in PDF format. So does PDF not contain metadata?
The story above might falsely suggest that as long as we share documents in PDF format, there is no metadata. But as we learned earlier, the original metadata is only reliably gone if you print the document and scan it (thereby creating a new PDF document).
PDF documents do in fact also contain descriptive metadata, such as the author's name, keywords, and copyright information, which can include personal or classified information.
This metadata should therefore also be removed when you redact a document.
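To get a first impression of whether a PDF still carries this kind of descriptive metadata, the Python sketch below scans the raw file bytes for common document information keys such as /Author and /Keywords. This is a rough heuristic of our own, not a PDF parser: it only sees uncompressed entries written as simple text strings, it ignores XMP metadata and hex-encoded values, and keys like /Title can also appear elsewhere in a PDF (for example in bookmarks). An empty result therefore does not prove that the file is clean. The function name and key list are assumptions chosen for illustration.

```python
import re

# Common keys of the PDF document information dictionary.
INFO_KEYS = (b"Author", b"Title", b"Subject", b"Keywords", b"Creator", b"Producer")

# Matches "/Key (value)" where the value is a simple literal string
# (escaped characters allowed, nested parentheses not handled).
_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def scan_pdf_info(pdf_bytes):
    """Return {key: [values]} for information entries visible in the raw bytes."""
    found = {}
    for key, value in _PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), []).append(value.decode("latin-1"))
    return found
```

You could call it as scan_pdf_info(open("release.pdf", "rb").read()), where release.pdf is a hypothetical file name. If the result contains names or internal keywords, remove the metadata before you share the document.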
Select the “Preview” button to view the hidden text. Select the “Show Preview” button at the bottom of the dialogue box. Select “Show Hidden Text” from the preview of the document. You can scroll through the pages of your PDF using the double arrow buttons on the gray Acrobat navigation bar.
Metadata is hidden information in documents that can be used to directly or indirectly identify a person. Therefore you should be careful to check your metadata and remove it when you redact a document prior to a release to third parties. Metadata can fortunately be removed fairly easily and most redaction software tools automatically remove it for you.
Interested in our product? Sign up for a demo here.
Cheers, Team Cleardox