Frequently Asked Questions
We’ve gathered the most frequently asked questions about Cleardox so you can quickly get an overview of features, security, integrations, and pricing. If you can’t find what you’re looking for, you’re always welcome to contact us.

Cleardox automatically detects and helps remove personal and sensitive information in documents using a combination of machine learning and rule-based detection. The tool is assistive: a user always reviews the results before they are finalized.

Cleardox can detect sensitive and personal data across 20+ categories, including names, email addresses, phone numbers, addresses, social security numbers (SSNs), company information, dates, account numbers, authorization IDs, and geographic data such as regions and municipalities.
The tool also identifies other sensitive content, such as financial data, license plates, titles, and confidential information. Cleardox can also identify indirect personal data, such as job titles and organizations, which in combination can be used to identify a person.

Within our field, performance is typically measured using two metrics:
- Recall - how much of the information that needs to be found we actually find
- Precision - how much of what we flag actually needed to be flagged
Depending on the quality of your data, you can typically expect:
- Recall: around 97%
- Precision: around 90%
The reason precision is often slightly lower than recall is that we prefer to identify slightly too much rather than too little. In other words, we prioritize ensuring that sensitive information is not missed - even if it occasionally leads to slight over-anonymization.
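
Both metrics are simple ratios over three counts: true positives (sensitive items correctly flagged), false negatives (sensitive items missed), and false positives (harmless items flagged). The Python sketch below uses made-up counts that happen to land near the figures quoted above; they are illustrative only and not measured Cleardox results.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the items that should be found that were actually found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the flagged items that actually needed to be flagged."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts for illustration only:
# 100 sensitive items in a document, 97 found, 3 missed,
# and 11 harmless items flagged by mistake.
tp, fn, fp = 97, 3, 11
print(f"recall    = {recall(tp, fn):.2f}")     # recall    = 0.97
print(f"precision = {precision(tp, fp):.2f}")  # precision = 0.90
```

Flagging more aggressively tends to turn misses (false negatives) into false positives, which raises recall at the cost of precision: the trade-off described above.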

Cleardox supports all languages, although accuracy may vary depending on language complexity and available training data.

Yes. Using Optical Character Recognition (OCR), Cleardox can convert non-searchable PDFs (so-called flat PDFs) into machine-readable documents.
The process happens automatically, so the user does not need to take any action. Cleardox intelligently detects whether a page already contains machine-readable text. If no text is detected, the page is automatically processed with OCR, ensuring the document becomes readable and ready for further processing.

Yes, we support more than 50 different file formats. When uploading documents, you can choose whether they should be converted to PDF. Converting to PDF is the most common setup and is called Cleardox Convert. When Cleardox Convert is enabled, we automatically convert different file formats into PDF upon upload, ensuring a consistent format across all documents in the system.

When a document is uploaded (regardless of format), Cleardox automatically creates a new PDF version during processing.
This ensures that redactions are permanent (non-reversible) and that full control over metadata and sensitive content is maintained.

Yes – this is what we call native support. The term may sound technical, but it simply means that you can download the document in the same format as the one you uploaded to Cleardox. Our native support includes Word files (.docx) and Excel files.

No, there are no limits on the number of documents you can upload to Cleardox, and there is no page limit for individual documents. It is not uncommon for files with 10,000+ pages to be uploaded to the system. However, if you upload a document with 20,000 pages, you might want to grab an extra cup of coffee while it processes.

Users can share cases internally with colleagues as either viewers or editors. External sharing can be enabled with additional security controls.

At Cleardox, we offer support directly within the system - and we always prioritize responding to inquiries as quickly as possible (we’re pretty quick to respond, if we do say so ourselves 😉).
Our support is available during normal business hours (09:00–17:00 CET), and support requests are categorized and prioritized based on their level of criticality.

Support access is strictly limited to authorized personnel only. Documents are transferred securely, and support staff have read-only access with no ability to modify data.

Yes. Cleardox provides an API that allows integration with external systems. Through the API, you can upload documents, retrieve anonymized/redacted documents, and return processed files back into your systems. This makes it easy to integrate Cleardox into existing workflows.

No. Customer data is not used for training or continuous model retraining unless explicitly agreed in writing. Data is only used for processing and anonymization/redaction of documents.

Yes. As a user, you can create your own categories with custom search criteria.
As an administrator, you can also create and maintain categories that are shared across your organization, so individual users do not need to recreate the same categories.
If a category is more complex and cannot be solved with simple search terms, our technical team can also help create it for you.
However, most customers find that the existing categories already cover their needs.

At Cleardox, we work to make document redaction simpler, safer, and less time-consuming. Our goal is to reduce complexity and workload, so you can end your workday with peace of mind - knowing that your documents have been anonymized correctly.
Cleardox is built with a strong focus on usability, security, and collaboration, making it easier to work with even large document volumes.
What our customers especially highlight:
- Cleardox is intuitive and easy to use
- It is easy to collaborate with colleagues in the system
- You can pseudonymize content – not just redact it
- As a lawyer, you have access to all relevant features in one unified PDF solution
- We provide support directly within the system

Yes, Cleardox also pseudonymizes content.
This means that sensitive information can be replaced with consistent pseudonyms (e.g. P1, P2, V1, etc.). This ensures that the structure and context of the document are preserved.
You can choose between black redaction and pseudonymization, and this can be configured per category in the system.

Pseudonymization removes sensitive information and replaces it with pseudonyms (e.g. P1, V1, Address 1, etc.), allowing documents to retain their context and become easier to read and work with. This is especially relevant when organizations want to reuse data containing valuable knowledge in internal processes such as knowledge bases or AI-related use cases.
Black redaction removes sensitive information and replaces it with black boxes.
Both methods ensure that sensitive information is removed and is no longer available in the document after anonymization.

Yes. Cleardox ensures consistent pseudonymization both across documents and across users within your organization.
This means that everyone works with the same pseudonymization logic, ensuring that the same type of information is always pseudonymized in the same way – regardless of who is working in Cleardox. You therefore avoid manually maintaining pseudonymization lists, where you would otherwise have to track that, for example, “Oliver” always equals P1 (Person 1) and “Louise” equals P2 (Person 2).
Cleardox provides a set of predefined pseudonyms, which you can customize and adjust as needed. Examples:
- Persons: P1, P2, etc.
- Companies: C1, C2, etc.
- Emails: E1, E2, etc.
- Social security numbers: SSN 1, SSN 2, etc.
- Addresses: A1, A2, etc.
- Phone numbers: Tel. 1, Tel. 2, etc.
The result is a more consistent, efficient, and secure pseudonymization process.
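
The core idea can be sketched in a few lines of Python, assuming a single lookup table stands in for the organization-wide store that Cleardox shares across users. The class, the category names, and the prefixes are hypothetical: the first time a value appears in a category it gets the next number, and every later occurrence reuses that pseudonym.

```python
from collections import defaultdict


class Pseudonymizer:
    """Assigns stable, per-category pseudonyms such as P1, P2, C1 (hypothetical sketch)."""

    def __init__(self, prefixes: dict[str, str]) -> None:
        self.prefixes = prefixes                        # e.g. {"person": "P"}
        self.mapping: dict[tuple[str, str], str] = {}   # (category, value) -> pseudonym
        self.counters: defaultdict[str, int] = defaultdict(int)

    def pseudonym(self, category: str, value: str) -> str:
        key = (category, value)
        if key not in self.mapping:
            # First occurrence in this category: take the next number.
            self.counters[category] += 1
            self.mapping[key] = f"{self.prefixes[category]}{self.counters[category]}"
        return self.mapping[key]


p = Pseudonymizer({"person": "P", "company": "C", "phone": "Tel. "})
print(p.pseudonym("person", "Oliver"))          # P1
print(p.pseudonym("person", "Louise"))          # P2
print(p.pseudonym("person", "Oliver"))          # P1 (same value, same pseudonym)
print(p.pseudonym("phone", "+45 12 34 56 78"))  # Tel. 1
```

This sketch matches values exactly; it does not treat spelling or casing variants of the same name as one person.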

Cleardox uses a combination of rule-based detection and machine learning-based Named Entity Recognition. These methods can be used separately or together to maximize coverage.
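
One way to use both methods together is to run each detector separately and merge the results into a single set of non-overlapping spans. In the Python sketch below, the email regex is a simplified example rule, `fake_ner` is a hypothetical stand-in for a transformer NER model, and overlapping spans are resolved by keeping the earliest, longest one. All of these are assumptions, not Cleardox's actual logic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    category: str


# Simplified rule-based detector: a basic (not RFC-complete) email pattern.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def rule_based(text: str) -> list[Span]:
    return [Span(m.start(), m.end(), "email") for m in EMAIL.finditer(text)]


def fake_ner(text: str) -> list[Span]:
    """Hypothetical stand-in for an ML model: returns fixed 'person' hits."""
    spans = []
    for name in ("Oliver Hansen", "oliver"):
        i = text.find(name)
        if i != -1:
            spans.append(Span(i, i + len(name), "person"))
    return spans


def merge(*detections: list[Span]) -> list[Span]:
    """Union of all spans; overlapping spans collapse into the earliest, longest one."""
    spans = sorted((s for d in detections for s in d), key=lambda s: (s.start, -s.end))
    merged: list[Span] = []
    for s in spans:
        if merged and s.start < merged[-1].end:
            last = merged[-1]
            if s.end > last.end:  # extend the existing span to cover both
                merged[-1] = Span(last.start, s.end, last.category)
            continue
        merged.append(s)
    return merged


text = "Contact Oliver Hansen at oliver@example.com"
for s in merge(rule_based(text), fake_ner(text)):
    print(s.category, text[s.start:s.end])
# person Oliver Hansen
# email oliver@example.com
```

Taking the union of both detectors favors recall: anything either method finds is kept, which fits the preference for flagging slightly too much rather than too little.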

No. Cleardox uses machine learning strictly for classification and extraction, not for generating text. This improves predictability and security.

No. Since Cleardox does not accept prompts or generate text, it is not exposed to prompt injection or jailbreak-style attacks.

Cleardox uses transformer-based models for named entity recognition that are trained on openly licensed and proprietary Cleardox datasets.

Accuracy depends on category and document quality. Cleardox is optimized for high recall to minimize the risk of missing sensitive information.

CPR numbers (Danish personal identification numbers) are detected using rule-based patterns, including validation that the date part of the number (DDMMYY) is a real date, which provides very high accuracy.
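
A minimal Python sketch of pattern matching with date validation, assuming the common DDMMYY-SSSS layout with an optional hyphen. The pattern and function names are illustrative. The real CPR scheme encodes the century in the serial digits; this sketch ignores that and accepts a date if it is valid in either the 1900s or the 2000s, which only matters for 29 February.

```python
import re
from datetime import date

# DDMMYY, an optional hyphen, then four serial digits.
CPR_PATTERN = re.compile(r"\b(\d{2})(\d{2})(\d{2})-?(\d{4})\b")


def is_plausible_date(day: int, month: int, yy: int) -> bool:
    """True if DD/MM/YY is a real calendar date in 19YY or 20YY."""
    for century in (1900, 2000):
        try:
            date(century + yy, month, day)
            return True
        except ValueError:  # e.g. day 32, month 13, or 29 Feb in a non-leap year
            continue
    return False


def find_cpr_numbers(text: str) -> list[str]:
    return [
        m.group(0)
        for m in CPR_PATTERN.finditer(text)
        if is_plausible_date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    ]


sample = "Valid: 010190-1234, 2902001234. Invalid: 320190-1234, 2902011234."
print(find_cpr_numbers(sample))  # ['010190-1234', '2902001234']
```

The date check is what filters out ten-digit numbers that merely look like CPR numbers, such as order or account numbers whose first six digits cannot be a date.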

Yes. Machine learning can be disabled in specific configurations, for example to improve performance or meet special compliance requirements.

Yes. Cleardox is built using several open-source technologies as part of its software stack.

Cleardox is hosted exclusively on European servers. There are no third-country data transfers.

All customer data is encrypted both in transit and at rest. Access is strictly controlled using role-based permissions.

Yes. Cleardox supports Single Sign-On via standard technologies such as SAML, OAuth, and LDAP, typically integrated with Active Directory.

Cleardox supports both administrator and user roles. Permissions are typically managed via Active Directory or directly within Cleardox.

By default, administrators cannot access other users’ active cases. This can be configured if required.

As an admin user, you have access to a range of additional features. These primarily include the ability to create, share, and maintain shared categories, exhibit stamps, templates, and header and footer templates.

Multi-factor authentication is typically handled via the customer’s identity provider, such as Microsoft Entra ID, or via Cleardox’s own built-in one-time password (OTP) support.

Yes – both options are available.
So far, all of our customers use our hosted cloud solution, as we have a secure setup in place and because most customers value access to continuous and fast product updates.
However, we also recognize that some organizations operate in industries with specific security or compliance requirements. Therefore, we also offer the option to install Cleardox on-premise on your own servers.
Please feel free to contact us to learn more.

Cleardox uses runtime threat detection, infrastructure monitoring, and centralized logging to detect and respond to security incidents.

Log files are stored for 1 year and are then automatically deleted.

Cleardox uses an application-managed secret service rather than a hardware security module (HSM) or a cloud key management service (KMS). The key encryption key (KEK) is held in an encrypted secrets file (secrets.txt) that is loaded into the secret service at startup. The file is protected by AES-256 encryption with a password-derived key (using PBKDF2), and each entry is checksummed with HMAC-SHA256 for integrity. The secret service runs as an isolated container in the cluster, with the secrets file mounted as a volume. The password that unlocks the file is supplied at service startup via an environment variable and is not persisted to disk.
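
The key-derivation and integrity parts of this design can be sketched with Python's standard library alone. This is a hypothetical illustration: the iteration count, the salt size, splitting the derived material into separate encryption and MAC keys, and the entry name `kek-v1` are all assumptions. The AES-256 encryption step itself is left out because it needs a third-party library.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 600_000  # assumed value; Cleardox does not state one


def derive_keys(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Derive a 32-byte AES-256 key and a separate 32-byte HMAC key from a password."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def entry_checksum(mac_key: bytes, name: str, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 over an entry's name and ciphertext, binding the two together."""
    message = name.encode("utf-8") + b"\x00" + ciphertext
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_entry(mac_key: bytes, name: str, ciphertext: bytes, checksum: bytes) -> bool:
    """Constant-time comparison, so verification does not leak timing information."""
    return hmac.compare_digest(entry_checksum(mac_key, name, ciphertext), checksum)


salt = os.urandom(16)
enc_key, mac_key = derive_keys("example password", salt)
ciphertext = b"placeholder for the AES-256 ciphertext of the KEK"
tag = entry_checksum(mac_key, "kek-v1", ciphertext)
print(verify_entry(mac_key, "kek-v1", ciphertext, tag))         # True
print(verify_entry(mac_key, "kek-v1", ciphertext + b"!", tag))  # False (tampered)
```

Deriving two independent keys from one password avoids reusing the same key for both encryption and integrity checking.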

Access to the secret service is restricted at two layers:
- Network layer: The service is not exposed publicly. It is reachable only from within the internal network, on a private port.
- Transport layer: The service uses mutual TLS (mTLS). Clients must present a valid certificate signed by the internal certificate authority (CA) to establish a connection. Only the Cleardox backend application holds such a certificate, which is generated inside the cluster.
Cleardox personnel cannot access the underlying host, and default access privileges do not grant permission to exec into containers or view secrets. Elevated permissions are therefore required to access the encrypted key and its password.
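
The transport-layer restriction corresponds to a server-side TLS context that requires a client certificate. The Python sketch below uses only the standard `ssl` module; the function name, the TLS 1.2 minimum, and the optional file arguments are assumptions for illustration.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a CA-signed certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual TLS: the handshake fails unless the client presents a certificate
    # that chains to a CA loaded below. With no CA loaded, every client is rejected.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # the service's own identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only the internal CA
    return ctx
```

In a deployment, you would pass the internal CA certificate and the service's own certificate and key (for example `make_mtls_server_context("internal-ca.pem", "secret-service.pem", "secret-service.key")`, with hypothetical filenames) and bind the listening socket to a private address only, which covers the network-layer restriction. CA validation alone accepts any certificate the internal CA has signed; checking the peer certificate's subject would be an additional step if more than one client could hold such a certificate.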

Yes. At runtime, only the backend application can issue requests to the secret service, enforced by mTLS client certificate validation - the service rejects any connection that does not present a matching certificate. There is no API exposed to end users or external parties.

We do not log every individual request: the application requests decryption continuously, so per-request logging would produce a flood of low-signal entries. Instead, we log in aggregate and alert on suspicious behavior, such as bursts of usage or use of a rotated key.
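
A minimal Python sketch of aggregate monitoring, assuming two alert rules taken from the text: a burst rule (more than a threshold number of decryption requests within a sliding time window) and a rule for use of a rotated-out (retired) key. The class name, thresholds, and window size are hypothetical. Per-key counts are kept for a periodic summary log line instead of one line per request.

```python
from collections import Counter, deque


class DecryptionMonitor:
    """Counts decryption requests in a sliding window instead of logging each one."""

    def __init__(self, window_seconds: float, burst_threshold: int,
                 retired_key_versions: set[int]) -> None:
        self.window_seconds = window_seconds
        self.burst_threshold = burst_threshold
        self.retired = retired_key_versions
        self.timestamps: deque[float] = deque()
        self.per_key: Counter[int] = Counter()

    def record(self, timestamp: float, key_version: int) -> list[str]:
        """Register one request and return any alerts it triggers."""
        alerts = []
        self.per_key[key_version] += 1
        if key_version in self.retired:
            alerts.append(f"retired key v{key_version} used")
        self.timestamps.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.burst_threshold:
            alerts.append(f"burst: {len(self.timestamps)} requests in {self.window_seconds}s")
        return alerts

    def summary(self) -> dict[int, int]:
        """Aggregate request counts per key version, for periodic logging."""
        return dict(self.per_key)


monitor = DecryptionMonitor(window_seconds=1.0, burst_threshold=3, retired_key_versions={1})
for t in (0.0, 0.1, 0.2):
    print(monitor.record(t, key_version=2))  # []
print(monitor.record(0.3, key_version=2))    # ['burst: 4 requests in 1.0s']
print(monitor.record(5.0, key_version=1))    # ['retired key v1 used']
print(monitor.summary())                     # {2: 4, 1: 1}
```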

Key rotation is supported by a versioned multi-key scheme. When a new key is introduced, it is used for new incoming projects and documents, while the service continues to decrypt data protected by older keys.
Documents on the platform are deleted once the project is completed or after a time limit. We monitor how many projects use the old key versus the new key, and an old key is decommissioned once its usage reaches zero. The newest key is always kept. Each key’s creation timestamp is recorded, and the service logs the age of each loaded key at startup.
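
The rotation lifecycle can be sketched as a small, hypothetical key ring in Python: the newest version encrypts new projects, older versions remain available for decryption, and a version is removed only when no live project uses it and it is not the newest. The class and method names are assumptions, and the sketch tracks key versions only; it does not perform any encryption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class KeyRing:
    """Hypothetical versioned key ring: newest key encrypts, older keys only decrypt."""

    keys: dict[int, bytes] = field(default_factory=dict)
    created_at: dict[int, datetime] = field(default_factory=dict)
    live_projects: dict[int, int] = field(default_factory=dict)

    @property
    def current_version(self) -> int:
        return max(self.keys)

    def add_key(self, key: bytes) -> int:
        version = max(self.keys, default=0) + 1
        self.keys[version] = key
        self.created_at[version] = datetime.now(timezone.utc)
        self.live_projects[version] = 0
        return version

    def key_for_new_project(self) -> tuple[int, bytes]:
        version = self.current_version
        self.live_projects[version] += 1
        return version, self.keys[version]

    def key_for_decryption(self, version: int) -> bytes:
        return self.keys[version]  # older versions stay available until decommissioned

    def project_deleted(self, version: int) -> None:
        self.live_projects[version] -= 1

    def decommission_unused(self) -> list[int]:
        newest = self.current_version  # the newest key is always kept
        unused = [v for v, n in self.live_projects.items() if n == 0 and v != newest]
        for v in unused:
            del self.keys[v], self.created_at[v], self.live_projects[v]
        return unused

    def key_ages(self) -> dict[int, timedelta]:
        """Age of each loaded key, e.g. for logging at startup."""
        now = datetime.now(timezone.utc)
        return {v: now - t for v, t in self.created_at.items()}


ring = KeyRing()
ring.add_key(b"\x01" * 32)
old_version, _ = ring.key_for_new_project()  # project A uses v1
ring.add_key(b"\x02" * 32)                   # rotation: v2 becomes current
new_version, _ = ring.key_for_new_project()  # project B uses v2
print(old_version, new_version)              # 1 2
print(ring.decommission_unused())            # [] (v1 still protects project A)
ring.project_deleted(old_version)            # project A completed and deleted
print(ring.decommission_unused())            # [1]
```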

Updates are tested in staging environments before being rolled out gradually to production systems.

Cleardox is built for high availability with monitoring and alerting to ensure stable day-to-day operations.

Yes. Cleardox’s security and internal controls, which are ISO 27001 compliant, are documented in an ISAE 3000 Type II assurance report with a high level of assurance.
The report describes our security processes, controls, and procedures related to the handling of customer data and the operation of the platform. You can download the report here.

Cleardox integrates with systems such as iManage, SharePoint, Acadre, Workzone, HighQ from Thomson Reuters, and other case and document management systems, either through direct integrations or via the Cleardox API. New integrations are continuously added and can easily be built by customers themselves using our external API.

Integrations use secure authentication flows. Cleardox only accesses documents that users already have permission to view in the source system.

Cleardox fully respects permissions from the source system. Users can only access and process documents they are authorized to view.

At Cleardox, you will be assigned a dedicated customer manager who will help you get started quickly and smoothly.
The onboarding process is tailored to your needs and can include:
- System setup
- Configuration of categories
- Training in how to use the system
Implementation is usually very straightforward. Once users have been introduced to the system, we typically find that they quickly become self-sufficient without the need for additional training.

In a user-based model, you pay per named user with access to the system. Pricing is tied to specific users regardless of usage volume.

In a seat-based model, you purchase a number of active users per month. Anyone in the organization can access the system, but only unique active users in a given month count as a seat.

The seat model is based on active monthly usage and is flexible for fluctuating teams. The user-based model is based on fixed named users and is best for stable user groups.

A seat-based model allows broad access without paying for all potential users. You only pay for those who actively use the system in a given month, making it flexible and cost-efficient.

Pricing is based on the number of unique active users per month, regardless of how many different people use the system over time.
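
To make the counting rule concrete, here is a small Python sketch. It assumes that "active" means any recorded use within a calendar month; the exact activity definition and billing period are set by Cleardox, and the user names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def seats_per_month(activity: list[tuple[str, date]]) -> dict[str, int]:
    """Count unique active users per calendar month (one seat per unique user)."""
    users_by_month: dict[str, set[str]] = defaultdict(set)
    for user, day in activity:
        users_by_month[day.strftime("%Y-%m")].add(user)
    return {month: len(users) for month, users in sorted(users_by_month.items())}


activity = [
    ("anna", date(2024, 1, 3)),
    ("anna", date(2024, 1, 20)),  # repeat use in the same month: still one seat
    ("ben", date(2024, 1, 5)),
    ("carl", date(2024, 2, 1)),
    ("anna", date(2024, 2, 10)),
]
print(seats_per_month(activity))  # {'2024-01': 2, '2024-02': 2}
```

Three different people use the system over the two months, but each month counts only two seats. Under a user-based model, the same three people would be three named users, whether or not they use the system in a given month.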

If usage is spread across many potential users, the seat model is usually best. If usage is limited to a defined group of users, the user-based model may be more suitable.

Yes. Cleardox offers fixed-price project engagements for larger, well-defined tasks such as data subject access requests (DSARs). Pricing depends on document volume and complexity. Contact us for a tailored assessment.

Yes. You can switch models if your needs change over time.

Yes. The standard trial period is typically two weeks, but extensions can be arranged if more evaluation time is needed.

Cleardox is typically offered either as a user-based model, a seat-based model, or as a project-based engagement.
The smallest setup will usually be either 1 seat, 3 named users, or a limited project based on a specific number of pages.
The exact price depends on your needs, data volume, and use case. Feel free to contact us to receive a tailored quote.


