Document Deduplication Module

Parashift's Document Deduplication module lets users find duplicate documents within Alfresco. The module uses a search index to find documents that are duplicates of each other and displays the results within Alfresco Share. It offers the following features:



  • Ability to index content in an external Solr instance
  • Ability to find similar documents based upon the contents of the document
  • Ability to find similar images based upon their image signature


The change log for Document Deduplication can be found here



Document Deduplication comes with both a Share and a Repo AMP, and a Solr core configuration. Please follow our Installation guide to install this module into Alfresco, then follow the instructions below to install and configure Solr.


Deduplication relies on a separate Solr index to detect duplicates:


Add the following configurations to


Initial Data Import

If you are installing into an existing environment, you can have Solr reprocess existing files by running the Data Import Handler from the Solr Administration panel.

Before doing so, edit data-config.xml in the deduplicate directory and add the username and password of your administrator account.
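
If you prefer to script the import rather than use the Administration panel, the Data Import Handler can usually also be triggered over HTTP. The following is a minimal sketch, not part of the module itself. It assumes that the handler is registered at the conventional /dataimport path, that the core is named deduplicate (inferred from the directory name above), and that Solr listens on http://localhost:8983/solr. All three are assumptions, so adjust them to match your installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(base_url, core, command="full-import"):
    """Build a Data Import Handler request URL for the given Solr core.

    The "/dataimport" handler path is the conventional one; it is an
    assumption here and may differ in your solrconfig.xml.
    """
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/dataimport?{query}"


def run_dataimport(base_url, core, command="full-import", timeout=30):
    """Send the command to Solr and return the raw response body as text."""
    with urlopen(dataimport_url(base_url, core, command), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage (hypothetical host and core name; uncomment to run):
# print(run_dataimport("http://localhost:8983/solr", "deduplicate"))
# print(run_dataimport("http://localhost:8983/solr", "deduplicate", command="status"))
```

The full-import command starts the import in the background, so the status command is the way to check its progress afterwards.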


To find a duplicate document:

  • Navigate to the document details page for a document
  • Scroll down to the bottom of the page
  • Similar and duplicate documents are listed underneath the Version History