Technical documentation - Removing duplicate texts

This article describes the function to remove duplicate texts from the text storage and to clean hidden characters from texts.

Version 4.4

This function is available in version 4.4 and later.

Overview

The remove duplicate texts function goes through the texts in the text storage, or the filtered subset if a filter is applied, and finds the texts that are duplicates of each other. Each duplicate is replaced and then removed.
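The duplicate search can be pictured as a grouping step. The following is a minimal sketch, not the application's actual implementation. It assumes that two texts count as identical when all their language values match; the names `Text`, `text_id`, `translations` and `find_duplicate_groups` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    text_id: int         # hypothetical internal key, not part of the product
    pid: str             # persistent identity, "" when empty
    translations: tuple  # assumed shape: (("de", "test"), ("en", "test"))


def find_duplicate_groups(texts):
    """Group texts with identical translations; keep groups of two or more."""
    groups = defaultdict(list)
    for text in texts:
        groups[text.translations].append(text)
    return [group for group in groups.values() if len(group) > 1]


texts = [
    Text(1, "test", (("de", "test"), ("en", "test"))),
    Text(2, "", (("de", "test"), ("en", "test"))),
    Text(3, "", (("de", "other"), ("en", "other"))),
]
duplicate_groups = find_duplicate_groups(texts)
for group in duplicate_groups:
    print([text.text_id for text in group])
```

In this sketch the persistent identity is not part of the grouping key, so a text with a persistent identity and a text without one can end up in the same group. How the group is then reduced depends on the removal option.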

Two removal options exist:

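  1. Remove and merge empty Pid. This option finds all texts which are identical. If exactly one of the identical texts has a persistent identity, the texts with empty persistent identity are merged into that text. If two or more have a persistent identity, the texts with empty persistent identity are merged into a single text with empty persistent identity, since the application cannot know which persistent identity to choose.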

  2. Remove and leave empty Pid. This option finds all texts which are identical. If one or more of the identical texts has a persistent identity, each text with a persistent identity is left as it is, and the texts with empty persistent identity are merged into a single text whose persistent identity stays empty.

For the detailed removal logic, see the examples below.

Suggested strategy and usage

The suggested strategy is to remove unused texts that are no longer needed before removing duplicates. Follow the steps below.

  1. Make a backup of the database

  2. Remove texts which are not in use and that you do not foresee as a future standard text

  3. Use the remove function to remove duplicates



Make backup!

Make sure to back up the database before cleaning texts. The removal of unused texts and duplicates cannot be reverted, and mistakes cannot be recovered from within the application.

Instructions

Follow these steps to remove duplicates

  1. Ensure a backup of the database exists

  2. Filter the text repository to get the subset you wish to apply the remove duplicates function to. The function only operates on the subset which is filtered in the storage.

  3. Select a removal option.

  4. A window with all the found duplicates is presented. Each duplicate is listed once. Select the duplicates that are to be replaced and deleted.

  5. Click Delete. Only the texts selected will be replaced and removed.

  6. A warning will appear. Click Yes to proceed and delete.

  7. A confirmation will appear showing how many texts have been removed and replaced.
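The replace and remove step behind steps 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation. It assumes that "replaced" means that every usage of a duplicate is pointed at the text that is kept before the duplicate is deleted. The `references` mapping, the object names and the function `replace_and_remove` are hypothetical.

```python
def replace_and_remove(references, texts, survivor_id, duplicate_ids):
    """Point usages of duplicates at the survivor, then delete the duplicates.

    references: dict of hypothetical object name -> id of the text it uses
    texts: dict of text id -> text content
    Returns the counts (replaced, removed).
    """
    replaced = 0
    for owner, text_id in list(references.items()):
        if text_id in duplicate_ids:
            references[owner] = survivor_id  # replace first ...
            replaced += 1
    removed = 0
    for text_id in duplicate_ids:
        if texts.pop(text_id, None) is not None:  # ... then remove
            removed += 1
    return replaced, removed


texts = {1: "test", 2: "test", 3: "test"}
references = {"button_a": 2, "label_b": 3, "label_c": 1}
replaced, removed = replace_and_remove(
    references, texts, survivor_id=1, duplicate_ids={2, 3}
)
print(f"{removed} texts removed, {replaced} usages replaced")
```

Replacing before removing means that no usage is left pointing at a deleted text. Since the operation cannot be reverted from within the application, the backup in step 1 remains the only way back.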

Remove and merge empty Pid - example of removal logic

Example 1

Before removal operation. Only one of the duplicates has a persistent identity. Texts with empty persistent identity will be merged into the text with a persistent identity.

  Persistent id   en     de
  test            test   test
  (empty)         test   test
  (empty)         test   test

After removal operation. Only one text will remain.

  Persistent id   en     de
  test            test   test



Example 2

Before removal operation. Two or more of the duplicates have a persistent identity. Texts with empty persistent identity will be merged, but not into a text with persistent identity since the application does not know which one to choose.

  Persistent id   en     de
  test            test   test
  (empty)         test   test
  (empty)         test   test
  abc             test   test

After removal operation. Each text with a separate persistent identity will remain, along with only one text with empty persistent identity.

  Persistent id   en     de
  test            test   test
  (empty)         test   test
  abc             test   test
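The two examples above can be expressed as a small function. This is a minimal sketch of the documented behaviour, not the application's code. The `Text` class and the function name `remove_and_merge_empty_pid` are hypothetical, and the sketch assumes that persistent identities are unique within a group of identical texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    pid: str  # persistent identity, "" when empty
    en: str
    de: str


def remove_and_merge_empty_pid(group):
    """Return the texts that remain from one group of identical texts."""
    with_pid = [text for text in group if text.pid]
    if len(with_pid) == 1:
        # Example 1: the empty-Pid texts merge into the single text with a Pid.
        return with_pid
    # Example 2 (or no Pid at all): keep every text with a Pid and merge the
    # empty-Pid texts into one, since no single Pid can be chosen.
    remaining = []
    kept_empty = False
    for text in group:
        if text.pid:
            remaining.append(text)
        elif not kept_empty:
            remaining.append(text)
            kept_empty = True
    return remaining


example_1 = [Text("test", "test", "test"), Text("", "test", "test"),
             Text("", "test", "test")]
example_2 = example_1 + [Text("abc", "test", "test")]

result_1 = remove_and_merge_empty_pid(example_1)
result_2 = remove_and_merge_empty_pid(example_2)
print([text.pid for text in result_1])
print([text.pid for text in result_2])
```

The sketch also covers a group where no text has a persistent identity; in that case a single text with empty persistent identity remains, which is an assumption that follows from the merge rule rather than something the examples show.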



Remove and leave empty Pid - example of removal logic

Before removal operation. Only one of the duplicates has a persistent identity. Texts with empty persistent identity will be merged with other texts with empty persistent identity but not with the text that has a persistent identity.

  Persistent id   en     de
  test            test   test
  (empty)         test   test
  (empty)         test   test

After removal operation. Each text with a separate persistent identity will remain, along with only one text with empty persistent identity.

  Persistent id   en     de
  test            test   test
  (empty)         test   test
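The example above can be sketched in the same way. Again this is only an illustration of the documented behaviour under assumptions: the `Text` class and the function name `remove_and_leave_empty_pid` are hypothetical, and persistent identities are assumed to be unique within a group of identical texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    pid: str  # persistent identity, "" when empty
    en: str
    de: str


def remove_and_leave_empty_pid(group):
    """Keep every text with a Pid and exactly one text with empty Pid."""
    remaining = []
    kept_empty = False
    for text in group:
        if text.pid:
            # Texts with a persistent identity are never merged.
            remaining.append(text)
        elif not kept_empty:
            # The first empty-Pid text absorbs all other empty-Pid texts.
            remaining.append(text)
            kept_empty = True
    return remaining


before = [Text("test", "test", "test"), Text("", "test", "test"),
          Text("", "test", "test")]
after = remove_and_leave_empty_pid(before)
print([text.pid for text in after])
```

Unlike the merge option, this option never folds an empty-Pid text into a text with a persistent identity, even when only one such text exists.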