Technical documentation - Removing duplicate texts
This article describes the function to remove duplicate texts from the text storage and to clean hidden characters from texts.
Version 4.4
This function is available in version 4.4 and later.
Overview
The remove duplicate texts function operates on the texts in the text storage (or on the filtered subset, see Instructions) and finds the texts which are duplicates. Each duplicate is replaced and then removed.
Two removal options exist:
Remove and merge empty Pid. This option finds all texts which are identical. If one of the identical texts has a persistent identity, the texts with an empty persistent identity are merged into the text with the persistent identity.
Remove and leave empty Pid. This option finds all texts which are identical. If one or more of the identical texts have a persistent identity, each text with a unique persistent identity is left as it is, and the texts with an empty persistent identity are merged into one text whose persistent identity remains empty.
For the detailed removal logic, see the examples below.
Suggested strategy and usage
The suggested strategy is to first remove unused texts that are not needed and then remove the duplicates. Follow the steps below.
Make a backup of the database
Remove texts which are not in use and that you do not foresee as future standard texts
Use the remove function to remove duplicates
Make a backup!
Make sure to back up the database before cleaning texts. The removal of unused texts and duplicates cannot be reverted, and mistakes cannot be recovered from within the application.
Instructions
Follow these steps to remove duplicates
Ensure a backup of the database exists
Filter the text repository to get the subset you wish to apply the remove duplicates function to. The function only operates on the subset which is filtered in the storage.
Select a removal option
A window listing all duplicates found is then presented. Each duplicate is listed once. Select the duplicates that are to be replaced and deleted.
Click Delete. Only the texts selected will be replaced and removed.
A warning will appear. Click Yes to proceed and delete.
A confirmation will appear showing how many texts have been removed and replaced.
Remove and merge empty Pid - example of removal logic
Example 1
Before removal operation. Only one of the duplicates has a persistent identity. Texts with empty persistent identity will be merged into the text with a persistent identity.
Persistent id | en | de |
---|---|---|
test | test | test |
(empty) | test | test |
(empty) | test | test |
After removal operation. Only one text will remain.
Persistent id | en | de |
---|---|---|
test | test | test |
Example 2
Before removal operation. Two or more of the duplicates have a persistent identity. Texts with empty persistent identity will be merged with each other, but not into a text with a persistent identity, since the application does not know which one to choose.
Persistent id | en | de |
---|---|---|
test | test | test |
(empty) | test | test |
(empty) | test | test |
abc | test | test |
After removal operation. Each text with separate persistent identity will remain and only one text with empty persistent identity.
Persistent id | en | de |
---|---|---|
test | test | test |
(empty) | test | test |
abc | test | test |
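The merge behaviour in Examples 1 and 2 can be sketched in Python as follows. This is an illustrative sketch only, not the application's implementation: the function name `remove_and_merge_empty_pid`, the representation of a text as a `(pid, en, de)` tuple, and the use of an empty string for an empty persistent identity are assumptions made for the example. It also assumes that texts count as duplicates when all their translations are identical, and that identical texts sharing the same persistent identity collapse into one.

```python
from collections import defaultdict


def remove_and_merge_empty_pid(texts):
    """Sketch of the 'Remove and merge empty Pid' option (hypothetical helper).

    texts: list of (pid, en, de) tuples; pid == "" means an empty persistent identity.
    """
    # Group the persistent identities by identical translations.
    groups = defaultdict(list)
    for pid, *translations in texts:
        groups[tuple(translations)].append(pid)

    remaining = []
    for translations, pids in groups.items():
        # Unique non-empty persistent identities, in first-seen order.
        named = list(dict.fromkeys(p for p in pids if p))
        has_empty = any(not p for p in pids)

        # Every text with its own persistent identity is kept.
        for pid in named:
            remaining.append((pid, *translations))

        # Empty-pid texts are absorbed only when exactly one persistent
        # identity exists; otherwise they are merged into one empty-pid text.
        if has_empty and len(named) != 1:
            remaining.append(("", *translations))
    return remaining


example_1 = [("test", "test", "test"), ("", "test", "test"), ("", "test", "test")]
print(remove_and_merge_empty_pid(example_1))
# [('test', 'test', 'test')]

example_2 = example_1 + [("abc", "test", "test")]
print(remove_and_merge_empty_pid(example_2))
# [('test', 'test', 'test'), ('abc', 'test', 'test'), ('', 'test', 'test')]
```

The order of the returned rows may differ from the order shown in the tables; only the set of remaining texts is meant to match.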
Remove and leave empty Pid - example of removal logic
Before removal operation. Only one of the duplicates has a persistent identity. Texts with empty persistent identity will be merged with other texts with empty persistent identity, but not with the text that has a persistent identity.
Persistent id | en | de |
---|---|---|
test | test | test |
(empty) | test | test |
(empty) | test | test |
After removal operation. Each text with separate persistent identity will remain and only one text with empty persistent identity.
Persistent id | en | de |
---|---|---|
test | test | test |
(empty) | test | test |
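The "Remove and leave empty Pid" behaviour can be sketched in the same way. Again this is only an illustration under assumptions, not the application's implementation: the function name `remove_and_leave_empty_pid` and the `(pid, en, de)` tuple with an empty string for an empty persistent identity are hypothetical. Under this option a text is treated as a duplicate only when both its translations and its persistent identity match an earlier text, so empty-pid texts merge with each other and never into a text that has a persistent identity.

```python
def remove_and_leave_empty_pid(texts):
    """Sketch of the 'Remove and leave empty Pid' option (hypothetical helper).

    texts: list of (pid, en, de) tuples; pid == "" means an empty persistent identity.
    """
    seen = set()
    remaining = []
    for row in texts:
        # Keep the first occurrence of each (pid, translations) combination.
        if row not in seen:
            seen.add(row)
            remaining.append(row)
    return remaining


before = [("test", "test", "test"), ("", "test", "test"), ("", "test", "test")]
print(remove_and_leave_empty_pid(before))
# [('test', 'test', 'test'), ('', 'test', 'test')]
```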