Technical documentation - Import documents
The DocumentConfigs section can contain one or more document import configurations. The attribute “config-name” is the name of the configuration, and the attribute “files-extensions” defines which file types are allowed to be imported.
For example, to import only PDF files, specify files-extensions="*.pdf". To import more than one type of file, separate the extensions with a double pipe (two vertical bars), for example "*.pdf||*.doc".
A document import configuration contains two parts: the configuration for a single directory and the configuration for importing documents from sub-directories.
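The extension filter described above can be illustrated with a minimal Python sketch. This is not the import tool's own code: the function names parse_extensions and is_allowed are hypothetical, and the case-insensitive comparison is an assumption, since the documentation does not say how letter case is handled.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_extensions(files_extensions: str) -> List[str]:
    """Split a files-extensions value such as "*.pdf||*.doc" into patterns."""
    return [part.strip() for part in files_extensions.split("||") if part.strip()]


def is_allowed(filename: str, files_extensions: str) -> bool:
    """Return True if the filename matches any of the configured patterns.

    Assumption: matching ignores letter case, so both sides are lowercased.
    """
    name = filename.lower()
    return any(
        fnmatchcase(name, pattern.lower())
        for pattern in parse_extensions(files_extensions)
    )
```

With files-extensions="*.pdf||*.doc", this sketch accepts "report.pdf" and "notes.doc" and rejects "image.png".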
<SingleDirectory type="Single">
<SubDirectory type="SubDirectories">
Document identity, language and transformation
Both the single-directory configuration and the sub-directories configuration consist of three parts:
1. Document Identification
2. Language Identification
3. Document transformation
Documents and their languages are identified from the document filename or the directory name. For both document identification and language identification, a regular expression can be applied to extract the required string values from the filename or directory name, which can be up to 255 characters long.
In the name-regex example, filenames are divided into three segments of information (identity, name and language) separated by underscores (_), for example identity_name_language. If a filename is "1234_myDoc_sv", then "1234" is interpreted as the persistent identity, "myDoc" is the name of the document, and "sv" is the language, which is mapped to sv_Se. Language mapping is optional. In the document import configuration example, identity and persistent identity use the same segment of the string. The persistent identity ensures that no duplicate document is created: if the document already exists, it is updated or replaced. If the filename has no language segment, for example "identity_name_", the document is imported with the invariant language.
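The filename convention described above can be sketched in Python to show how the three segments might be extracted. The regular expression, the "invariant" placeholder value, and the names NAME_REGEX, ParsedName, parse_filename and upsert are all assumptions made for illustration; the pattern actually used in import.config may differ. Only the sv to sv_Se mapping comes from the text.

```python
import re
from typing import Dict, NamedTuple, Optional

# Hypothetical pattern for identity_name_language, with an optional file extension.
NAME_REGEX = re.compile(
    r"^(?P<identity>[^_]+)_(?P<name>[^_]+)_(?P<language>[^_.]*)(?:\.[^.]+)?$"
)

# Optional language mapping; "sv" -> "sv_Se" is the mapping mentioned in the text.
LANGUAGE_MAP: Dict[str, str] = {"sv": "sv_Se"}

# Placeholder for the invariant language (the real value is not documented here).
INVARIANT = "invariant"


class ParsedName(NamedTuple):
    identity: str
    persistent_identity: str
    name: str
    language: str


def parse_filename(filename: str) -> Optional[ParsedName]:
    """Extract identity, name and language from a filename, or None if no match."""
    match = NAME_REGEX.match(filename)
    if match is None:
        return None
    code = match.group("language")
    # An empty language segment means the invariant language;
    # unmapped codes are passed through unchanged.
    language = LANGUAGE_MAP.get(code, code) if code else INVARIANT
    identity = match.group("identity")
    # Identity and persistent identity share the same segment, as in the text.
    return ParsedName(identity, identity, match.group("name"), language)


def upsert(repository: Dict[str, ParsedName], doc: ParsedName) -> None:
    """Store a document keyed by persistent identity, replacing any existing one."""
    repository[doc.persistent_identity] = doc
```

In this sketch, parse_filename("1234_myDoc_sv") yields identity "1234", name "myDoc" and language "sv_Se", while parse_filename("1234_myDoc_") falls back to the invariant placeholder. Calling upsert twice with the same persistent identity leaves a single entry in the repository, which mirrors the update-or-replace behaviour described above.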
Document transformations are linked to the transformation configs section of import.config.
The transformation config below tells the import to apply a transformation style. The transformation style transforms the document file from its original format to the specified format, and the transformed file is then imported.
The source document file itself is not modified; it remains available at its original location.
The following configuration tells the import to identify the language from the directory name. This is useful, for example, when planning to import the structure shown in the image below.
The structure above is an example of a document import of type "sub-directories". In this case, each sub-directory represents a language and contains document files. Regular expressions and language mappings can also be used, in the same way as with the import type "single directory".
Documents can be imported by right-clicking the documents repository.
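As a rough illustration of directory-based language identification, the following Python sketch derives the language from the name of the directory that directly contains each file. The directory names "sv" and "en", the file names, and the function names language_for and group_by_language are hypothetical; the optional language mapping is applied in the same way as for filenames.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional


def language_for(relative_path: str, language_map: Dict[str, str]) -> Optional[str]:
    """Return the language taken from the file's parent directory name.

    Returns None when the file sits directly in the import root, because
    there is then no sub-directory to read a language from.
    """
    directory = PurePosixPath(relative_path).parent.name
    if not directory:
        return None
    # Language mapping is optional: unmapped names are used as they are.
    return language_map.get(directory, directory)


def group_by_language(
    relative_paths: Iterable[str], language_map: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group document paths by the language identified for each of them."""
    groups: Dict[str, List[str]] = {}
    for path in relative_paths:
        language = language_for(path, language_map)
        if language is not None:
            groups.setdefault(language, []).append(path)
    return groups
```

For a hypothetical structure containing "sv/manual.pdf" and "en/manual.pdf" with the mapping {"sv": "sv_Se"}, the sketch assigns the first file to "sv_Se" and the second to "en".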