Basisklassifikation WordPress plugin
This is a WordPress plugin that provides a taxonomy for the so-called Basisklassifikation and an admin panel to upload JSKOS files that contain a current version of that classification scheme.
Note that processing a JSKOS file will likely take longer than PHP scripts are permitted to run on your server (PHP's "max_execution_time" setting), because WordPress was not designed to create thousands of taxonomy terms at once. You will need to raise this limit before uploading a JSKOS file.
Translating the Basisklassifikation
This assumes that you are using Polylang and that the taxonomy has already been translated.
To translate the Basisklassifikation, download a copy of the whole Basisklassifikation in the JSKOS format, feed that copy into DeepL, and merge the translation back into the JSKOS file before uploading it in the Basisklassifikation plugin's settings screen.
This procedure is involved and error-prone. Please read this guide in full before starting.
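You will need access to a modern-ish Unix-like system with PHP, Pandoc, and the standard utilities "grep", "sed", "sort", and "diff", all of which are used below.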
Terminology
Basisklassifikation classes have an organisational number (e.g., "08.00"). This number, in keeping with JSKOS terminology, is hereafter referred to as "notation".
Classes also have names. JSKOS distinguishes "preferred" and "alternative" names, which are referred to as "labels". This plugin ignores alternative labels, so every mention of a "label" refers to the preferred one.
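To make the terminology concrete, here is a minimal sketch that pulls the notation and the preferred German label out of a single record. The record is hypothetical and abbreviated, its label texts are made up, and it assumes the JSKOS field names "notation", "prefLabel", and "altLabel"; the naive patterns are for illustration only and are not how the plugin parses files.

```shell
# Hypothetical, abbreviated record; real records carry more fields.
record='{"notation":["08.00"],"prefLabel":{"de":"Beispielklasse"},"altLabel":{"de":["Alternativer Name"]}}'

# The notation is the first "NN.NN" pattern in the record.
notation=$(printf '%s\n' "$record" | grep -oE '[0-9]+\.[0-9]+' | head -n 1)

# The preferred German label; the alternative label is deliberately ignored.
label=$(printf '%s\n' "$record" |
    sed -n 's/.*"prefLabel":{"de":"\([^"]*\)".*/\1/p')

printf '%s\t%s\n' "$notation" "$label"
```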
Get a JSKOS file
JSKOS is a JSON format (hence "JS") for knowledge organisation systems (hence "KOS").
JSKOS has various sub-formats (ndJSON, RDF/XML, MARC/XML, JSON-LD); of these, only newline-delimited JSON ("ndJSON") is supported. Other sub-formats will cause syntax errors.
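Before uploading, it may help to check that a file at least looks like ndJSON. The following is a rough sketch, not part of the plugin: "looks_like_ndjson" is a hypothetical helper that only checks that every line starts with "{" and ends with "}". It does not parse JSON.

```shell
# looks_like_ndjson FILE
# Print "LINE:CONTENT" for every line that does not look like a JSON object
# and return non-zero if there is any such line (or FILE is unreadable).
looks_like_ndjson() {
    [ -r "$1" ] || return 2
    ! grep -n -v '^{.*}$' "$1"
}

# Demo on a throwaway sample whose second line is not an object.
sample=$(mktemp)
printf '%s\n' '{"notation":["08.00"]}' '[' '{"notation":["08.10"]}' >"$sample"
looks_like_ndjson "$sample" || echo "does not look like ndJSON: $sample"
rm -f "$sample"
```

A file exported as RDF/XML or as a single JSON-LD document would typically fail this check on its first line.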
This plugin ships with an ndJSON JSKOS file named "01-bk-de.jskos". You can find it in the "data" directory.
If you need a newer version of the file, you can download one from: https://api.dante.gbv.de/export/download/bk/default
For the remainder of this guide, it is assumed that the JSKOS file that contains the German labels of the Basisklassifikation is named "01-bk-de.jskos":
jskos_de=01-bk-de.jskos
Extract German labels
DeepL only accepts word processing documents as input, so we will perform some conversions. The route we will take is:
JSKOS -> CSV -> DOCX -> DOCX -> CSV -> JSKOS
So, the first step is to extract the notations and the German labels and store them as comma-separated values (CSV).
The plugin ships with a PHP script, "jskos2csv", which can perform this task. You should find it in the "scripts" directory.
Run:
csv_de=02-bk-de.csv
scripts/jskos2csv de "$jskos_de" | sort -u >"$csv_de"
"jskos2csv" should work without issue, but the earlier we catch errors, the better. So we will check whether classes are missing:
jskos_de_sn=jskos-de-sub-notations csv_de_sn=csv-de-sub-notations
grep -oE '[0-9]+\.[0-9]+' "$jskos_de" | sort -u >"$jskos_de_sn"
grep -oE '[0-9]+\.[0-9]+' "$csv_de" | sort -u >"$csv_de_sn"
diff "$jskos_de_sn" "$csv_de_sn"
The "diff" command prints the notation of each missing class, save for top-level classes, which use a different syntax. If it does not print anything, then no non-top-level class is missing; otherwise, you have to add the missing classes manually.
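Since the same kind of check recurs after every conversion step, it could be wrapped in a small function. This is a sketch only: "missing_notations" is a hypothetical helper that does not ship with the plugin, and it swaps "diff" for "comm -23", which prints just the lines that occur in the first file but not in the second.

```shell
# missing_notations EXPECTED ACTUAL
# Print every line of EXPECTED that is absent from ACTUAL.
# Return 0 if nothing is missing, 1 otherwise.
missing_notations() {
    exp_sorted=$(mktemp)
    act_sorted=$(mktemp)
    # comm requires sorted input, so sort both lists first.
    sort -u "$1" >"$exp_sorted"
    sort -u "$2" >"$act_sorted"
    missing=$(comm -23 "$exp_sorted" "$act_sorted")
    rm -f "$exp_sorted" "$act_sorted"
    if [ -z "$missing" ]; then
        return 0
    fi
    printf '%s\n' "$missing"
    return 1
}

# Demo with throwaway lists: "08.10" is absent from the second one.
expected=$(mktemp)
actual=$(mktemp)
printf '%s\n' 08.00 08.10 08.20 >"$expected"
printf '%s\n' 08.00 08.20 >"$actual"
missing_notations "$expected" "$actual" || echo "some classes are missing"
rm -f "$expected" "$actual"
```

With the file names used in this guide, the call would be: missing_notations "$jskos_de_sn" "$csv_de_sn"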
Convert to Office Open XML (aka ".docx")
The CSV file can be converted to an Office Open XML file using Pandoc and a custom CSV reader "csv.lua", which ships with this plugin and is also located in the scripts folder:
docx_de=03-bk-de.docx
pandoc -fscripts/csv.lua -o"$docx_de" "$csv_de"
Again, we check whether any classes are missing in the created file:
csv_de_n=csv-de-notations docx_de_n=docx-de-notations
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_de" >"$csv_de_n"
pandoc -tscripts/csv.lua "$docx_de" |
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_de_n"
diff "$csv_de_n" "$docx_de_n"
The "diff" command prints the notation of each missing class, including top-level classes, provided that they are not missing from the CSV file. Again, missing classes need to be added manually.
Translate using DeepL
Upload "03-bk-de.docx" to https://www.deepl.com/translator/files, make sure that the document's language is set to German, and let DeepL translate it to a language of your choice.
Store the resulting file as "04-bk-XX.docx", but replace "XX" with the two-letter ISO 639-1 code of the language you have asked DeepL to translate the Basisklassifikation to (e.g., "en").
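Because the language code ends up in several file names, a typo here is easy to miss. A minimal sketch of a shape check follows; "is_lang_code" is a hypothetical helper, and it only tests for two lowercase letters (as in "en"), not whether the code is actually assigned to a language.

```shell
# is_lang_code CODE
# Return 0 if CODE consists of exactly two lowercase letters.
is_lang_code() {
    case $1 in
        [[:lower:]][[:lower:]]) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo: only the first candidate has the expected shape.
for candidate in en XX english; do
    if is_lang_code "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: not a two-letter lowercase code"
    fi
done
```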
Yet again, we check whether any classes are missing from the translation:
lang=XX
docx_trans="04-bk-$lang.docx" docx_trans_n="docx-$lang-notations"
pandoc -tscripts/csv.lua "$docx_trans" |
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_trans_n"
diff "$csv_de_n" "$docx_trans_n"
Again, missing classes need to be added manually.
Convert back to CSV
The Office Open XML file can be converted back to a CSV file using Pandoc and a custom CSV writer "csv.lua" (the same script as before), which ships with this plugin and is also located in the scripts folder:
csv_trans="05-bk-$lang.csv"
pandoc -tscripts/csv.lua -o"$csv_trans" "$docx_trans"
Check whether classes are missing:
csv_trans_n="csv-$lang-notations"
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_trans" >"$csv_trans_n"
diff "$csv_de_n" "$csv_trans_n"
Correct translation errors
This is the best time to fix the errors that DeepL made.
csv_trans_corrected="06-bk-$lang-corrected.csv"
cp "$csv_trans" "$csv_trans_corrected"
Open 06-bk-XX-corrected.csv with your favourite text editor and edit away.
You will want to look out for:
- Untranslated words (e.g., DeepL translates "Staatslehre" into English as "Staatslehre").
- Translations of terms that may appear to be English in the original (e.g., the German "Moore"/"marshes" is translated into English as "Moore").
- Short class labels that do not provide enough context for DeepL's model (e.g., the class "Frau"/"woman" is translated into English as "Ms").
- Repeated words and phrases.
- Inconsistent capitalisation.
- Inconsistently translated abbreviations.
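The first kind of error can be found mechanically: a row that is identical in the German and the translated CSV file was probably left untranslated. The following sketch assumes that both files use the same row layout; "untranslated" is a hypothetical helper, and the notations in the demo are placeholders.

```shell
# untranslated SOURCE_CSV TRANSLATED_CSV
# Print each row of TRANSLATED_CSV that also occurs verbatim in SOURCE_CSV.
untranslated() {
    grep -F -x -f "$1" "$2"
}

# Demo with throwaway files; the row layout is an assumption.
de=$(mktemp)
trans=$(mktemp)
printf '%s\n' '00.01,Staatslehre' '00.02,Frau' >"$de"
printf '%s\n' '00.01,Staatslehre' '00.02,Ms' >"$trans"
untranslated "$de" "$trans"
rm -f "$de" "$trans"
```

With the file names used in this guide, the call would be: untranslated "$csv_de" "$csv_trans_corrected". The other kinds of error in the list still need a human reader.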
Create a new ndJSON JSKOS file
Create a new JSKOS file that contains the German as well as your translated labels:
corrected_csv="06-bk-$lang-corrected.csv"
scripts/csv2jskos "$lang" "$corrected_csv" "$jskos_de"