diff --git a/README.md b/README.md
index bdcee0f83ad00febcd48876c6f1129ac335369b1..539887370dc63bacaff000e660441c8062f8901a 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,16 @@
-# Translating the Basisklassifikation
+# Basisklassifikation WordPress plugin
+
+This is a WordPress plugin that provides a taxonomy for the so-called
+Basisklassifikation and an admin panel to upload JSKOS-files that contain
+a current version of that classificatory scheme.
+
+Note that processing a JSKOS file will likely take longer than PHP scripts are
+permitted to run on your server. WordPress has not been made to create
+thousands of taxonomy terms at once. You will need to adjust this limit before
+uploading a JSKOS file.
+
+
+## Translating the Basisklassifikation

 This assumes that you are using Polylang and that the taxonomy has already
 been translated.
@@ -18,7 +30,7 @@ You will need access to a modern-ish Unix-like system with:
 * [Pandoc](https://www.pandoc.org/) >= v3.0


-## Terminology
+### Terminology

 Basisklassifikation classes have an organisational number (e.g., "08.00").
 This number, in keeping with JSKOS terminology, is hereafter referred to as
@@ -29,7 +41,7 @@ names, which are referred to as "labels". This plugin ignores alternative labels
 every mention of a "label" always refers to "preferred" one.


-## Get a JSKOS file
+### Get a JSKOS file

 The JSKOS format defines a JSON format (hence "JS") for knowledge organisation
 systems (hence "KOS").
@@ -49,10 +61,10 @@ that contains the German labels of the Basisklassifikation is named
 "01-bk-de.jskos"

 ```sh
- jskos_de=01-bk-de.jskos
+jskos_de=01-bk-de.jskos
 ```

-## Extract German labels
+### Extract German labels

 DeepL only accepts word processing documents as input, so we will perform some
 conversions. The route we will take is:
@@ -68,18 +80,18 @@ You should find it in the "scripts" directory.
 Run:

 ```sh
- csv_de=02-bk-de.csv
- scripts/jskos2csv de "$jskos_de" | sort -u >"$csv_de"
+csv_de=02-bk-de.csv
+scripts/jskos2csv de "$jskos_de" | sort -u >"$csv_de"
 ```

 "jskos2csv" should work without issue, but the earlier we catch errors, the
 better. So we will check whether classes are missing:

 ```sh
- jskos_de_sn=jskos-de-sub-notations csv_de_sn=csv-de-sub-notations
- grep -oE '[0-9]+\.[0-9]+' "$jskos_de" | sort -u >"$jskos_de_sn"
- grep -oE '[0-9]+\.[0-9]+' "$csv_de" | sort -u >"$csv_de_sn"
- diff "$jskos_de_sn" "$csv_de_sn"
+jskos_de_sn=jskos-de-sub-notations csv_de_sn=csv-de-sub-notations
+grep -oE '[0-9]+\.[0-9]+' "$jskos_de" | sort -u >"$jskos_de_sn"
+grep -oE '[0-9]+\.[0-9]+' "$csv_de" | sort -u >"$csv_de_sn"
+diff "$jskos_de_sn" "$csv_de_sn"
 ```

 `diff "$jskos_de_sn" "$csv_de_sn"` prints the notation of each missing class,
@@ -88,25 +100,25 @@ does *not* print anything, then no non-top-level class is missing; otherwise,
 you have to add the missing classes manually.


-## Convert to Office Open XML (aka ".docx")
+### Convert to Office Open XML (aka ".docx")

 The CSV file can be converted to an Office Open XML file using Pandoc and a
 custom CSV reader "csv.lua", which ships with this plugin and is also located
 in the scripts folder:

 ```sh
- docx_de=03-bk-de.docx
- pandoc -fscripts/csv.lua -o"$docx_de" "$csv_de"
+docx_de=03-bk-de.docx
+pandoc -fscripts/csv.lua -o"$docx_de" "$csv_de"
 ```

 Again, we check whether any classes are missing in the created file:

 ```sh
- csv_de_n=csv-de-notations docx_de_n=docx-de-notations
- sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_de" >"$csv_de_n"
- pandoc -tscripts/csv.lua "$docx_de" |
- sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_de_n"
- diff "$csv_de_n" "$docx_de_n"
+csv_de_n=csv-de-notations docx_de_n=docx-de-notations
+sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_de" >"$csv_de_n"
+pandoc -tscripts/csv.lua "$docx_de" |
+ sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_de_n"
+diff "$csv_de_n" "$docx_de_n"
 ```

 `diff "$csv_de_n" "$docx_de_n"` prints the notation of each missing class,
@@ -114,7 +126,7 @@ including top-level classes, provided that they are not missing from the CSV
 file. Again, missing classes need to be added manually.


-## Translate using DeepL
+### Translate using DeepL

 Upload "03-bk-de.docx" to <https://www.deepl.com/translator/files>, make sure
 that the document's language is set to German, and let DeepL
@@ -127,42 +139,42 @@ the Basisklassifikation to (e.g., "en").
 Yet again, we check whether any classes are missing from the translation:

 ```sh
- lang=XX
- docx_trans="04-bk-$lang.docx" docx_trans_n="csv-$lang-notations"
- pandoc -tscripts/csv.lua "$docx_trans" |
- sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_trans_n"
- diff "$csv_de_n" "$docx_trans_n"
+lang=XX
+docx_trans="04-bk-$lang.docx" docx_trans_n="csv-$lang-notations"
+pandoc -tscripts/csv.lua "$docx_trans" |
+ sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_trans_n"
+diff "$csv_de_n" "$docx_trans_n"
 ```

 Again, missing classes need to be added manually.


-## Convert back to CSV
+### Convert back to CSV

 The Office Open XML file can be converted back to a CSV file using Pandoc and a
 custom CSV writer "csv.lua" (the same script as before), which ships with this
 plugin and is also located in the scripts folder:

 ```sh
- csv_trans="05-bk-$lang.csv"
- pandoc -tscripts/csv.lua -o"$csv_trans" "$docx_trans"
+csv_trans="05-bk-$lang.csv"
+pandoc -tscripts/csv.lua -o"$csv_trans" "$docx_trans"
 ```

 Check whether classes are missing:

 ```sh
- csv_trans_n="csv-$lang-notations"
- sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_trans" >"$csv_trans_n"
- diff "$csv_de_n" "$csv_trans_n"
+csv_trans_n="csv-$lang-notations"
+sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_trans" >"$csv_trans_n"
+diff "$csv_de_n" "$csv_trans_n"
 ```

-## Correct translations errors
+### Correct translations errors

 This is the best time to fix the errors that DeepL made.

 ```
- csv_trans_corrected="06-bk-$lang-corrected.csv"
- cp "$csv_trans" "$csv_trans_corrected"
+csv_trans_corrected="06-bk-$lang-corrected.csv"
+cp "$csv_trans" "$csv_trans_corrected"
 ```

 Open 06-bk-XX-corrected.csv with your favourite text editor and edit away.
@@ -180,7 +192,7 @@ You will want to look out for:
 * Inconsistently translated abbreviations.


-## Create a new ndJSON JSKOS file
+### Create a new ndJSON JSKOS file

 Create a new JSKOS file that contains the German as well as your translated
 labels: