Commit c3f5a0da authored by Odin Kroeger

docs(README): Mentioned what the repo is about

parent b80a38f5
-# Translating the Basisklassifikation
+# Basisklassifikation WordPress plugin
+This is a WordPress plugin that provides a taxonomy for the so-called
+Basisklassifikation and an admin panel for uploading JSKOS files that contain
+a current version of that classification scheme.
+Note that processing a JSKOS file will likely take longer than PHP scripts are
+permitted to run on your server, because WordPress was not designed to create
+thousands of taxonomy terms at once. You will need to raise PHP's execution
+time limit before uploading a JSKOS file.
+## Translating the Basisklassifikation
This assumes that you are using Polylang and that the taxonomy has already
been translated.
......@@ -18,7 +30,7 @@ You will need access to a modern-ish Unix-like system with:
* [Pandoc](https://www.pandoc.org/) >= v3.0
-## Terminology
+### Terminology
Basisklassifikation classes have an organisational number (e.g., "08.00").
This number, in keeping with JSKOS terminology, is hereafter referred to as
......@@ -29,7 +41,7 @@ names, which are referred to as "labels". This plugin ignores alternative labels
every mention of a "label" refers to the "preferred" one.
-## Get a JSKOS file
+### Get a JSKOS file
JSKOS is a JSON-based format (hence "JS") for knowledge
organisation systems (hence "KOS").
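To make the format concrete, the sketch below writes one JSKOS concept as a single line of JSON and pulls out its notation with `grep`. The record is illustrative only: the field names `uri`, `notation`, and `prefLabel` follow the JSKOS specification, but the file name `sample.jskos` and the values are assumptions, not taken from a real export.

```sh
# Illustrative record only; a real export holds one such line per class.
cat >sample.jskos <<'EOF'
{"uri":"http://uri.gbv.de/terminology/bk/08.00","notation":["08.00"],"prefLabel":{"de":"Philosophie: Allgemeines"}}
EOF

# Print every sub-notation (digits, a dot, digits) found in the file,
# dropping repeats such as the copy of the notation inside the URI.
grep -oE '[0-9]+\.[0-9]+' sample.jskos | sort -u
```

The notation appears both in the `uri` and in the `notation` field, which is why the output is piped through `sort -u`.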
......@@ -49,10 +61,10 @@ that contains the German labels of the Basisklassifikation is named
"01-bk-de.jskos"
```sh
jskos_de=01-bk-de.jskos
```
-## Extract German labels
+### Extract German labels
DeepL only accepts word processing documents as input, so we will
perform some conversions. The route we will take is:
......@@ -68,18 +80,18 @@ You should find it in the "scripts" directory.
Run:
```sh
csv_de=02-bk-de.csv
scripts/jskos2csv de "$jskos_de" | sort -u >"$csv_de"
```
"jskos2csv" should work without issue, but the earlier we catch errors,
the better. So we will check whether classes are missing:
```sh
jskos_de_sn=jskos-de-sub-notations csv_de_sn=csv-de-sub-notations
grep -oE '[0-9]+\.[0-9]+' "$jskos_de" | sort -u >"$jskos_de_sn"
grep -oE '[0-9]+\.[0-9]+' "$csv_de" | sort -u >"$csv_de_sn"
diff "$jskos_de_sn" "$csv_de_sn"
```
`diff "$jskos_de_sn" "$csv_de_sn"` prints the notation of each missing class,
......@@ -88,25 +100,25 @@ does *not* print anything, then no non-top-level class is missing;
otherwise, you have to add the missing classes manually.
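Because `diff` also prints line numbers and `<`/`>` markers, it can help to list only the notations that were lost. A minimal sketch using `comm`, with two made-up notation lists standing in for the real files (the file names `want` and `have` are hypothetical):

```sh
# Made-up, sorted notation lists: "want" from the source, "have" after conversion.
printf '%s\n' 08.00 08.10 08.20 >want
printf '%s\n' 08.00 08.20 >have

# comm -23 keeps the lines that occur only in the first file,
# that is, the notations that went missing during conversion.
comm -23 want have
```

`comm` expects both inputs to be sorted, which holds for any list that has been passed through `sort -u`.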
-## Convert to Office Open XML (aka ".docx")
+### Convert to Office Open XML (aka ".docx")
The CSV file can be converted to an Office Open XML file using Pandoc
and a custom CSV reader "csv.lua", which ships with this plugin and is
also located in the scripts folder:
```sh
docx_de=03-bk-de.docx
pandoc -fscripts/csv.lua -o"$docx_de" "$csv_de"
```
Again, we check whether any classes are missing in the created file:
```sh
csv_de_n=csv-de-notations docx_de_n=docx-de-notations
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_de" >"$csv_de_n"
pandoc -tscripts/csv.lua "$docx_de" |
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_de_n"
diff "$csv_de_n" "$docx_de_n"
```
`diff "$csv_de_n" "$docx_de_n"` prints the notation of each missing class,
......@@ -114,7 +126,7 @@ including top-level classes, provided that they are not missing from the CSV
file. Again, missing classes need to be added manually.
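The same `sed` expression recurs in several of these checks, so it may be convenient to wrap it in a shell function. This is a sketch that assumes each CSV row starts with an optionally quoted notation; the function name `notations` and the sample rows are hypothetical:

```sh
# Print the leading notation of every CSV row read from standard input.
# Rows that do not begin with such a notation print nothing.
notations() {
    sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p'
}

# Sample rows only; the real layout is whatever "jskos2csv" emits.
printf '%s\n' \
    '"08.00","Philosophie: Allgemeines"' \
    '08.10,Beispiel' \
    'notation,label' |
notations
```

With the function defined, a check reduces to a line such as `notations <"$csv_de" >"$csv_de_n"`.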
-## Translate using DeepL
+### Translate using DeepL
Upload "03-bk-de.docx" to <https://www.deepl.com/translator/files>,
make sure that the document's language is set to German, and let DeepL
......@@ -127,42 +139,42 @@ the Basisklassifikation to (e.g., "en").
Yet again, we check whether any classes are missing from the translation:
```sh
lang=XX
docx_trans="04-bk-$lang.docx" docx_trans_n="csv-$lang-notations"
pandoc -tscripts/csv.lua "$docx_trans" |
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_trans_n"
diff "$csv_de_n" "$docx_trans_n"
```
Again, missing classes need to be added manually.
-## Convert back to CSV
+### Convert back to CSV
The Office Open XML file can be converted back to a CSV file using Pandoc
and a custom CSV writer "csv.lua" (the same script as before), which ships
with this plugin and is also located in the scripts folder:
```sh
csv_trans="05-bk-$lang.csv"
pandoc -tscripts/csv.lua -o"$csv_trans" "$docx_trans"
```
Check whether classes are missing:
```sh
csv_trans_n="csv-$lang-notations"
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_trans" >"$csv_trans_n"
diff "$csv_de_n" "$csv_trans_n"
```
-## Correct translation errors
+### Correct translation errors
This is the best time to fix the errors that DeepL made.
```sh
csv_trans_corrected="06-bk-$lang-corrected.csv"
cp "$csv_trans" "$csv_trans_corrected"
```
Open 06-bk-XX-corrected.csv with your favourite text editor and edit away.
......@@ -180,7 +192,7 @@ You will want to look out for:
* Inconsistently translated abbreviations.
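One further check that can be scripted is spotting rows that came back unchanged, which may mean that a label was not translated. This check is not part of the original workflow; the sketch below uses the made-up files `de.csv` and `xx.csv` in place of the real CSV files, and assumes that both share the same row layout:

```sh
# Made-up German and translated CSV files.
printf '%s\n' '08.00,Philosophie: Allgemeines' '08.10,Logik' >de.csv
printf '%s\n' '08.00,Philosophy: general' '08.10,Logik' >xx.csv

# comm -12 prints the lines common to both sorted files,
# i.e. rows whose label is identical in both languages.
sort de.csv >de.sorted
sort xx.csv >xx.sorted
comm -12 de.sorted xx.sorted
```

An identical row is not necessarily an error (proper names, for instance, may legitimately stay the same), so the output is a list of candidates to review rather than a list of mistakes.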
-## Create a new ndJSON JSKOS file
+### Create a new ndJSON JSKOS file
Create a new JSKOS file that contains the German as well as your
translated labels:
......