Commit c3f5a0da authored 1 month ago by Odin Kroeger
docs(README): Mentioned what the repo is about
Parent: b80a38f5
Branch: main
README.md: 47 additions, 35 deletions
-# Translating the Basisklassifikation
+# Basisklassifikation WordPress plugin
+This is a WordPress plugin that provides a taxonomy for the so-called
+Basisklassifikation and an admin panel to upload JSKOS-files that contain
+a current version of that classificatory scheme.
+Note that processing a JSKOS file will likely take longer than PHP scripts are
+permitted to run on your server. WordPress has not been made to create
+thousands of taxonomy terms at once. You will need to adjust this limit before
+uploading a JSKOS file.
+## Translating the Basisklassifikation
This assumes that you are using Polylang and that the taxonomy has already
been translated.
...
...
@@ -18,7 +30,7 @@ You will need access to a modern-ish Unix-like system with:
* [Pandoc](https://www.pandoc.org/) >= v3.0
-## Terminology
+### Terminology
Basisklassifikation classes have an organisational number (e.g., "08.00").
This number, in keeping with JSKOS terminology, is hereafter referred to as
...
...
@@ -29,7 +41,7 @@ names, which are referred to as "labels". This plugin ignores alternative labels
every mention of a "label" always refers to "preferred" one.
-## Get a JSKOS file
+### Get a JSKOS file
The JSKOS format defines a JSON format (hence "JS") for knowledge
organisation systems (hence "KOS").
...
...
@@ -49,10 +61,10 @@ that contains the German labels of the Basisklassifikation is named
"01-bk-de.jskos"
```sh
jskos_de=01-bk-de.jskos
```
-## Extract German labels
+### Extract German labels
DeepL only accepts word processing documents as input, so we will
perform some conversions. The route we will take is:
...
...
@@ -68,18 +80,18 @@ You should find it in the "scripts" directory.
Run:
```sh
csv_de=02-bk-de.csv
scripts/jskos2csv de "$jskos_de" | sort -u >"$csv_de"
```
"jskos2csv" should work without issue, but the earlier we catch errors,
the better. So we will check whether classes are missing:
```sh
jskos_de_sn=jskos-de-sub-notations
csv_de_sn=csv-de-sub-notations
grep -oE '[0-9]+\.[0-9]+' "$jskos_de" | sort -u >"$jskos_de_sn"
grep -oE '[0-9]+\.[0-9]+' "$csv_de" | sort -u >"$csv_de_sn"
diff "$jskos_de_sn" "$csv_de_sn"
```
`diff "$jskos_de_sn" "$csv_de_sn"` prints the notation of each missing class,
...
...
@@ -88,25 +100,25 @@ does *not* print anything, then no non-top-level class is missing;
otherwise, you have to add the missing classes manually.
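How to read the `diff` output in this check can be sketched on made-up data. The notation lists and the temporary directory below are hypothetical and only stand in for the two sub-notation files. The sketch also assumes that `mktemp -d` is available, which POSIX does not guarantee but most Unix-like systems provide.

```sh
# Hypothetical sample data; these are NOT the real notation lists.
tmp=$(mktemp -d)
printf '%s\n' 08.00 08.10 08.20 >"$tmp/jskos-sn"
printf '%s\n' 08.00 08.20 >"$tmp/csv-sn"

# Lines that diff prefixes with "<" occur only in the first file,
# that is, they are missing from the second one.
missing=$(diff "$tmp/jskos-sn" "$tmp/csv-sn" | sed -n 's/^< //p')
printf 'missing: %s\n' "$missing"

rm -rf "$tmp"
```

Filtering for the `<` prefix is just a convenience for longer outputs; reading the plain `diff` output works as well.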
-## Convert to Office Open XML (aka ".docx")
+### Convert to Office Open XML (aka ".docx")
The CSV file can be converted to an Office Open XML file using Pandoc
and a custom CSV reader "csv.lua", which ships with this plugin and is
also located in the scripts folder:
```sh
docx_de=03-bk-de.docx
pandoc -fscripts/csv.lua -o "$docx_de" "$csv_de"
```
Again, we check whether any classes are missing in the created file:
```sh
csv_de_n=csv-de-notations
docx_de_n=docx-de-notations
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_de" >"$csv_de_n"
pandoc -tscripts/csv.lua "$docx_de" | sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_de_n"
diff "$csv_de_n" "$docx_de_n"
```
`diff "$csv_de_n" "$docx_de_n"` prints the notation of each missing class,
...
...
@@ -114,7 +126,7 @@ including top-level classes, provided that they are not missing from the CSV
file. Again, missing classes need to be added manually.
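To see what the `sed` expression used in these checks extracts, it can be run on a few sample rows. The rows below are hypothetical; the actual column layout and quoting of the CSV files produced by "jskos2csv" and "csv.lua" may differ.

```sh
# Hypothetical CSV rows; the real files may be laid out differently.
sample='"08.00","Philosophie: Allgemeines"
08.10,Nichtwestliche Philosophie
"Notation","Label"'

# Skip optional leading quotes, capture "digits.digits" at the start of
# the line, and print only the lines on which the substitution succeeded.
notations=$(printf '%s\n' "$sample" | sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p')
printf '%s\n' "$notations"
```

Only the first two rows start with a dotted notation, so the third row is dropped and the result is one notation per line.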
-## Translate using DeepL
+### Translate using DeepL
Upload "03-bk-de.docx" to <https://www.deepl.com/translator/files>,
make sure that the document's language is set to German, and let DeepL
...
...
@@ -127,42 +139,42 @@ the Basisklassifikation to (e.g., "en").
Yet again, we check whether any classes are missing from the translation:
```sh
lang=XX
docx_trans="04-bk-$lang.docx"
docx_trans_n="csv-$lang-notations"
pandoc -tscripts/csv.lua "$docx_trans" | sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' >"$docx_trans_n"
diff "$csv_de_n" "$docx_trans_n"
```
Again, missing classes need to be added manually.
-## Convert back to CSV
+### Convert back to CSV
The Office Open XML file can be converted back to a CSV file using Pandoc
and a custom CSV writer "csv.lua" (the same script as before), which ships
with this plugin and is also located in the scripts folder:
```sh
csv_trans="05-bk-$lang.csv"
pandoc -tscripts/csv.lua -o "$csv_trans" "$docx_trans"
```
Check whether classes are missing:
```sh
csv_trans_n="csv-$lang-notations"
sed -n 's/^"*\([0-9]*\.[0-9]*\).*/\1/p' "$csv_trans" >"$csv_trans_n"
diff "$csv_de_n" "$csv_trans_n"
```
-## Correct translations errors
+### Correct translations errors
This is the best time to fix the errors that DeepL made.
```
csv_trans_corrected="06-bk-$lang-corrected.csv"
cp "$csv_trans" "$csv_trans_corrected"
```
Open 06-bk-XX-corrected.csv with your favourite text editor and edit away.
...
...
@@ -180,7 +192,7 @@ You will want to look out for:
* Inconsistently translated abbreviations.
-## Create a new ndJSON JSKOS file
+### Create a new ndJSON JSKOS file
Create a new JSKOS file that contains the German as well as your
translated labels:
...
...