What is OAI-PMH and where to find information?
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework. A repository interface must be able to return metadata in Dublin Core format. Optionally, a repository may also disseminate other metadata formats. oerhub.at uses the LOM format!
Details can be found at http://www.openarchives.org
Online Validation
Useful online tool for validation: https://validator.oaipmh.com/
Hint: The standard says "The Content-Type returned for all OAI-PMH requests must be text/xml", but the validator expects application/xml!
LOM
The Learning Object Metadata is a data model, usually encoded in XML, used to describe a learning object and similar digital resources used to support learning.
The specification by UIBK.
Example Requests
- ListRecords - to get the list of resources:
https://oer-repo.uibk.ac.at/edu-sharing/eduservlet/oai/provider?verb=ListRecords&metadataPrefix=lom
If the response contains a resumptionToken, repeat the request with that token until the response contains no resumptionToken or an empty one. A first request:
https://www.zoerr.de/edu-sharing/eduservlet/oai/provider?verb=ListRecords&metadataPrefix=hs_oer_lom
An example for the iteration over the list:
https://www.zoerr.de/edu-sharing/eduservlet/oai/provider?verb=ListRecords&resumptionToken=MTozMDB8Mjp8Mzp8NDp8NTpoc19vZXJfbG9t
- GetRecord - to get the details of a specific resource:
https://oer-repo.uibk.ac.at/edu-sharing/eduservlet/oai/provider?verb=GetRecord&metadataPrefix=lom&identifier=oai:oer-repo.uibk.ac.at:e0d6bbfa-3945-437d-ba16-3c6fab848f37
and a second example:
https://repository.tugraz.at/oai2d?verb=GetRecord&metadataPrefix=lom&identifier=oai:repository.tugraz.at:vqyg5-xb977
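The resumptionToken iteration can be sketched in Python as follows. This is only an illustration, not the Perl harvester used by oerhub.at; the function names and the injectable `fetch` parameter are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_token(xml_text):
    """Return the resumptionToken of a response, or None if the list is complete."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # missing or empty token: nothing more to fetch
    return token.text.strip()


def http_fetch(url):
    """Plain HTTP GET returning the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def list_records_urls(base_url, metadata_prefix, fetch=http_fetch):
    """Yield each ListRecords request URL until no further token is returned."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while query:
        url = base_url + "?" + urlencode(query)
        yield url
        token = next_token(fetch(url))
        # A follow-up request carries only the verb and the token.
        query = {"verb": "ListRecords", "resumptionToken": token} if token else None
```

Passing `fetch` as a parameter keeps the loop testable without network access; a real harvester would also parse the records of each response instead of only collecting URLs.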
As mentioned above, some implementations, like Invenio, return the Content-Type text/xml; unfortunately others, like edu-sharing, return application/xml. So the client must be able to handle both!
oerhub.at harvests the data with a Perl-based daemon. The implementation is based on OAI-Harvester 1.20. The Perl module expects text/xml and supports standard DC. Our module additionally accepts application/xml and implements a specific LOM handler. The tests have been rewritten against maintained repos.
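A client that tolerates both content types could check the header as in the Python sketch below. It is not the code of the Perl module; `is_accepted_content_type` is a hypothetical helper name.

```python
# The two media types named above: the one required by the standard
# and the one returned by some repositories.
ACCEPTED_XML_TYPES = {"text/xml", "application/xml"}


def is_accepted_content_type(header_value):
    """Return True if a Content-Type header names an accepted XML media type."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_XML_TYPES
```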
Required Elements (oerhub 1.2)
- Title - lom.general.title
- Author - lom.lifecycle.contribute.role.value (=Author) and the name in lom.lifecycle.contribute.centity.vcard or lom.lifecycle.contribute.entity.langstring
- Media type - lom.technical.format (mime/types)
- License - lom.rights.description (not only CC-BY, new: spdx.org/licenses)
- Classification - lom.classification.taxonpath (ÖFOS)
- Educational - lom.educational.learningresourcetype
- Date of last change/upload - lom.lifecycle.version.datetime (custom element)
Required Elements (oerhub 1.3 and above)
- Learning Resource Type - lom.educational.learningresourcetype
- Language - lom.general.language
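A presence check for some of the required elements might look like the following Python sketch. It matches elements by local name so that no particular namespace URI is assumed, it covers only the paths that map directly to an element (the author and date rules are left out), and the names `REQUIRED_PATHS` and `missing_elements` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Subset of the required elements listed above, as paths below the lom root.
REQUIRED_PATHS = [
    "general/title",
    "general/language",
    "technical/format",
    "rights/description",
    "classification/taxonpath",
    "educational/learningresourcetype",
]


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def find_path(element, path):
    """Follow a path of local names downwards and return the matching elements."""
    current = [element]
    for step in path.split("/"):
        current = [child for node in current for child in node
                   if local_name(child.tag) == step]
    return current


def missing_elements(lom_xml):
    """Return the required paths that do not occur in the given lom record."""
    root = ET.fromstring(lom_xml)
    return [path for path in REQUIRED_PATHS if not find_path(root, path)]
```

The check only tests that an element exists, not that its content is valid.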
Preferred optional Elements
- Description - lom.general.description
Optional Elements
- Thumbnail - lom.technical.thumbnail
Custom LOM Elements
centity
centity is a custom element introduced by edu-sharing for vCard support (used by UIBK). entity is the default element (used by TUGraz).
<lom:centity>
<lom:vcard>
BEGIN:VCARD
VERSION:3.0
N:Mustermann;Max;;;
FN:Max Mustermann
TITLE:Ao. Univ. Prof. Dr.
END:VCARD
</lom:vcard>
</lom:centity>
FN is the property commonly used in oerhub. FN is optional in vCard version 2.1, but it is recommended to include it so that the output looks similar for all connected repositories.
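How a display name could be read from such a vCard is sketched below in Python. The sketch assumes one property per line, as the vCard format requires; the fallback to N when FN is missing and the function name `vcard_display_name` are assumptions of this example, not documented oerhub behaviour.

```python
def vcard_display_name(vcard_text):
    """Return FN if present, otherwise build 'Given Family' from N, else None."""
    props = {}
    for line in vcard_text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:
            # Drop parameters such as "FN;CHARSET=UTF-8" and normalise case;
            # the first occurrence of a property wins.
            props.setdefault(name.split(";", 1)[0].upper(), value.strip())
    if props.get("FN"):
        return props["FN"]
    # N is structured as family;given;additional;prefix;suffix.
    family, given = (props.get("N", "").split(";") + ["", ""])[:2]
    return " ".join(part for part in (given, family) if part) or None
```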
Special Attributes
xml:lang
For the element langstring the attribute xml:lang is used.
<lom:title>
<lom:langstring xml:lang="de">Wirtschaft integrativ verstehen</lom:langstring>
</lom:title>
<lom:description>
<lom:langstring xml:lang="de">Einführungsvortrag zum Masterstudiengang Accounting, Auditing and Taxation</lom:langstring>
</lom:description>
Hint: The repository has to ensure that the language code corresponds to the text of the element to pass the accessibility check!
x-none
"x-none" is used for strings without language identification (controlled vocabulary).
<lom:value>
<lom:langstring xml:lang="x-none">Author</lom:langstring>
</lom:value>
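A consumer that reads langstring elements might select a value by xml:lang as in this Python sketch. The selection order (preferred language, then x-none, then the first entry) and the name `pick_langstring` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pick_langstring(element, preferred="de"):
    """Return the text of the best matching langstring child, or None."""
    strings = [child for child in element
               if child.tag.rsplit("}", 1)[-1] == "langstring"]
    for wanted in (preferred, "x-none"):
        for langstring in strings:
            if langstring.get(XML_LANG) == wanted:
                return langstring.text
    # No match: fall back to the first langstring, if there is one.
    return strings[0].text if strings else None
```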
FQDN
The Fully Qualified Domain Name (FQDN) is used as the identifier for the catalog.
identifier in header-Element
<header>
<identifier>oai:$FQDN:$ObjID</identifier>
<datestamp>2023-02-10T13:34:53Z</datestamp>
</header>
Example:
<header>
<identifier>oai:gecko.aau.at:452</identifier>
<datestamp>2023-02-10T13:34:53Z</datestamp>
</header>
catalog
<lom:general>
<lom:identifier>
<lom:catalog>$FQDN</lom:catalog>
<lom:entry>
<lom:langstring xml:lang="x-none">$ObjID</lom:langstring>
</lom:entry>
</lom:identifier>
...
</lom:general>
Example:
<lom:general>
<lom:identifier>
<lom:catalog>gecko.aau.at</lom:catalog>
<lom:entry>
<lom:langstring xml:lang="x-none">452</lom:langstring>
</lom:entry>
</lom:identifier>
...
</lom:general>
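The relation between the header identifier and the catalog/entry pair can be checked with a few lines of Python. This is a sketch under the `oai:$FQDN:$ObjID` pattern shown above; both function names are hypothetical.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<FQDN>:<ObjID>' into (fqdn, obj_id); raise ValueError otherwise."""
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai:<FQDN>:<ObjID> identifier: {identifier!r}")
    return parts[1], parts[2]


def matches_catalog(identifier, catalog, entry):
    """True if the header identifier agrees with lom:catalog and lom:entry."""
    return split_oai_identifier(identifier) == (catalog, entry)
```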
Vocabs
Creative Commons
<lom:description>
<lom:langstring xml:lang="x-t-cc-url">https://creativecommons.org/licenses/by-sa/4.0</lom:langstring>
</lom:description>
A URI such as https://creativecommons.org/licenses/by-sa/4.0 is expected; following the link shows the details of the license.
The ingester expects the URI to start with https://creativecommons.org/licenses/, as in this RegExp:
if ( $linkCC =~ /^https:\/\/creativecommons\.org\/licenses\/(.*)/ ) {
The link contains no short CC name. It is added by the ingester.
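The same match can be written in Python. How the ingester actually builds the short name is not documented here, so the label format below ("CC BY-SA 4.0") and the function name `cc_short_label` are purely hypothetical.

```python
import re

# Same prefix test as the Perl RegExp above; the rest of the URI is captured.
CC_URL = re.compile(r"^https://creativecommons\.org/licenses/(.*)")


def cc_short_label(link):
    """Return a short label such as 'CC BY-SA 4.0', or None for non-CC links."""
    match = CC_URL.match(link)
    if not match:
        return None
    parts = match.group(1).strip("/").split("/")
    code = parts[0].upper()                         # e.g. "BY-SA"
    version = parts[1] if len(parts) > 1 else ""    # e.g. "4.0"
    return " ".join(part for part in ("CC", code, version) if part)
```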
ÖFOS
The source URI is expected to end with /vocabs/oefos2012:
if ( $taxonPath->{source} =~ /(.*)\/vocabs\/oefos2012$/ ) {
To extract the ÖFOS ID, the URI must contain /vocabs/oefos2012 followed by the ID at the end:
if ( ($prefixurl, $number) = $id =~ /(.*)\/vocabs\/oefos2012\/(.*)/ ) {
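For readers who do not use Perl, the two checks translate to Python roughly as follows. The function name `oefos_number` is a hypothetical choice; the patterns are the ones from the Perl lines above.

```python
import re

# The source must end with /vocabs/oefos2012; the ID follows the same path segment.
OEFOS_SOURCE = re.compile(r"(.*)/vocabs/oefos2012$")
OEFOS_ID = re.compile(r"(.*)/vocabs/oefos2012/(.*)")


def oefos_number(source, taxon_id):
    """Return the ÖFOS number from a taxon id URI, or None if either check fails."""
    if not OEFOS_SOURCE.search(source):
        return None
    match = OEFOS_ID.search(taxon_id)
    return match.group(2) if match else None
```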
Media Type
The mapping used:
mime/type | Vocab |
---|---|
audio/mp3 | Audio |
application/pdf | Document |
image/png | Picture |
video/mp4 | Video |
(any other) | Miscellaneous |
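Expressed as code, the table is a simple lookup with a default. The Python sketch below only mirrors the four rows shown; treating every other mime/type as Miscellaneous is an assumption drawn from the last table row, and `media_vocab` is a hypothetical name.

```python
# Rows of the mapping table above.
MEDIA_TYPE_VOCAB = {
    "audio/mp3": "Audio",
    "application/pdf": "Document",
    "image/png": "Picture",
    "video/mp4": "Video",
}


def media_vocab(mime_type):
    """Map a mime/type to its vocab term, defaulting to 'Miscellaneous'."""
    return MEDIA_TYPE_VOCAB.get(mime_type.strip().lower(), "Miscellaneous")
```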
Changes in oerhub 1.3:
- no Media Type mapping, use of mime/type as is
- open to licenses from spdx.org/licenses (except commercial ones)
- All Vocabs are archived in the index (spdx, oefos2012, iso639, kim/hcrt)
spdx (Licenses)
Starting with oerhub 1.3, we support license URIs from Software Package Data Exchange (spdx).
<lom:description>
<lom:langstring xml:lang="x-t-cc-url">https://spdx.org/licenses/Apache-2.0</lom:langstring>
</lom:description>
Expected URI:
https://spdx.org/licenses/<Identifier>
<Identifier> is the Short Identifier. The link shows the remarks of the license.
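Extracting the Short Identifier from such a URI could look like this in Python. The allowed character set in the pattern (letters, digits, ".", "+", "-") is an assumption of this sketch, not a rule stated by oerhub.

```python
import re

# Expected form: https://spdx.org/licenses/<Identifier>
SPDX_URL = re.compile(r"^https://spdx\.org/licenses/([A-Za-z0-9.+-]+)$")


def spdx_identifier(uri):
    """Return the Short Identifier of an spdx license URI, or None."""
    match = SPDX_URL.match(uri)
    return match.group(1) if match else None
```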
ISO Language codes and translation (Languages)
The data for the language codes and translation is based on https://www.loc.gov/standards/iso639-2/php/code_list.php.
kim/hcrt (Learning Resource Type)
Our controlled vocabulary for the Learning Resource Type is based on https://skohub.io/dini-ag-kim/hcrt/heads/master/w3id.org/kim/hcrt/scheme.en.html.
<lom:educational>
<lom:learningresourcetype>
<lom:source>
<lom:langstring xml:lang="x-none">https://w3id.org/kim/hcrt/scheme</lom:langstring>
</lom:source>
<lom:id>https://w3id.org/kim/hcrt/video</lom:id>
<lom:entry>...
</lom:entry>
</lom:learningresourcetype>
</lom:educational>
Expected URI:
https://w3id.org/kim/hcrt/<Identifier>
<Identifier> is the identifier, like video in the example https://w3id.org/kim/hcrt/video
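Reading the identifier out of an educational element can be sketched in Python as below. Elements are matched by local name, so no namespace URI is assumed; `learning_resource_types` is a hypothetical helper, not part of the harvester.

```python
import xml.etree.ElementTree as ET

# Common prefix of all expected Learning Resource Type URIs.
HCRT_BASE = "https://w3id.org/kim/hcrt/"


def learning_resource_types(educational_xml):
    """Return the hcrt identifiers found in the id elements of the given XML."""
    root = ET.fromstring(educational_xml)
    identifiers = []
    for element in root.iter():
        text = (element.text or "").strip()
        if element.tag.rsplit("}", 1)[-1] == "id" and text.startswith(HCRT_BASE):
            identifier = text[len(HCRT_BASE):]
            if identifier:
                identifiers.append(identifier)
    return identifiers
```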
Links
- https://www.openarchives.org/
- https://metacpan.org/pod/Net::OAI::Harvester
- https://validator.oaipmh.com/
- https://en.wikipedia.org/wiki/Dublin_Core
- https://en.wikipedia.org/wiki/Learning_object_metadata
- https://standards.ieee.org/standard/1484_12_3-2020.html
- https://oer-repo.uibk.ac.at/lom/latest/
- https://www.data.gv.at/katalog/dataset/stat_ofos-2012
- https://www.doi.org/
- https://creativecommons.org/
- https://spdx.org/licenses/
- https://www.loc.gov/standards/iso639-2/php/code_list.php
Repos with OAI-PMH/LOM
- Invenio
- edu-sharing
- Phaidra
- Other