Verified Commit e4f762a6 authored by Martin Weise

Added stuff
author: Martin Weise
---
# Docker Compose
## TL;DR
If you have [:simple-docker: Docker](https://docs.docker.com/engine/install/) already installed on your system, you can
install DBRepo with:
```shell
curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/install.sh | sudo bash
```
technologies. The conceptualized microservices operate the basic database operations.
<figcaption>Architecture of the services deployed via Docker Compose</figcaption>
</figure>
Alternatively, you can also deploy DBRepo with [Helm](../deployment-helm/) instead.
## Requirements
### Hardware
For this small, local test deployment, any modern hardware suffices; we recommend a dedicated virtual machine with
the following settings. Note that most of the vCPU and RAM resources are needed when starting the infrastructure,
due to Docker. During idle times, the deployment uses significantly fewer resources.
- 4 vCPU cores
- 16GB RAM memory
- 100GB SSD storage
### Software
Install Docker Engine for your operating system. There are excellent guides available for Linux; we highly recommend
using a stable distribution such as [:simple-debian: Debian](https://www.debian.org/download). In the following guide
we only consider Debian.
## Deployment
We maintain a rapid prototype deployment option through Docker Compose (v2.17.0 and newer). This deployment creates the
core infrastructure and a single Docker container for all user-generated databases.

```shell
curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/install.sh | sudo bash
```

View the logs:

```shell
docker compose logs -f
```

You should now be able to view the front end at [http://localhost:80](http://localhost:80).
Please be warned that the default configuration is not intended for public deployments. It is only meant to give you a
running system within minutes so you can play around and explore features. It is strongly advised to change
the default `.env` environment variables.
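As a minimal, illustrative `.env` override: `JWT_ISSUER` is the only variable name taken from this documentation (see the warning below), and all values shown are placeholders for your own deployment.

```ini title=".env"
# Point token verification at your own (non-default) Authentication Service.
# The realm name `dbrepo` matches the realm used throughout this guide.
JWT_ISSUER=https://auth.example.com/realms/dbrepo
```

Check the shipped `.env` file for the exact names of the remaining variables before changing them.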
!!! warning "Known security issues with the default configuration"

    The system is auto-configured for a small, local, test deployment and is *not* secure! You need to make
    modifications in various places to make it secure:

    * **Authentication Service**:
        a. You need to use your own instance or configure a secure instance using a (self-signed) certificate.
           Additionally, when serving from a non-default Authentication Service, you need to put it into the
           `JWT_ISSUER` environment variable (`.env`).
        b. You need to change the password of the default admin user `fda` in Realm
           master > Users > fda > Credentials > Reset password.
        c. You need to change the client secrets for the clients `dbrepo-client` and `broker-client`. Do this in Realm
           dbrepo > Clients > dbrepo-client > Credentials > Client secret > Regenerate. Do the same for the
           `broker-client`.
        d. You need to regenerate the public key of the `RS256` algorithm which is shared with all services to verify
           the signature of JWT tokens. Add your securely generated private key in Realm
           dbrepo > Realm settings > Keys > Providers > Add provider > rsa.
    * **Broker Service**: by default, this service is configured with an administrative user that has major privileges.
      You need to change the password of the user *fda* in Admin > Update this user > Password. We found this
      [simple guide](https://onlinehelp.coveo.com/en/ces/7.0/administrator/changing_the_rabbitmq_administrator_password.htm)
      very useful.
    * **Search Database**: by default, this service requires authentication with an administrative user that is
      allowed to write into the indices. Following
      this [simple guide](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/reset-password.html), this can be
      achieved using the command line.
    * **Gateway Service**: by default, no HTTPS is used to protect the services behind it. You need to provide a
      trusted SSL/TLS certificate in the configuration file or use your own proxy in front of the Gateway Service. See
      this [simple guide](http://nginx.org/en/docs/http/configuring_https_servers.html) on how to install an SSL/TLS
      certificate on NGINX.
## Upgrade Guide
### 1.2 to 1.3
In case you have a previous deployment from version 1.2, shut down the containers and back them up manually. You can do
this using the `busybox` image. Replace `dbrepo-userdb-xyz` with your container name:
```console
export NAME=dbrepo-userdb-xyz
docker run --rm --volumes-from $NAME -v /home/$USER/backup:/backup busybox tar pcvfz /backup/$NAME.tar.gz /var/lib/mysql
```
!!! danger "Wipe all traces of DBRepo from your system"

    To erase all traces of DBRepo from your computer or virtual machine, execute the following **dangerous** commands.
    They delete all present containers, volumes and networks and **wipe** all information about DBRepo from your
    system (excluding the images).
```console
docker container stop $(docker container ls -aq -f name=^/dbrepo-.*) || true
docker container rm $(docker container ls -aq -f name=^/dbrepo-.*) || true
docker volume rm $(docker volume ls -q -f name=^dbrepo-.*) || true
docker network rm $(docker network ls -q -f name=^dbrepo-.*) || true
```
You can restore the volume *after* downloading the new 1.3 images and creating the infrastructure:
```console
export NAME=dbrepo-userdb-xyz
export PORT=12345
docker container create -h "$NAME" --name "$NAME" -p "$PORT":3306 -e MARIADB_ROOT_PASSWORD=mariadb --network userdb -v /backup mariadb:10.5
docker run --rm --volumes-from "$NAME" -v "/home/$USER/backup/$NAME.tar.gz:/backup/$NAME.tar.gz" busybox sh -c "cd /backup && tar xvfz /backup/$NAME.tar.gz && cp -r /backup/var/lib/mysql/* /var/lib/mysql"
```
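The backup and restore pattern above can be dry-run locally with plain directories, without Docker, to verify the `tar` flags before touching real volumes. All paths below are illustrative.

```shell
# Sandbox the backup/restore round-trip with plain directories.
set -e
DEMO=/tmp/dbrepo-tar-demo
rm -rf "$DEMO" && mkdir -p "$DEMO/data" "$DEMO/backup" "$DEMO/restore"
echo "row1" > "$DEMO/data/table.frm"

# Backup: (p)reserve permissions, (c)reate, g(z)ip, archive (f)ile name.
tar pczf "$DEMO/backup/data.tar.gz" -C "$DEMO" data

# Restore into a fresh location, exactly like the busybox step above.
tar xzf "$DEMO/backup/data.tar.gz" -C "$DEMO/restore"
cat "$DEMO/restore/data/table.frm"   # prints: row1
```

The same flags are what the `busybox` one-liners use; only the source and target directories differ.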
Future releases will be backwards compatible and will come with migration scripts.
---
author: Martin Weise
---
## TL;DR
To install DBRepo in your existing cluster, download the sample [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/raw/master/charts/dbrepo-core/values.yaml?inline=false)
for your deployment and update the variables `hostname` and `authAdminApiUrl` to your domain.
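For example, assuming your deployment domain is `subdomain.example.com` (a placeholder), the two values would be set like this:

```yaml title="values.yaml"
hostname: subdomain.example.com
authAdminApiUrl: https://subdomain.example.com/api/auth
```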
If you have [:simple-helm: Helm](https://helm.sh/docs/intro/install/) already installed on your system, you can
install DBRepo with:
```shell
helm upgrade --install dbrepo \
-n dbrepo \
"oci://dbrepo.azurecr.io/helm/dbrepo-core" \
--values ./values.yaml \
--version "0.1.3" \
--create-namespace \
--cleanup-on-fail
```
## Architecture
<figure markdown>
![Architecture Kubernetes Azure](images/architecture-core.svg)
<figcaption>Architecture of the services on Kubernetes</figcaption>
</figure>
## Chart values
| Key | Type | Default | Description |
|---------------------------------|--------|------------------------------------|------------------------------------------------------------------------------------------------------------|
| `replicaCount` | int | `1` | Number of replicas (pods) to launch. |
| `nameOverride` | string | `""` | A name in place of the chart name for `app:` labels. |
| `fullnameOverride` | string | `""` | A name to substitute for the full names of resources. |
| `adminEmail` | string | `noreply@example.com` | E-mail address for OAI-PMH metadata. |
| `repositoryName` | string | `Database Repository` | Repository name for OAI-PMH metadata. |
| `hostname` | string | `example.com` | Domain name for the deployment, should not contain `https://` or any path. |
| `uiLogo` | string | `/logo.png` | Path to the logo, you can mount the file via a configmap or volume. |
| `uiIcon` | string | `/favicon.ico` | Path to the favicon, you can mount the file via a configmap or volume. |
| `uiVersion` | string | `latest` | Subtitle of the repository displayed in the UI. |
| `uiTitle` | string | `Database Repository` | Title of the repository displayed in the UI. |
| `uiKeycloakLoginUrl` | string | `/api/auth/` | Link to the authentication service login page. |
| `uiBrokerLoginUrl` | string | `/broker/` | Link to the broker service login page. |
| `uiForceSsl` | bool | `true` | Force SSL in the frontend on all resources and links. Disable this for insecure file uploads. |
| `uiUploadPath` | string | `/tmp/` | Path to upload files into the shared volume. |
| `authClientId`                  | string | `dbrepo-client`                    | ID of the Keycloak client that the backend services use to communicate with Keycloak.                        |
| `authClientSecret` | string | `MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG` | Secret of this client. This should be changed. |
| `authUsername` | string | `fda` | Authentication service admin username that the backend services should use. |
| `authPassword` | string | `fda` | Authentication service admin password that the backend services should use. |
| `authAdminApiUrl` | string | `https://example.com/api/auth` | Backend authentication URL that points to the keycloak instance. |
| `brokerUsername` | string | `broker` | Broker service admin username that the backend services should use. |
| `brokerPassword` | string | `broker` | Broker service admin password that the backend services should use. |
| `brokerEndpoint` | string | `http://broker-service` | Endpoint URL of the broker service. |
| `datacitePassword` | string | `""` | Password of a DataCite Fabrica user to mint DOIs (optional). |
| `datacitePrefix` | string | `""` | DOI prefix (optional). |
| `dataciteUrl` | string | `https://api.datacite.org` | DataCite Fabrica API endpoint URL (optional). |
| `dataciteUsername` | string | `""` | Username of a DataCite Fabrica user to mint DOIs (optional). |
| `metadataDbDatabase` | string | `fda` | Database name of the metadata database. |
| `metadataDbHost` | string | `metadata-db` | Hostname of the metadata database, this can be a domain name for e.g. managed database deployments. |
| `metadataDbJdbcExtraArgs` | string | `""` | Additional arguments for the JDBC protocol to e.g. enforce SSL with `?useSSL=true` |
| `metadataDbPassword` | string | `dbrepo` | Password of the root user that can access the metadata database. |
| `metadataDbUsername` | string | `root` | Username of the root user that can access the metadata database. |
| `metadataDbReplicationUsername` | string | `replicator` | Replication username. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `metadataDbReplicationPassword` | string | `replicator` | Replication password. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `authDb` | string | `keycloak` | Database name of the authentication service database. |
| `authDbHost`                    | string | `auth-db`                          | Hostname of the authentication database; this can be a domain name for e.g. managed database deployments.    |
| `authDbType` | string | `mariadb` | JDBC database type for the authentication service (keycloak). |
| `authDbPassword` | string | `dbrepo` | Password of the root user that can access the authentication database. |
| `authDbUsername` | string | `root` | Username of the root user that can access the authentication database. |
| `authDbReplicationUsername` | string | `replicator` | Replication username. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `authDbReplicationPassword` | string | `replicator` | Replication password. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `dataDbPassword` | string | `dbrepo` | Password of the root user that can access the data database. |
| `dataDbUsername` | string | `root` | Username of the root user that can access the data database. |
| `dataDbReplicationUsername` | string | `replicator` | Replication username. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `dataDbReplicationPassword` | string | `replicator` | Replication password. Set to `""` if no replication pod should be started (e.g. in a managed environment). |
| `searchPassword` | string | `admin` | Password of the user that can read and write into the search database. |
| `searchUsername` | string | `admin` | Username of the user that can read and write into the search database. |
| `additionalConfigMaps`          | list   | `[]`                               | Array of additional config maps, e.g. `[ name: my-config, data: [ key: value ] ]`.                           |
| `additionalSecrets`             | list   | `[]`                               | Array of additional secrets, e.g. `[ name: my-secret, data: [ key: b64_enc_value ] ]`.                       |
| `premiumStorageClassName` | string | `""` | StorageClass name for the shared volume. Must have `ReadWriteMany` capabilities. |
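The `additionalConfigMaps` and `additionalSecrets` entries in the table above use a compact flow notation; expanded into block YAML they would presumably look like the following (names and keys are illustrative):

```yaml title="values.yaml"
additionalConfigMaps:
  - name: my-config
    data:
      key: value
additionalSecrets:
  - name: my-secret
    data:
      key: b64_enc_value
```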
author: Martin Weise
---
# Special Instructions for Azure Cloud
You can use our pre-built Helm chart for deploying DBRepo in your Kubernetes cluster with
[Microsoft Azure](https://azure.microsoft.com/) as infrastructure provider.
## Requirements

### Hardware

For this small cloud test deployment, any public cloud provider would suffice; we recommend a small
[:simple-microsoftazure: Azure Kubernetes Service](https://azure.microsoft.com/en-us/products/kubernetes-service)
cluster with Kubernetes version *1.24.10* and nodes with:

- 4 vCPU cores
- 16GB RAM memory
- 100GB SSD storage

This is roughly met by selecting the *Standard_B4ms* flavor.
## Architecture

<figure markdown>
![Architecture Kubernetes Azure](images/architecture-kubernetes.svg)
<figcaption>Architecture of the services on Azure Kubernetes</figcaption>
</figure>

## Deployment

### Databases

Since Azure offers a managed [Azure Database for MariaDB](https://azure.microsoft.com/en-us/products/mariadb), we
recommend deploying at least the Metadata Database as a highly available, managed database.

!!! warning "End of Life software"

    Unfortunately, Azure does not (yet) support managed MariaDB 10.5; the latest version supported by Azure is 10.3,
    which is End of Life (EOL) from [May 2023 onwards](https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-3/).
    Microsoft decided to still maintain MariaDB 10.3
    until [September 2025](https://learn.microsoft.com/en-us/azure/mariadb/concepts-supported-versions).
### Shared Volume
For the shared volume PersistentVolumeClaim `dbrepo-shared-volume-claim`, select an appropriate StorageClass that
supports `ReadWriteMany` access modes and modify the `premiumStorageClassName` variable accordingly.
It is sufficient to select the cost-efficient `azurefile` StorageClass for Azure:
```yaml title="values.yaml"
...
premiumStorageClassName: azurefile
...
```
author: Martin Weise
---
# Special Instructions for Minikube
You can use our Helm chart for deploying DBRepo in your Kubernetes cluster using
[minikube](https://minikube.sigs.k8s.io/docs/start/) as infrastructure provider, which deploys a single-node
Kubernetes cluster on your machine, suitable for test deployments.
!!! info "Rootless Docker"

    We use a minikube installation with the rootless Docker driver in this section.

## Requirements

### Hardware

For this small, local test deployment, any modern hardware suffices; we recommend a dedicated virtual machine with
the following settings. Note that most of the vCPU and RAM resources are needed when starting the infrastructure,
due to Docker. During idle times, the deployment uses significantly fewer resources.

- 4 vCPU cores
- 16GB RAM memory
- 100GB SSD storage

### Software

First, install the minikube virtualization tool that provides a single-node Kubernetes environment, e.g. on a virtual
machine. We do not regularly check these instructions; they are provided on a best-effort basis. Check
the [official documentation](https://minikube.sigs.k8s.io/docs/start/) for up-to-date information.
For Debian:

```shell
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
```
Start the cluster and enable basic plugins:

```shell
minikube start --driver='docker'
minikube kubectl -- get po -A
minikube addons enable ingress
```
Deploy a NGINX reverse proxy on the virtual machine to reach your minikube cluster from the public Internet:
```nginx title="/etc/nginx/conf.d/dbrepo.conf"
resolver 127.0.0.11 valid=30s;
server {
listen 80;
server_name _;
location / {
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_pass http://CLUSTER_IP;
}
}
server {
listen 443 ssl;
server_name DOMAIN_NAME;
ssl_certificate /etc/nginx/certificate.crt;
ssl_certificate_key /etc/nginx/certificate.key;
location / {
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_pass https://CLUSTER_IP;
}
}
```
Replace `CLUSTER_IP` with the result of:
```console
$ minikube ip
192.168.49.2
```
Replace `DOMAIN_NAME` with your domain name. You will also need a valid TLS certificate with private key to enable
TLS in the cluster. In our test deployment we obtained a certificate from Let's Encrypt.
## Deployment
To install the DBRepo Helm Chart, download and edit
the [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/raw/master/charts/dbrepo-minikube/values.yaml?inline=false)
file. At minimum you need to change the values for:
* `hostname`, set to your domain, e.g. `subdomain.example.com`
* `authAdminApiUrl`, similar, but with `https://` and the API path to the Keycloak server, e.g. `https://subdomain.example.com/api/auth`
It is advised to also change the usernames and passwords for all credentials. Next, install the chart using your edited
`values.yaml` file:
!!! info "Documentation of values.yaml"

    We documented all values of the `values.yaml` file [here](../deployment-helm/#chart-values), with
    default values and a description for each value.
```shell
helm upgrade --install dbrepo \
-n dbrepo \
"oci://dbrepo.azurecr.io/helm/dbrepo-core" \
--values ./values.yaml \
--version "0.1.3" \
--create-namespace \
--cleanup-on-fail
```
---
author: Martin Weise
---
# Get Started
!!! abstract "Abstract"

    In this short getting started guide we show the dependencies to run the database repository and perform a small,
    local, test deployment for quickly trying out the features that the repository offers.
## Requirements
### Hardware
For this small, local test deployment, any modern hardware suffices; we recommend a dedicated virtual machine with
the following settings. Note that most of the CPU and RAM resources are needed when starting the infrastructure,
due to Docker.

- 8 CPU cores
- 16GB RAM memory
- 100GB SSD storage
### Software
Install Docker Engine for your operating system. There are excellent guides available for Linux; we highly recommend
using a stable distribution such as [Debian](https://docs.docker.com/desktop/install/debian/). In the following guide
we only consider Debian.
## Deployment
### Docker Compose
We maintain a rapid prototype deployment option through Docker Compose (v2.17.0 and newer). This deployment creates the
core infrastructure and a single Docker container for all user-generated databases.

```shell
curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/install.sh | sudo bash
```

View the logs:

```shell
docker compose logs -f
```

You should now be able to view the front end at [http://127.0.0.1:80](http://127.0.0.1:80).
Please be warned that the default configuration is not intended for public deployments. It is only meant to give you a
running system within minutes so you can play around and explore features.
!!! warning "Known security issues with the default configuration"

    The system is auto-configured for a small, local, test deployment and is *not* secure! You need to make
    modifications in various places to make it secure:

    * **Authentication Service**:
        a. You need to use your own instance or configure a secure instance using a (self-signed) certificate.
           Additionally, when serving from a non-default Authentication Service, you need to put it into the
           `JWT_ISSUER` environment variable (`.env`).
        b. You need to change the password of the default admin user `fda` in Realm
           master > Users > fda > Credentials > Reset password.
        c. You need to change the client secrets for the clients `dbrepo-client` and `broker-client`. Do this in Realm
           dbrepo > Clients > dbrepo-client > Credentials > Client secret > Regenerate. Do the same for the
           `broker-client`.
        d. You need to regenerate the public key of the `RS256` algorithm which is shared with all services to verify
           the signature of JWT tokens. Add your securely generated private key in Realm
           dbrepo > Realm settings > Keys > Providers > Add provider > rsa.
    * **Broker Service**: by default, this service is configured with an administrative user that has major privileges.
      You need to change the password of the user *fda* in Admin > Update this user > Password. We found this
      [simple guide](https://onlinehelp.coveo.com/en/ces/7.0/administrator/changing_the_rabbitmq_administrator_password.htm)
      very useful.
    * **Search Database**: by default, this service requires authentication with an administrative user that is
      allowed to write into the indices. Following
      this [simple guide](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/reset-password.html), this can be
      achieved using the command line.
    * **Gateway Service**: by default, no HTTPS is used to protect the services behind it. You need to provide a
      trusted SSL/TLS certificate in the configuration file or use your own proxy in front of the Gateway Service. See
      this [simple guide](http://nginx.org/en/docs/http/configuring_https_servers.html) on how to install an SSL/TLS
      certificate on NGINX.
#### Migration from 1.2 to 1.3
In case you have a previous deployment from version 1.2, shut down the containers and back them up manually. You can do
this using the `busybox` image. Replace `dbrepo-userdb-xyz` with your container name:
```console
export NAME=dbrepo-userdb-xyz
docker run --rm --volumes-from $NAME -v /home/$USER/backup:/backup busybox tar pcvfz /backup/$NAME.tar.gz /var/lib/mysql
```
!!! danger "Wipe all traces of DBRepo from your system"

    To erase all traces of DBRepo from your computer or virtual machine, execute the following **dangerous** commands.
    They delete all present containers, volumes and networks and **wipe** all information about DBRepo from your
    system (excluding the images).
```console
docker container stop $(docker container ls -aq -f name=^/dbrepo-.*) || true
docker container rm $(docker container ls -aq -f name=^/dbrepo-.*) || true
docker volume rm $(docker volume ls -q -f name=^dbrepo-.*) || true
docker network rm $(docker network ls -q -f name=^dbrepo-.*) || true
```
You can restore the volume *after* downloading the new 1.3 images and creating the infrastructure:
```console
export NAME=dbrepo-userdb-xyz
export PORT=12345
docker container create -h "$NAME" --name "$NAME" -p "$PORT":3306 -e MARIADB_ROOT_PASSWORD=mariadb --network userdb -v /backup mariadb:10.5
docker run --rm --volumes-from "$NAME" -v "/home/$USER/backup/$NAME.tar.gz:/backup/$NAME.tar.gz" busybox sh -c "cd /backup && tar xvfz /backup/$NAME.tar.gz && cp -r /backup/var/lib/mysql/* /var/lib/mysql"
```
Future releases will be backwards compatible and will come with migration scripts.
### Kubernetes
We maintain an RKE2 Kubernetes deployment from version 1.3 onwards. More on that once the release date is fixed.
collection. Challenges revolve around organizing, searching and retrieving content. Databases
constitute a major technical burden as their internal representation greatly differs from the static documents most
digital repositories are designed for.
[Get Started](/infrastructures/dbrepo/deployment-docker-compose/){ .action-button .md-button .md-button--primary }
[Learn More](/infrastructures/dbrepo/system){ .action-button .md-button .md-button--secondary }
## Application Areas
---
author: Martin Weise
---
# Data Database
!!! debug "Debug Information"

    * Ports: 3306/tcp, 9100/tcp
    * Prometheus: `http://:9100/metrics`
It hosts the user-generated databases that researchers create in the repository. In the Docker Compose deployment, a
single MariaDB container holds all user-generated databases. We use MariaDB for its rich capabilities in the reference
implementation.

The default credentials are `root:dbrepo`. Connect to the database via the JDBC connector on port `3306`.
---
author: Martin Weise
---
# Metadata Database
!!! debug "Debug Information"

    * Ports: 3306/tcp, 9100/tcp
    * Prometheus: `http://:9100/metrics`
It is the core component of the project: a relational database that contains metadata about all researcher databases
created in the database repository, such as column names, check expressions, value enumerations or key/value
constraints, as well as data relevant for citing data sets. Additionally, concepts such as the URIs of units of
measurement of numerical columns are stored in the Metadata Database to provide semantic context. We use MariaDB for
its rich capabilities in the reference implementation.
The default credentials are `root:dbrepo` for the database `fda`. Connect to the database via the JDBC connector on
port `3306`.
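With the default credentials, a JDBC connection string might look like the following sketch; the hostname `metadata-db` is an assumption taken from the Helm chart default `metadataDbHost`, so substitute your own host:

```
jdbc:mariadb://metadata-db:3306/fda?user=root&password=dbrepo
```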
---
author: Martin Weise
---
# Search Database
!!! debug "Debug Information"

    * Ports: 9200/tcp
    * Indexes: `http://:9200/_all`
    * Health: `http://:9200/_cluster/health/`
It processes search requests from the Gateway Service for full-text lookups in the metadata database. We use
[Elasticsearch](https://www.elastic.co/) in the reference implementation. The Search Database maintains a retrievable
index over all databases that is updated with each save operation on databases in the metadata database.

All requests need to be authenticated; by default, the credentials `elastic:elastic` are used.
---
author: Martin Weise
---
# UI
!!! debug "Debug Information"

    * Ports: 3000/tcp, 9100/tcp
    * Prometheus: `http://:9100/metrics`
    * UI: `http://:3000/`
It provides a *graphical user interface* (GUI) for a researcher to interact with the database repository's API.
<figure markdown>
![UI microservice architecture detailed](images/architecture-ui.png)
<figcaption>Architecture of the UI microservice</figcaption>
</figure>
---
author: Martin Weise
---
# Analyse Service
!!! debug "Debug Information"

    * Ports: 5000/tcp
    * Prometheus: `http://:5000/metrics`
    * Swagger UI: `http://:5000/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/analyse" target="_blank">:fontawesome-solid-square-up-right: view online</a>
It suggests data types for the FAIR Portal when creating a table from a *comma separated values* (CSV) file. It
recommends enumerations for columns and returns e.g. a list of potential primary key candidates. The researcher is able
to confirm these suggestions manually. Moreover, the *Analyse Service* determines basic statistical properties of
numerical columns.
---
author: Martin Weise
---
# Authentication Service
!!! debug "Debug Information"

    * Ports: 8080/tcp, 8443/tcp
    * Admin Console: `http://:8443/`
This service is very specific to the deploying organization. In our reference implementation we implement a *security
assertion markup language* (SAML) service provider and use our institutional SAML identity provider to obtain account
data through an encrypted channel.

From version 1.2 onwards we use Keycloak for authentication and have deprecated the previous Spring Boot application.
Consequently, authentication is handled through Keycloak.
!!! warning "Unsupported Keycloak features"

    Due to no demand at the time, we currently do not support the following Keycloak features:

    * E-Mail verification
    * Temporary passwords
By default, the Authentication Service comes with a self-signed certificate valid for 3 months from the build date. For
deployment it is *highly encouraged* to use your own certificate, properly issued by a trusted PKI, e.g. GÉANT. For
local deployments you can use the self-signed certificate. You need to accept the risk in most browsers when visiting
the [admin panel](https://localhost:8443/admin/).
Sign in with the default credentials (username `fda`, password `fda`) or the ones you configured during set-up. By
default, users are created via the frontend's sign-up page. It is also possible to create users directly in
Keycloak; they will still act as self-signed-up users. Since we do not support all Keycloak features, leave
out required user actions (they will not be enforced) as well as the temporary password.
Each user has attributes associated with them. If you manually create a user directly in Keycloak, you need to add
them in Users > Add user > Attributes:
* `theme_dark` (*boolean*, default: false)
* `orcid` (*string*)
* `affiliation` (*string*)
## Groups
The authorization scheme follows a group-based access control (GBAC). Users are organized in three distinct
(non-overlapping) groups:
1. Researchers (*default*)
2. Developers
3. Data Stewards
Based on the membership in one of these groups, the user is assigned a set of roles that authorize specific actions. By
default, all users are assigned to the `researchers` group.
## Roles
We organize the roles into default and escalated composite roles. There are three composite roles, one for each group.
Each composite role has a set of other associated composite roles.
<figure markdown>
![](images/groups-roles.png)
<figcaption>Three groups (Researchers, Developers, Data Stewards) and their composite roles associated.</figcaption>
</figure>
Each role authorizes one specific action in the services. For example, the `create-database` role authorizes a user to
create a database in a Docker container. Therefore,
the [`DatabaseEndpoint.java`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/a5bdd1e2169bae6497e2f7eee82dad8b9b059850/fda-database-service/rest-service/src/main/java/at/tuwien/endpoints/DatabaseEndpoint.java#L78)
endpoint requires a JWT access token with this authority.
```java
@PostMapping
@PreAuthorize("hasAuthority('create-database')")
public ResponseEntity<DatabaseBriefDto> create(@NotNull Long containerId,
@Valid @RequestBody DatabaseCreateDto createDto,
@NotNull Principal principal) {
...
}
```
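On the service side, such an authority check boils down to inspecting the roles carried in the JWT access token. The
following is a minimal sketch of that check in Python, assuming the standard Keycloak token layout
(`realm_access.roles`) and deliberately skipping signature verification, which the real services of course perform:

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWTs strip off.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def has_authority(token: str, role: str) -> bool:
    """Check whether an access token carries a given realm role."""
    return role in jwt_payload(token).get("realm_access", {}).get("roles", [])

# Build a toy (unsigned) token for illustration only.
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
claims = base64.urlsafe_b64encode(
    json.dumps({"realm_access": {"roles": ["create-database"]}}).encode()
).rstrip(b"=").decode()
token = f"{header}.{claims}."

print(has_authority(token, "create-database"))
print(has_authority(token, "delete-container"))
```
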
### Default Container Handling
| Name | Description |
|--------------------------|--------------------------------------|
| `create-container` | Can create a container |
| `find-container` | Can find a specific container |
| `list-containers` | Can list all containers |
| `modify-container-state` | Can start and stop the own container |
### Default Database Handling
| Name | Description |
|------------------------------|------------------------------------------------------|
| `check-database-access` | Can check the access to a database of a user |
| `create-database` | Can create a database |
| `create-database-access` | Can give a new access to a database of a user |
| `delete-database-access` | Can delete the access to a database of a user |
| `find-database` | Can find a specific database in a container |
| `list-databases` | Can list all databases in a container |
| `modify-database-visibility` | Can modify the database visibility (public, private) |
| `modify-database-owner` | Can modify the database owner |
| `update-database-access` | Can update the access to a database of a user |
### Default Table Handling
| Name | Description |
|---------------------------------|------------------------------------------------------|
| `create-table` | Can create a table |
| `find-tables` | Can list a specific table in a database |
| `list-tables` | Can list all tables |
| `modify-table-column-semantics` | Can modify the column semantics of a specific column |
### Default Query Handling
| Name | Description |
|---------------------------|-----------------------------------------------|
| `create-database-view` | Can create a view in a database |
| `delete-database-view` | Can delete a view in a database |
| `delete-table-data` | Can delete data in a table |
| `execute-query` | Can execute a query statement |
| `export-query-data` | Can export the data that a query has produced |
| `export-table-data` | Can export the data stored in a table |
| `find-database-view` | Can find a specific database view |
| `find-query` | Can find a specific query in the query store |
| `insert-table-data` | Can insert data into a table |
| `list-database-views` | Can list all database views |
| `list-queries` | Can list all queries in the query store |
| `persist-query` | Can persist a query in the query store |
| `re-execute-query` | Can re-execute a query to reproduce a result |
| `view-database-view-data` | Can view the data produced by a database view |
| `view-table-data` | Can view the data in a table |
| `view-table-history` | Can view the data history of a table |
### Default Identifier Handling
| Name | Description |
|---------------------|---------------------------------------------|
| `create-identifier` | Can create an identifier (subset, database) |
| `find-identifier` | Can find a specific identifier |
| `list-identifier` | Can list all identifiers |
### Default User Handling
| Name | Description |
|---------------------------|-----------------------------------------|
| `modify-user-theme` | Can modify the user theme (light, dark) |
| `modify-user-information` | Can modify the user information |
### Default Maintenance Handling
| Name | Description |
|------------------------------|------------------------------------------|
| `create-maintenance-message` | Can create a maintenance message banner |
| `delete-maintenance-message` | Can delete a maintenance message banner |
| `find-maintenance-message` | Can find a maintenance message banner |
| `list-maintenance-messages` | Can list all maintenance message banners |
| `update-maintenance-message` | Can update a maintenance message banner |
### Default Semantics Handling
| Name | Description |
|---------------------------|-----------------------------------------------------------------|
| `create-semantic-unit` | Can save a previously unknown unit for a table column |
| `create-semantic-concept` | Can save a previously unknown concept for a table column |
| `execute-semantic-query` | Can query remote SPARQL endpoints to get labels and description |
| `table-semantic-analyse` | Can automatically suggest units and concepts for a table |
### Escalated User Handling
| Name | Description |
|-------------|-----------------------------------------------|
| `find-user` | Can list user information for a specific user |
### Escalated Container Handling
| Name | Description |
|----------------------------------|----------------------------------------------|
| `delete-container` | Can delete any container |
| `modify-foreign-container-state` | Can modify any container state (start, stop) |
### Escalated Database Handling
| Name | Description |
|-------------------|------------------------------------------|
| `delete-database` | Can delete any database in any container |
### Escalated Table Handling
| Name | Description |
|----------------|--------------------------------------|
| `delete-table` | Can delete any table in any database |
### Escalated Query Handling
There are currently no escalated query handling roles.
### Escalated Identifier Handling
| Name | Description |
|------------------------------|---------------------------------------------------|
| `create-foreign-identifier` | Can create an identifier to any database or query |
| `delete-identifier` | Can delete any identifier |
| `modify-identifier-metadata` | Can modify any identifier metadata |
### Escalated Semantics Handling
| Name | Description |
|-----------------------------------------|----------------------------------------------|
| `create-ontology` | Can register a new ontology |
| `delete-ontology` | Can unregister an ontology |
| `list-ontologies` | Can list all ontologies |
| `modify-foreign-table-column-semantics` | Can modify any table column concept and unit |
| `update-ontology` | Can update ontology metadata |
| `update-semantic-concept` | Can update own table column concept |
| `update-semantic-unit` | Can update own table column unit |
## API
### Obtain Access Token
Access tokens are needed for almost all operations.
=== "Terminal"
``` console
curl -X POST \
-d "username=foo&password=bar&grant_type=password&client_id=dbrepo-client&scope=openid&client_secret=MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG" \
http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token
```
=== "Python"
``` py
import requests
auth = requests.post("http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token", data={
"username": "foo",
"password": "bar",
"grant_type": "password",
"client_id": "dbrepo-client",
"scope": "openid",
"client_secret": "MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG"
})
print(auth.json()["access_token"])
```
### Refresh Access Token
Using the refresh token from the response above, a new access token can be obtained.
=== "Terminal"
``` console
curl -X POST \
-d "grant_type=refresh_token&client_id=dbrepo-client&refresh_token=THE_REFRESH_TOKEN&client_secret=MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG" \
http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token
```
=== "Python"
``` py
import requests
auth = requests.post("http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token", data={
"grant_type": "refresh_token",
"client_id": "dbrepo-client",
"client_secret": "MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG",
"refresh_token": "THE_REFRESH_TOKEN"
})
print(auth.json()["access_token"])
```
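The obtained access token is then passed to the other services in the standard HTTP `Authorization` header using the
Bearer scheme. A small sketch of building that header; the endpoint path in the comment is illustrative, not
authoritative:

```python
def bearer_headers(access_token: str) -> dict:
    """Build the Authorization header for authenticated API calls."""
    return {"Authorization": f"Bearer {access_token}"}

headers = bearer_headers("THE_ACCESS_TOKEN")
# e.g. requests.get("http://localhost/api/database", headers=headers)
print(headers)
```
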
---
author: Martin Weise
---
# Broker Service
!!! debug "Debug Information"
* Ports: 5672/tcp, 15672/tcp
* RabbitMQ Management Plugin: `http://:15672`
* RabbitMQ Prometheus Plugin: `http://:15692/metrics`
The Broker Service holds the exchanges and queues that store AMQP messages for later consumption. We
use [RabbitMQ](https://www.rabbitmq.com/) in the implementation. The AMQP endpoint listens on port `5672`, and a
management interface is offered at port `15672`.
The default credentials are:
* Username: `fda`
* Password: `fda`
---
author: Martin Weise
---
# Gateway Service
!!! debug "Debug Information"
* Ports: 9095/tcp
* Info: `http://:9095/actuator/info`
* Health: `http://:9095/actuator/health`
* Prometheus: `http://:9095/actuator/prometheus`
Provides a single point of access to the *application programming interface* (API) and configures a
standard [NGINX](https://www.nginx.com/) reverse proxy for load balancing and SSL/TLS configuration.
---
author: Martin Weise
---
# Database Service
!!! debug "Debug Information"
* Ports: 9092/tcp
* Info: `http://:9092/actuator/info`
* Health: `http://:9092/actuator/health`
* Prometheus: `http://:9092/actuator/prometheus`
* Swagger UI: `http://:9092/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/database" target="_blank">:fontawesome-solid-square-up-right: view online</a>
This service creates the databases inside a Docker container as well as the Query Store. Currently, we only
support [MariaDB](https://mariadb.org/) images, which allow table versioning with low programmatic effort.
# Identifier Service
!!! debug "Debug Information"
* Ports: 9096/tcp
* Info: `http://:9096/actuator/info`
* Health: `http://:9096/actuator/health`
* Prometheus: `http://:9096/actuator/prometheus`
* Swagger UI: `http://:9096/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/identifier" target="_blank">:fontawesome-solid-square-up-right: view online</a>
This microservice is responsible for creating and resolving a *persistent identifier* (PID) attached to a query in
order to obtain the metadata attached to it and allow re-execution of the query. We store the query together with
hashes of the query and of the result set to allow equality checks between the originally obtained result set and the
currently obtained one. In the reference implementation we currently use only a numerical id column and plan to
integrate *digital object identifiers* (DOI) through our institutional library soon.
# Metadata Service
!!! debug "Debug Information"
* Ports: 9099/tcp
* Info: `http://:9099/actuator/info`
* Health: `http://:9099/actuator/health`
* Prometheus: `http://:9099/actuator/prometheus`
* Swagger UI: `http://:9099/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/metadata" target="_blank">:fontawesome-solid-square-up-right: view online</a>
This service provides an OAI-PMH endpoint for metadata crawlers.
# Query Service
!!! debug "Debug Information"
* Ports: 9093/tcp
* Info: `http://:9093/actuator/info`
* Health: `http://:9093/actuator/health`
* Prometheus: `http://:9093/actuator/prometheus`
* Swagger UI: `http://:9093/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/query" target="_blank">:fontawesome-solid-square-up-right: view online</a>
It provides an interface to insert data into the tables created by the Table Service. It also allows view-only,
paginated and versioned query execution on the raw data and consumes messages from the message queue of the Broker
Service.
# Table Service
!!! debug "Debug Information"
* Ports: 9094/tcp
* Info: `http://:9094/actuator/info`
* Health: `http://:9094/actuator/health`
* Prometheus: `http://:9094/actuator/prometheus`
* Swagger UI: `http://:9094/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/table" target="_blank">:fontawesome-solid-square-up-right: view online</a>
This microservice handles table operations inside a database that is managed by the Database Service. We
use [Hibernate](https://hibernate.org/orm/) for schema and data ingest operations.
# User Service
!!! debug "Debug Information"
* Ports: 9098/tcp
* Info: `http://:9098/actuator/info`
* Health: `http://:9098/actuator/health`
* Prometheus: `http://:9098/actuator/prometheus`
* Swagger UI: `http://:9098/swagger-ui/index.html` <a href="/infrastructures/dbrepo/latest/swagger/user" target="_blank">:fontawesome-solid-square-up-right: view online</a>
This microservice handles user information.