Commit 19fc545f authored by Martin Weise
Copied the docs

parent 84da7945
---
author: Martin Weise
hide:
- navigation
---
# Contact
## Team
### Strategy & Partnerships
Ao.univ.Prof. Dr. [Andreas Rauber](https://www.ifs.tuwien.ac.at/~andi)<br />
Technische Universit&auml;t Wien<br />
Research Unit Data Science<br />
Favoritenstra&szlig;e 9-11<br />
A-1040 Vienna, Austria
### Technical Lead
Projektass. Dipl.-Ing. [Martin Weise](https://ec.tuwien.ac.at/~weise/)<br />
Technische Universit&auml;t Wien<br />
Research Unit Data Science<br />
Favoritenstra&szlig;e 9-11<br />
A-1040 Vienna, Austria
## Contributors (alphabetically)
- Ganguly, Raman
- Gergely, Eva
- G&uuml;&#231;l&uuml;, G&ouml;kay
- Grantner, Tobias
- Karnbach, Geoffrey
- Lukic, Nikola
- Mahler, Lukas
- Michlits, Cornelia
- Rauber, Andreas
- Spannring, Max
- Staudinger, Moritz
- Stytsenko, Kirill
- Taha, Josef
- Tsepelakis, Sotirios
- Weise, Martin
Interested in contributing? Send us an e-mail!
---
author: Martin Weise
---
# Docker Compose
## TL;DR
If you have [Docker](https://docs.docker.com/engine/install/) already installed on your system, you can install DBRepo with:
```shell
curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/install.sh | bash
```
## Architecture
The repository is designed as a microservice architecture to ensure scalability and the utilization of various
technologies. The microservices handle the basic database operations and data versioning as well as
*findability*, *accessibility*, *interoperability* and *reusability* (FAIR).
<figure markdown>
![DBRepo architecture](images/architecture-docker-compose.svg)
<figcaption>Architecture of the services deployed via Docker Compose</figcaption>
</figure>
Alternatively, you can deploy DBRepo with [Helm](../deployment-helm/) in your virtual machine.
## Environment Values
| Key | Type | Default | Description |
|------------------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `DBREPO_CLIENT_ID` | string | `dbrepo-client` | Client ID of the Keycloak client for API communication. |
| `DBREPO_CLIENT_SECRET` | string | `MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG` | Client secret of the Keycloak client. Change it in the Keycloak admin console. |
| `JWT_ISSUER` | string | `http://localhost/api/auth/realms/dbrepo` | The issuer in the JWT `iss` field of the (decoded) token. Public deployments with hostnames other than localhost need to change that. The issuer always has the form `<PROTOCOL>://<HOSTNAME>/api/auth/realms/dbrepo`, e.g. change PROTOCOL to https for SSL/TLS deployments. |
| `JWT_PUBKEY` | string | `MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqqnHQ2BWWW9vDNLRCcxD++xZg/16oqMo/c1l+lcFEjjAIJjJp/HqrPYU/U9GvquGE6PbVFtTzW1KcKawOW+FJNOA3CGo8Q1TFEfz43B8rZpKsFbJKvQGVv1Z4HaKPvLUm7iMm8Hv91cLduuoWx6Q3DPe2vg13GKKEZe7UFghF+0T9u8EKzA/XqQ0OiICmsmYPbwvf9N3bCKsB/Y10EYmZRb8IhCoV9mmO5TxgWgiuNeCTtNCv2ePYqL/U0WvyGFW0reasIK8eg3KrAUj8DpyOgPOVBn3lBGf+3KFSYi+0bwZbJZWqbC/Xlk20Go1YfeJPRIt7ImxD27R/lNjgDO/MwIDAQAB` | Public key that can verify the JWT signature, this should be changed. |
| `JWT_CERT` | string | `MIICmzCCAYMCBgGG3GWyBTANBgkqhkiG9w0BAQsFADARMQ8wDQYDVQQDDAZkYnJlcG8wHhcNMjMwMzEzMTkxMzE3WhcNMzMwMzEzMTkxNDU3WjARMQ8wDQYDVQQDDAZkYnJlcG8wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCqqcdDYFZZb28M0tEJzEP77FmD/Xqioyj9zWX6VwUSOMAgmMmn8eqs9hT9T0a+q4YTo9tUW1PNbUpwprA5b4Uk04DcIajxDVMUR/PjcHytmkqwVskq9AZW/Vngdoo+8tSbuIybwe/3Vwt266hbHpDcM97a+DXcYooRl7tQWCEX7RP27wQrMD9epDQ6IgKayZg9vC9/03dsIqwH9jXQRiZlFvwiEKhX2aY7lPGBaCK414JO00K/Z49iov9TRa/IYVbSt5qwgrx6DcqsBSPwOnI6A85UGfeUEZ/7coVJiL7RvBlsllapsL9eWTbQajVh94k9Ei3sibEPbtH+U2OAM78zAgMBAAEwDQYJKoZIhvcNAQELBQADggEBAASnN1Cuif1sdfEK2kWAURSXGJCohCROLWdKFjaeHPRaEfpbFJsgxW0Yj3nwX5O3bUlOWoTyENwnXSsXMQsqnNi+At32CKaKO8+AkhAbgQL9F0B+KeJwmYv3cUj5N/LYkJjBvZBzUZ4Ugu5dcxH0k7AktLAIwimkyEnxTNolOA3UyrGGpREr8MCKWVr10RFuOpF/0CsJNNwbHXzalO9D756EUcRWZ9VSg6QVNso0YYRKTnILWDn9hcTRnqGy3SHo3anFTqQZ+BB57YbgFWy6udC0LYRB3zdp6zNti87eu/VEymiDY/mmo1AB8Tm0b6vxFz4AKcL3ax5qS6YnZ9efSzk=IJjJp/HqrPYU/U9GvquGE6PbVFtTzW1KcKawOW+FJNOA3CGo8Q1TFEfz43B8rZpKsFbJKvQGVv1Z4HaKPvLUm7iMm8Hv91cLduuoWx6Q3DPe2vg13GKKEZe7UFghF+0T9u8EKzA/XqQ0OiICmsmYPbwvf9N3bCKsB/Y10EYmZRb8IhCoV9mmO5TxgWgiuNeCTtNCv2ePYqL/U0WvyGFW0reasIK8eg3KrAUj8DpyOgPOVBn3lBGf+3KFSYi+0bwZbJZWqbC/Xlk20Go1YfeJPRIt7ImxD27R/lNjgDO/MwIDAQAB` | Public key that can verify the JWT signature, this should be changed. |
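
The `JWT_ISSUER` value has to match the `iss` field of the tokens handed out by the authentication service. The following is a minimal sketch, not part of DBRepo: `jwt_issuer` and `jwt_payload` are hypothetical helper names. The first builds the issuer in the documented form, the second prints the decoded payload of a token so that the `iss` field can be compared by eye.

```shell
# Build the issuer URL in the documented form
# <PROTOCOL>://<HOSTNAME>/api/auth/realms/dbrepo
# (jwt_issuer is a hypothetical helper, not shipped with DBRepo)
jwt_issuer() {
  printf '%s://%s/api/auth/realms/dbrepo\n' "$1" "$2"
}

# Print the decoded payload (second dot-separated segment) of a JWT.
# JWT segments are base64url-encoded without padding, so the alphabet is
# mapped back to standard base64 and the padding is restored first.
# (jwt_payload is a hypothetical helper, not shipped with DBRepo)
jwt_payload() {
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 --decode
}
```

For a TLS deployment on the example host `subdomain.example.com`, `jwt_issuer https subdomain.example.com` prints the value to put into `.env`. The payload is only decoded here, the signature is not verified.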
## Requirements
### Hardware
For this small, local test deployment, any modern hardware suffices. We recommend a dedicated virtual machine with
the following settings. Note that most of the vCPU and RAM resources are needed for starting the infrastructure
with Docker. During idle times, the deployment uses significantly fewer resources.
- 4 vCPU cores
- 16 GB RAM
- 100 GB SSD storage
### Software
Install Docker Engine for your operating system. There are excellent guides available for Linux. We highly recommend
using a stable distribution such as [:simple-debian: Debian](https://www.debian.org/download). The following guide
only considers Debian.
## Deployment
We maintain a rapid prototype deployment option through Docker Compose (v2.17.0 and newer). This deployment creates the
core infrastructure and a single Docker container for all user-generated databases.

    curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/install.sh | sudo bash

View the logs:

    docker compose logs -f

You should now be able to view the front end at [http://localhost:80](http://localhost:80).
Please be warned that the default configuration is not intended for public deployments. It is only intended to provide a
running system within minutes so that you can explore the features. We strongly advise changing
the default `.env` environment variables.
!!! warning "Known security issues with the default configuration"

    The system is auto-configured for a small, local test deployment and is *not* secure! You need to make modifications
    in various places to make it secure:

    * **Authentication Service**:

        a. You need to use your own instance or configure a secure instance using a (self-signed) certificate.
           Additionally, when serving from a non-default Authentication Service, you need to put it into the
           `JWT_ISSUER` environment variable (`.env`).

        b. You need to change the password of the default admin user `fda` in Realm
           master > Users > fda > Credentials > Reset password.

        c. You need to change the client secrets for the clients `dbrepo-client` and `broker-client`. Do this in Realm
           dbrepo > Clients > dbrepo-client > Credentials > Client secret > Regenerate. Do the same for the
           `broker-client`.

        d. You need to regenerate the public key of the `RS256` algorithm, which is shared with all services to verify
           the signature of JWT tokens. Add your securely generated private key in Realm
           dbrepo > Realm settings > Keys > Providers > Add provider > rsa.

    * **Broker Service**: by default, this service is configured with an administrative user that has major privileges.
      You need to change the password of the user *fda* in Admin > Update this user > Password. We found this
      [simple guide](https://onlinehelp.coveo.com/en/ces/7.0/administrator/changing_the_rabbitmq_administrator_password.htm)
      to be very useful.
    * **Search Database**: by default, this service is configured to require authentication with an administrative user
      that is allowed to write into the indices. You need to change the password of this user. Following
      this [simple guide](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/reset-password.html), this can be
      achieved using the command line.
    * **Gateway Service**: by default, the services behind the gateway are not protected by HTTPS. You need to provide a
      trusted SSL/TLS certificate in the configuration file or use your own proxy in front of the Gateway Service. See this
      [simple guide](http://nginx.org/en/docs/http/configuring_https_servers.html) on how to install an SSL/TLS
      certificate on NGINX.

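
A quick way to notice forgotten defaults is to scan the `.env` file for the values documented in the Environment Values table. The following is a minimal sketch under assumptions: `check_env_defaults` is a hypothetical helper that is not part of DBRepo, and it assumes that the file uses plain, unquoted `KEY=value` lines. It only covers two of the variables and does not replace the manual steps above.

```shell
# Report default values that are still present in a DBRepo .env file.
# Returns 0 if none of the checked defaults is found, 1 otherwise.
# (check_env_defaults is a hypothetical helper, not shipped with DBRepo;
# it assumes unquoted KEY=value lines)
check_env_defaults() {
  env_file="$1"
  status=0
  # Default client secret as documented in the Environment Values table
  if grep -q '^DBREPO_CLIENT_SECRET=MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG' "$env_file"; then
    echo "DBREPO_CLIENT_SECRET still has the default value" >&2
    status=1
  fi
  # Public deployments need an issuer with their own hostname
  if grep -q '^JWT_ISSUER=http://localhost/' "$env_file"; then
    echo "JWT_ISSUER still points to localhost" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_env_defaults .env` prints one line per remaining default to standard error and returns a non-zero exit code if any was found.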
## Upgrade Guide
### 1.2 to 1.3
If you have a previous deployment of version 1.2, shut down the containers and back them up manually. You can do
this with the `busybox` image. Replace `dbrepo-userdb-xyz` with your container name or hash:
```console
export NAME=dbrepo-userdb-xyz
docker run --rm --volumes-from $NAME -v /home/$USER/backup:/backup busybox tar pcvfz /backup/$NAME.tar.gz /var/lib/mysql
```
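
Before deleting anything, it is worth checking that the archive is readable and actually contains the database files. The following is a minimal sketch: `verify_backup` is a hypothetical helper that is not part of DBRepo. It relies on the fact that `tar` strips the leading `/` when archiving, so the entries start with `var/lib/mysql`.

```shell
# Succeed only if the archive can be listed and contains MariaDB data files.
# tar removes the leading "/" from member names, hence the relative path.
# (verify_backup is a hypothetical helper, not shipped with DBRepo)
verify_backup() {
  archive="$1"
  tar tzf "$archive" | grep 'var/lib/mysql' > /dev/null
}
```

For the backup created above, this would be called as `verify_backup /home/$USER/backup/$NAME.tar.gz && echo "backup looks usable"`.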
!!! danger "Wipe all traces of DBRepo from your system"

    To erase all traces of DBRepo from your computer or virtual machine, execute the following **dangerous** commands.
    They delete all containers, volumes and networks whose names start with `dbrepo-` and **wipe** all information
    about DBRepo from your system (excluding the images).

    ```console
    docker container stop $(docker container ls -aq -f name=^/dbrepo-.*) || true
    docker container rm $(docker container ls -aq -f name=^/dbrepo-.*) || true
    docker volume rm $(docker volume ls -q -f name=^dbrepo-.*) || true
    docker network rm $(docker network ls -q -f name=^dbrepo-.*) || true
    ```

You can restore the volume *after* downloading the new 1.3 images and creating the infrastructure:
```console
export NAME=dbrepo-userdb-xyz
export PORT=12345
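# Create the new database container with an additional /backup volume
docker container create -h $NAME --name $NAME -p $PORT:3306 -e MARIADB_ROOT_PASSWORD=mariadb --network userdb -v /backup mariadb:10.5
# Mount the archive from the backup step, extract it and copy the files into the data directory.
# Double quotes let the host shell expand $NAME before the command runs inside the container.
docker run --rm --volumes-from $NAME -v /home/$USER/backup/$NAME.tar.gz:/backup/$NAME.tar.gz busybox sh -c "cd /backup && tar xvfz /backup/$NAME.tar.gz && cp -r /backup/var/lib/mysql/* /var/lib/mysql"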
```
Future releases will be backwards compatible and will come with migration scripts.
---
author: Martin Weise
---
## TL;DR
To install DBRepo in your existing cluster, download the sample [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/raw/master/charts/dbrepo-core/values.yaml?inline=false)
for your deployment and update the variables, especially `hostname`. The chart depends on
an installed [Keycloak Operator](https://www.keycloak.org/operator/installation), which can be installed by following the
official guide.
```shell
helm upgrade --install dbrepo \
-n dbrepo \
"oci://dbrepo.azurecr.io/helm/dbrepo-core" \
--values ./values.yaml \
--version "0.1.3" \
--create-namespace \
--cleanup-on-fail
```
## Dependencies
The Helm chart depends on four components:
1. [Ingress NGINX Controller](https://kubernetes.github.io/ingress-nginx/) for basic ingress.
2. [Cert-Manager Controller](https://cert-manager.io/) for TLS certificate management with Let's Encrypt.
3. [MariaDB Operator](https://github.com/mariadb-operator/mariadb-operator/) for creation of databases.
4. [Keycloak Operator](https://www.keycloak.org/operator/installation) for creation of the authentication service.
## Configuration before the installation
Define an admin user that the services can use to communicate with
the [authentication service](../system-services-authentication). You will need to manually create this user later after
the installation.
## Configuration after the installation
After installing, get the initial administrator password created by the [Keycloak operator](https://www.keycloak.org/operator/basic-deployment):
```shell
kubectl -n dbrepo \
get \
secret \
auth-service-initial-admin \
-o jsonpath='{.data.password}' | base64 --decode
```
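
Kubernetes stores the values of a *Secret* base64-encoded, which is why the command above pipes the result through `base64 --decode`. The decoding step in isolation can be sketched as follows. `decode_secret_value` is a hypothetical helper name used only for illustration.

```shell
# Decode a single base64-encoded Secret value, as returned by
# kubectl ... -o jsonpath='{.data.password}'
# (decode_secret_value is a hypothetical helper, not shipped with DBRepo)
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}
```

For example, `decode_secret_value 'YWRtaW4='` prints `admin`.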
On success, the output looks like `1f5581a01d8e8f47f2dae08cc88f56fd`, which is the initial password for the
user `admin`. Consider this password *temporary* and change it immediately! Log in to
the [authentication service](../system-services-authentication) as `admin` and:
1. Create a new user in the `master` realm.
2. Create credentials (non-temporary) for this user in the `master` realm.
3. Assign this user the role `admin`.
4. Delete the user `admin`.
### Backup
tbd
### Restore
tbd
## Limitations
1. MariaDB Galera does not (yet) support XA-transactions required by the authentication service (=Keycloak). Therefore
only a single MariaDB pod can be deployed at once for the [auth database](../system-databases-auth).
!!! question "Do you miss functionality? Do these limitations affect you?"

    We welcome contributors to open-source software and strongly encourage you to help us implement it. Get
    in [contact](../contact) with us: we happily answer requests for collaboration with attached CV and your programming
    experience!

---
author: Martin Weise
---
# Special Instructions for Azure Cloud
You can use our pre-built Helm chart for deploying DBRepo in your Kubernetes Cluster
with Microsoft Azure as infrastructure provider.
## Requirements
### Hardware
For this small cloud test deployment, any public cloud provider suffices. We recommend a
small [Kubernetes Service](https://azure.microsoft.com/en-us/products/kubernetes-service)
with Kubernetes version *1.24.10* and node size *Standard_B4ms*:
- 4 vCPU cores
- 16 GB RAM
- 200 GB SSD storage
This is roughly met by selecting the *Standard_B4ms* flavor and three worker nodes.
## Deployment
### Databases
Since Azure offers a managed [Azure Database for MariaDB](https://azure.microsoft.com/en-us/products/mariadb), we
recommend deploying at least the Metadata Database as a highly available, managed database.
!!! warning "End of Life software"

    Unfortunately, Azure does not (yet) support managed MariaDB 10.5. The latest version supported by Azure is 10.3,
    which is End of Life (EOL) from [May 2023 onwards](https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-3/).
    Microsoft decided to still maintain MariaDB 10.3
    until [September 2025](https://learn.microsoft.com/en-us/azure/mariadb/concepts-supported-versions).

### Fileshare
For the shared volume *PersistentVolumeClaim* `dbrepo-shared-volume-claim`, select an appropriate *StorageClass* that
supports:
1. Access mode `ReadWriteMany`
2. Hardlinks (TUSd creates lockfiles during upload)
You will need to use a *StorageClass* of either `managed-*` or `azureblob-*` (after enabling the
proprietary [CSI driver for BLOB storage](https://learn.microsoft.com/en-us/azure/aks/azure-blob-csi?tabs=NFS#azure-blob-storage-csi-driver-features)
in your Kubernetes Cluster).
We recommend creating
a [Container](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers) for the
[Upload Service](../system-services-upload) to deposit files and mounting the BLOB storage
via CSI drivers into the *Deployment*. This greatly increases the available interfaces (see below) for file uploads and
provides a highly available filesystem for the many deployments that need to use the files.
---
author: Martin Weise
---
# Special Instructions for Minikube
You can use our Helm chart to deploy DBRepo in your Kubernetes cluster
using [minikube](https://minikube.sigs.k8s.io/docs/start/) as infrastructure provider. minikube deploys a single-node
Kubernetes cluster on your machine, which is suitable for test deployments.
## Requirements
### Virtual Machine
For this small, local test deployment, any modern hardware suffices. We recommend a dedicated virtual machine with
the following settings. Note that most of the vCPU and RAM resources are needed for starting the infrastructure
with Docker. During idle times, the deployment uses significantly fewer resources.
- 4 vCPU cores
- 16 GB RAM
- 200 GB SSD storage
### Minikube
First, install the minikube virtualization tool that provides a single-node Kubernetes environment, e.g. on a virtual
machine. We do not regularly check these instructions; they are provided on a best-effort basis. Check
the [official documentation](https://minikube.sigs.k8s.io/docs/start/) for up-to-date information.
For Debian:
```shell
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
```
Start the cluster and enable basic plugins:
```shell
minikube start --driver='docker'
minikube kubectl -- get po -A
minikube addons enable ingress
```
### NGINX
Deploy an NGINX reverse proxy on the virtual machine to reach your minikube cluster from the public Internet:
```nginx title="/etc/nginx/conf.d/dbrepo.conf"
resolver 127.0.0.11 valid=30s;

server {
    listen 80;
    server_name _;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://CLUSTER_IP;
    }
}

server {
    listen 443 ssl;
    server_name DOMAIN_NAME;
    ssl_certificate /etc/nginx/certificate.crt;
    ssl_certificate_key /etc/nginx/certificate.key;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass https://CLUSTER_IP;
    }
}
```
Replace `CLUSTER_IP` with the result of:

    $ minikube ip
    192.168.49.2

Replace `DOMAIN_NAME` with your domain name. You will also need a valid TLS certificate with its private key for TLS to
be enabled in the cluster. In our test deployment we obtained a certificate from Let's Encrypt.
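
The two placeholders can also be filled in with `sed` instead of editing the file by hand. The following is a minimal sketch: `render_dbrepo_conf` is a hypothetical helper that is not part of DBRepo. It reads the configuration from standard input and writes the result to standard output.

```shell
# Replace the CLUSTER_IP and DOMAIN_NAME placeholders in an NGINX
# configuration read from stdin.
#   $1: cluster IP, e.g. the output of "minikube ip"
#   $2: public domain name of the deployment
# (render_dbrepo_conf is a hypothetical helper, not shipped with DBRepo)
render_dbrepo_conf() {
  cluster_ip="$1"
  domain_name="$2"
  sed -e "s|CLUSTER_IP|${cluster_ip}|g" -e "s|DOMAIN_NAME|${domain_name}|g"
}
```

Assuming that the configuration above was saved as a template file named `dbrepo.conf.template` (a made-up name), the call would be `render_dbrepo_conf "$(minikube ip)" subdomain.example.com < dbrepo.conf.template > dbrepo.conf`.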
### Fileshare
Since the Upload Service uses a shared filesystem with the [Analyse Service](../system-services-analyse),
[Metadata Service](../system-services-metadata) and
[Data Database](../system-databases-data), the *PersistentVolume*
for the *PersistentVolumeClaim*
of [`pvc.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/blob/master/charts/dbrepo-core/templates/upload-service/pvc.yaml)
needs to be provisioned statically instead of dynamically. You can make use of the host's filesystem and mount it in
each of those deployments. For example, mount the *hostPath* directly in
the [`deployment.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/blob/master/charts/dbrepo-core/templates/analyse-service/deployment.yaml).
```yaml title="deployment.yaml"
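apiVersion: apps/v1
kind: Deployment
metadata:
  name: analyse-service
  ...
spec:
  template:
    spec:
      containers:
        - name: analyse-service
          volumeMounts:
            # mount the volume declared under "volumes" into the container
            - name: shared
              mountPath: /mnt/shared
          ...
      volumes:
        # hostPath belongs to the volume definition, not to the volume mount
        - name: shared
          hostPath:
            path: /path/of/host
      ...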
```
## Deployment
To install the DBRepo Helm Chart, download and edit
the [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/raw/master/charts/dbrepo-minikube/values.yaml?inline=false)
file. At minimum you need to change the values for:
* `hostname`, set to your domain, e.g. `subdomain.example.com`
* `authAdminApiUrl`, set to the API of the Keycloak server on the same domain using https, e.g. `https://subdomain.example.com/api/auth`
It is advised to also change the usernames and passwords for all credentials.

!!! info "Documentation of values.yaml"

    We documented all values of the `values.yaml` file [here](../deployment-helm/#chart-values) with
    default values and a description for each value.

Next, install the chart using your edited `values.yaml` file:
```shell
helm upgrade --install dbrepo \
-n dbrepo \
"oci://dbrepo.azurecr.io/helm/dbrepo-core" \
--values ./values.yaml \
--version "0.1.3" \
--create-namespace \
--cleanup-on-fail
```
-----BEGIN PUBLIC KEY-----
role: mweise
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEiF4l7rlcaope9LGiodp6yHRtsUek
WjYX8mVi3AAcuoXvKtnbRZTwX78FOID2zZiQSsHWIcuMDOKJfubNzWrtMw==
-----END PUBLIC KEY-----