Commit d0ba614e authored by Martin Weise

Updated documentation

parent 6e1cc220
3 merge requests: !231 CI: Remove build for log-service, !228 Better error message handling in the frontend, !223 Release of version 1.4.0
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="426px" height="168px" viewBox="-0.5 -0.5 426 168" style="background-color: rgb(255, 255, 255);"><defs/><g><rect x="0" y="37" width="248" height="130" rx="3.9" ry="3.9" fill="rgb(255, 255, 255)" stroke="rgb(0, 0, 0)" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe flex-end; justify-content: unsafe center; width: 246px; height: 1px; padding-top: 164px; margin-left: 1px;"><div data-drawio-colors="color: rgb(0, 0, 0); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 12px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; font-style: italic; white-space: normal; overflow-wrap: normal;">shared filesystem<br />/tmp</div></div></div></foreignObject><text x="124" y="164" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="12px" text-anchor="middle" font-style="italic">shared filesystem...</text></switch></g><path d="M 47.5 47.63 L 47.49 30.49 L 47.71 7" fill="none" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="stroke"/><path d="M 47.5 52.88 L 44 45.88 L 47.5 47.63 L 51 45.88 Z" fill="rgb(0, 0, 0)" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 1px; height: 1px; padding-top: 22px; margin-left: 48px;"><div 
data-drawio-colors="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 11px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; background-color: rgb(255, 255, 255); white-space: nowrap;">jdbc</div></div></div></foreignObject><text x="48" y="25" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="11px" text-anchor="middle">jdbc</text></switch></g><path d="M 22.5 62.6 C 22.5 57.85 33.69 54 47.5 54 C 54.13 54 60.49 54.91 65.18 56.52 C 69.87 58.13 72.5 60.32 72.5 62.6 L 72.5 109.4 C 72.5 114.15 61.31 118 47.5 118 C 33.69 118 22.5 114.15 22.5 109.4 Z" fill="#dae8fc" stroke="#000000" stroke-miterlimit="10" pointer-events="all"/><path d="M 72.5 62.6 C 72.5 67.35 61.31 71.2 47.5 71.2 C 33.69 71.2 22.5 67.35 22.5 62.6" fill="none" stroke="#000000" stroke-miterlimit="10" pointer-events="all"/><rect x="6.5" y="116" width="85" height="20" fill="none" stroke="none" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 89px; height: 1px; padding-top: 126px; margin-left: 5px;"><div data-drawio-colors="color: rgb(0, 0, 0); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 12px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">data-db</div></div></div></foreignObject><text x="49" y="130" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="12px" text-anchor="middle">data-db</text></switch></g><path d="M 160 59.63 L 160 36.49 L 160.1 7" fill="none" stroke="rgb(0, 0, 
0)" stroke-miterlimit="10" pointer-events="stroke"/><path d="M 160 64.88 L 156.5 57.88 L 160 59.63 L 163.5 57.88 Z" fill="rgb(0, 0, 0)" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 1px; height: 1px; padding-top: 23px; margin-left: 160px;"><div data-drawio-colors="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 11px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; background-color: rgb(255, 255, 255); white-space: nowrap;">http</div></div></div></foreignObject><text x="160" y="26" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="11px" text-anchor="middle">http</text></switch></g><path d="M 231.37 86 L 288.63 86" fill="none" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="stroke"/><path d="M 226.12 86 L 233.12 82.5 L 231.37 86 L 233.12 89.5 Z" fill="rgb(0, 0, 0)" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="all"/><path d="M 293.88 86 L 286.88 89.5 L 288.63 86 L 286.88 82.5 Z" fill="rgb(0, 0, 0)" stroke="rgb(0, 0, 0)" stroke-miterlimit="10" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 1px; height: 1px; padding-top: 86px; margin-left: 260px;"><div data-drawio-colors="color: rgb(0, 0, 
0); background-color: rgb(255, 255, 255); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 11px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; background-color: rgb(255, 255, 255); white-space: nowrap;">S3</div></div></div></foreignObject><text x="260" y="89" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="11px" text-anchor="middle">S3</text></switch></g><rect x="95" y="66" width="130" height="40" rx="6" ry="6" fill="rgb(255, 255, 255)" stroke="rgb(0, 0, 0)" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 128px; height: 1px; padding-top: 86px; margin-left: 96px;"><div data-drawio-colors="color: rgb(0, 0, 0); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 12px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">Data DB Sidecar</div></div></div></foreignObject><text x="160" y="90" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="12px" text-anchor="middle">Data DB Sidecar</text></switch></g><rect x="295" y="66" width="130" height="40" rx="6" ry="6" fill="rgb(255, 255, 255)" stroke="rgb(0, 0, 0)" pointer-events="all"/><g transform="translate(-0.5 -0.5)"><switch><foreignObject pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility" style="overflow: visible; text-align: left;"><div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 128px; height: 1px; 
padding-top: 86px; margin-left: 296px;"><div data-drawio-colors="color: rgb(0, 0, 0); " style="box-sizing: border-box; font-size: 0px; text-align: center;"><div style="display: inline-block; font-size: 12px; font-family: Helvetica; color: rgb(0, 0, 0); line-height: 1.2; pointer-events: all; white-space: normal; overflow-wrap: normal;">Storage Service<br />(minIO)</div></div></div></foreignObject><text x="360" y="90" fill="rgb(0, 0, 0)" font-family="Helvetica" font-size="12px" text-anchor="middle">Storage Service...</text></switch></g></g><switch><g requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"/><a transform="translate(0,-5)" xlink:href="https://www.diagrams.net/doc/faq/svg-export-text-problems" target="_blank"><text text-anchor="middle" font-size="10px" x="50%" y="100%">Text is not SVG - cannot display</text></a></switch></svg>
\ No newline at end of file
.docs/images/architecture-docker-compose.png

65.3 KiB

.docs/images/minio-download.png

203 KiB

.docs/images/minio-upload.png

188 KiB

@@ -26,6 +26,18 @@ curl \
-d '{"name": "Data Database 2", "imageId": 1, "host": "example.com", "port": 3306, "privilegedUsername": "root", "privilegedPassword": "s3cr3t" }'
```
### Sidecar
We deploy a sidecar that handles the CSV-file upload/download operations between
the [Storage Service](../system-services-storage) and the Data Database using a Python Flask application and
the [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) client until MariaDB supports S3
natively.
<figure markdown>
![Sidecar architecture detailed](images/architecture-data-db.svg)
<figcaption>Sidecar that handles the CSV-file upload/download.</figcaption>
</figure>
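The data flow in the figure can be sketched roughly as follows. This is not the sidecar's actual code: the helper names `shared_path`, `import_from_storage` and `export_to_storage` are hypothetical, the Flask routes are omitted, and the direction of each bucket is assumed from the bucket descriptions of the Storage Service. Only `download_file` and `upload_file` are real `boto3` S3 client calls.

```python
from pathlib import PurePosixPath


def shared_path(key: str, shared_dir: str = "/tmp") -> str:
    """Map an object key to a path on the shared filesystem (assumed to be /tmp)."""
    # Only the file name is kept so that a key cannot point outside the shared directory.
    return str(PurePosixPath(shared_dir) / PurePosixPath(key).name)


def import_from_storage(client, key: str, bucket: str = "dbrepo-upload") -> str:
    """Copy a CSV-file from the Storage Service to the shared filesystem."""
    target = shared_path(key)
    client.download_file(Bucket=bucket, Key=key, Filename=target)
    return target


def export_to_storage(client, path: str, bucket: str = "dbrepo-download") -> str:
    """Copy a CSV-file written by the Data Database to the Storage Service."""
    key = PurePosixPath(path).name
    client.upload_file(Filename=path, Bucket=bucket, Key=key)
    return key
```

Here `client` is expected to be a `boto3` S3 client pointed at the Storage Service. Passing it in as a parameter keeps the helpers easy to exercise with a stub.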
### Backup
Export all databases with `--skip-lock-tables` option for MariaDB Galera clusters as it is not supported currently by
......
@@ -2,7 +2,7 @@
author: Martin Weise
---
# User Interface
## tl;dr
@@ -28,6 +28,39 @@ It provides a *user interface* (UI) for a researcher to interact with the databa
<figcaption>Architecture of the UI microservice</figcaption>
</figure>
### Example
Upload a file to the `dbrepo-upload` bucket in the [Storage Service](../system-services-storage/) using the Node.js
middleware. The request must be sent with the `Content-Type: multipart/form-data` header and the file must be placed
in the `file` field of the form. For example:
```shell
curl -X POST \
-F "file=@path/to/file/gps.csv" \
http://<hostname>/server-middleware/upload
```
The response looks like this:
```json
{
"fieldname": "file",
"originalname": "gps.csv",
"encoding": "7bit",
"mimetype": "text/csv",
"buffer": {
"type": "Buffer",
"data": [
34,
73,
...
]
},
"size": 130279,
"etag": "9d23e73f4ed9f7e5afc80e696db69ebb"
}
```
## Limitations
(none)
......
@@ -17,14 +17,55 @@ author: Martin Weise
## Overview
It suggests data types for the [User Interface](../system-other-ui) when creating a table from a
*comma separated values* (CSV) file. It recommends enumerations for columns and returns e.g. a list of potential
primary key candidates. The researcher is able to confirm these suggestions manually. Moreover, the Analyse Service
determines basic statistical properties of numerical columns.
### Analysis
After [uploading](../system-services-storage/#buckets) the CSV-file into the `dbrepo-upload` bucket of
the [Storage Service](../system-services-storage), analysis for data types and primary keys follows the flow:
1. Retrieve the CSV-file from the `dbrepo-upload` bucket of the Storage Service as a data stream (i.e. nothing is stored in
the service) with the [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) client.
2. When no separator is known, the Analyse Service tries to guess the separator from the first line
with [`csv.Sniffer().sniff(...)`](https://docs.python.org/3/library/csv.html#csv.Sniffer). This step is skipped when
the separator is provided in the HTTP payload: `{"separator": ";", ...}`.
3. With the separator known (either from step 2 or from the HTTP payload),
[`messytables.CSVTableSet(...)`](https://messytables.readthedocs.io/en/latest/#csv-support) guesses the headers
and column types. It also guesses enums if the HTTP payload contains `{"enum": true, ...}`.
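Step 2 can be illustrated with the standard library alone. The following is a minimal sketch, not the Analyse Service's actual code: the function names and the list of candidate separators are assumptions made for this example.

```python
import csv
import io


def guess_separator(first_line: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV-file from its first line."""
    # Restricting the candidates keeps letters or digits from being picked as delimiter.
    dialect = csv.Sniffer().sniff(first_line, delimiters=candidates)
    return dialect.delimiter


def resolve_separator(payload: dict, stream: io.TextIOBase) -> str:
    """Prefer the separator from the HTTP payload, otherwise sniff the first line."""
    separator = payload.get("separator")
    if separator:
        return separator
    return guess_separator(stream.readline())
```

`csv.Sniffer().sniff(...)` raises `csv.Error` when it cannot determine a delimiter, so a caller would typically catch that error and ask for an explicit separator.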
### Examples
Given a [CSV-file](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-datasets/-/raw/master/gps.csv)
`gps.csv` containing GPS data that was already uploaded to the `dbrepo-upload` bucket of the Storage Service with key `gps.csv`:
```shell
curl -X POST \
-d '{"filename":"gps.csv","separator":","}' \
http://<hostname>:5000/api/analyse/determinedt
```
This results in the response:
```json
{
"columns": {
"ID": "bigint",
"KEY": "varchar",
"OBJECTID": "bigint",
"LBEZEICHNUNG": "varchar",
"LTYP": "bigint",
"LTYPTXT": "varchar",
"LAT": "decimal",
"LNG": "decimal"
},
"separator": ","
}
```
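The same request can be issued from Python. This is a hedged sketch using only the standard library: the helper names are hypothetical, and the `Content-Type: application/json` header is an assumption about what the endpoint expects, since the documentation does not state it.

```python
import json
import urllib.request
from typing import Optional


def build_analyse_request(hostname: str, filename: str, separator: Optional[str] = None,
                          enum: bool = False) -> urllib.request.Request:
    """Build the POST request for the datatype analysis without sending it."""
    payload = {"filename": filename}
    if separator is not None:
        payload["separator"] = separator
    if enum:
        payload["enum"] = True
    return urllib.request.Request(
        url=f"http://{hostname}:5000/api/analyse/determinedt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


def determine_datatypes(hostname: str, filename: str, separator: Optional[str] = None) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_analyse_request(hostname, filename, separator)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `determine_datatypes("<hostname>", "gps.csv", ",")` against a running deployment should return a dictionary with the `columns` and `separator` keys shown in the response above.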
## Limitations
!!! question "Do you miss functionality? Do these limitations affect you?"
@@ -34,4 +75,4 @@ numerical columns.
## Security
1. Credentials for the [Storage Service](../system-services-storage) are stored in plaintext environment variables.
---
author: Martin Weise
---
# Storage Service
## tl;dr
!!! debug "Debug Information"
Image: [`bitnami/minio:2023-debian-11`](https://hub.docker.com/r/bitnami/minio)
* Ports: 9000/tcp, 9001/tcp
* Console: `http://<hostname>/admin/storage`
## Overview
We use [minIO](https://min.io) as a high-performance, S3-compatible object store, packaged by Bitnami (VMware) for easy
cloud-ready deployments that support replication and monitoring by default.
### Users
The default configuration creates one user `minioadmin` with password `minioadmin`.
### Buckets
The default configuration creates two buckets, `dbrepo-upload` and `dbrepo-download`:
* `dbrepo-upload` for CSV-file upload (for import of data, analysis, etc.) from the User Interface
* `dbrepo-download` for CSV-file download (exporting data, metadata, etc.)
### Metrics Collection
By default, Prometheus metrics are not enabled as they require a running Prometheus server in the background. You can
enable the metrics endpoint by setting the corresponding environment variables in the `docker-compose.yml` (deployment with
[Docker Compose](../deployment-docker-compose)) or `values.yml` (deployment with [Helm](../deployment-helm/)) as described
in the [minIO documentation](https://min.io/docs/minio/linux/operations/monitoring/collect-minio-metrics-using-prometheus.html).
### Examples
Upload a CSV-file into the `dbrepo-upload` bucket with the console
via `http://<hostname>/admin/storage/browser/dbrepo-upload`.
<figure markdown>
![Data ingest](images/minio-upload.png){ .img-border }
<figcaption>Uploading a file with the minIO console storage browser.</figcaption>
</figure>
Alternatively, you can use the middleware of the [User Interface](../system-other-ui/) to upload files.
Download a CSV-file from the `dbrepo-download` bucket with the console
via `http://<hostname>/admin/storage/browser/dbrepo-download`.
<figure markdown>
![Data ingest](images/minio-download.png){ .img-border }
<figcaption>Downloading a file with the minIO console storage browser.</figcaption>
</figure>
Alternatively, you can use an S3-compatible client:
* [minIO Client](https://min.io/docs/minio/linux/reference/minio-mc.html) (most generic implementation of S3)
* [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) (generic Python implementation of S3)
* AWS SDK (tailored towards Amazon S3)
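As an illustration of the `boto3` option, the sketch below uploads to and downloads from the two default buckets. The helper names are hypothetical, and the endpoint URL and credentials passed to `make_s3_client` are placeholders to be taken from your deployment. `boto3.client`, `upload_file` and `download_file` are existing `boto3` calls.

```python
def make_s3_client(endpoint_url: str, access_key: str, secret_key: str):
    """Create an S3 client that talks to the Storage Service instead of Amazon S3."""
    import boto3  # third-party dependency, imported lazily

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def upload_csv(client, path: str, key: str, bucket: str = "dbrepo-upload") -> None:
    """Upload a local CSV-file into the upload bucket under the given key."""
    client.upload_file(Filename=path, Bucket=bucket, Key=key)


def download_csv(client, key: str, path: str, bucket: str = "dbrepo-download") -> None:
    """Download an exported CSV-file from the download bucket to a local path."""
    client.download_file(Bucket=bucket, Key=key, Filename=path)
```

With the default configuration this could be used as `client = make_s3_client("http://<hostname>:9000", "minioadmin", "minioadmin")` followed by `upload_csv(client, "gps.csv", "gps.csv")`. It assumes that port 9000 is the S3 API port and is reachable from the client.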
## Limitations
* Prometheus metrics are not enabled by default (they require a running Prometheus server).
!!! question "Do you miss functionality? Do these limitations affect you?"
We strongly encourage you to help us implement it, as we welcome contributors to open-source software. Get
in [contact](../contact) with us; we happily answer requests for collaboration with attached CV and your programming
experience!
## Security
1. For public deployments, change the default credentials.
---
author: Martin Weise
---
# Upload Service
## tl;dr
!!! debug "Debug Information"
Image: [`dbrepo/upload-service:latest`](https://hub.docker.com/r/dbrepo/upload-service)
* Ports: 1080/tcp
* TUS: `http://<hostname>:1080/api/upload/files`
* Prometheus: `http://<hostname>:1080/metrics`
* Swagger UI: <a href="../swagger/upload" target="_blank">:fontawesome-solid-square-up-right: view online</a>
## Overview
Upload files using one of the official TUS clients:
* [NodeJS / JavaScript](https://github.com/tus/tus-js-client)
* [Java](https://github.com/tus/tus-java-client)
* [Python](https://github.com/tus/tus-py-client)
The [TUS](https://tus.io/) protocol allows for flexible file uploads that, when interrupted, can be resumed at a later
point. It is built on HTTP: a new upload is created with a `POST` request, the file content is sent with one or
more `PATCH` requests, and a `HEAD` request returns the current offset when an interrupted upload is resumed.
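The request sequence can be made concrete with a small planner that lists the methods and headers a client would send. This is a sketch under assumptions: `plan_upload` is a hypothetical helper that sends nothing over the network, and the header names follow the TUS 1.0.0 core protocol and its creation extension. In a real client the resume offset comes from the `Upload-Offset` header of the `HEAD` response; here it is passed in as a parameter.

```python
from typing import Dict, List, Tuple

TUS_VERSION = "1.0.0"
PlannedRequest = Tuple[str, Dict[str, str]]


def plan_upload(total_size: int, chunk_size: int, offset: int = 0) -> List[PlannedRequest]:
    """List the (method, headers) pairs needed to upload bytes offset..total_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    base = {"Tus-Resumable": TUS_VERSION}
    planned: List[PlannedRequest] = []
    if offset == 0:
        # New upload: POST announces the total size and creates the upload resource.
        planned.append(("POST", {**base, "Upload-Length": str(total_size)}))
    else:
        # Resumed upload: HEAD asks the server how many bytes it already has.
        planned.append(("HEAD", dict(base)))
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        # Each PATCH appends one chunk starting at the given offset.
        planned.append(("PATCH", {
            **base,
            "Upload-Offset": str(offset),
            "Content-Type": "application/offset+octet-stream",
            "Content-Length": str(length),
        }))
        offset += length
    return planned
```

For a 130279-byte file and 64 KiB chunks, the plan is one `POST` followed by two `PATCH` requests. Resuming after the first chunk yields one `HEAD` followed by a single `PATCH`.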
For more information, see the [official Docker image](https://hub.docker.com/r/tusproject/tusd).
## Limitations
* No support for authentication
!!! question "Do you miss functionality? Do these limitations affect you?"
We strongly encourage you to help us implement it, as we welcome contributors to open-source software. Get
in [contact](../contact) with us; we happily answer requests for collaboration with attached CV and your programming
experience!
## Security
1. Since authentication is not supported, use IP-based ingress rules to limit access to the upload endpoint.
@@ -22,7 +22,7 @@ nav:
- Data Service: system-services-data.md
- Metadata Service: system-services-metadata.md
- Mirror Service: system-services-mirror.md
- Storage Service: system-services-storage.md
- Databases:
- Auth Database: system-databases-auth.md
- Data Database: system-databases-data.md
......