Unverified commit 63a2a734, authored by Martin Weise

Remove the idea files, added documentation to the website, added docker hub descriptions

parent bbd835e5
Showing changes with 229 additions and 12421 deletions
......@@ -12,10 +12,6 @@ hide:
In this short getting started guide we show the dependencies to run the database repository and perform a small,
local, test deployment for quickly trying out the features that the repository offers.
!!! danger "Production"
Do not use this small, local, test deployment with production data. It is not secure for production.
## Requirements
### Hardware
......@@ -29,69 +25,50 @@ For this small, local, test deployment any modern hardware would suffice, we rec
### Software
We currently only test RPM-based operating systems. Other systems should also work in theory, but we give no warranty
that compatibility issues will not arise in the future.
=== "Linux"
1. [Rocky Linux](https://rockylinux.org/) 8.4+
Install [Docker Desktop](https://docs.docker.com/desktop/install/linux-install/) for Linux
On the local machine, we need installed:
=== "Windows"
1. [Docker Engine](https://docs.docker.com/engine/install/centos/) 18.02.0+
2. [Docker Compose](https://docs.docker.com/compose/install/)
Install [Docker Desktop](https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe) for Windows
And the following minimal software packages to operate the repository:
=== "macOS"
```console
dnf install make
```
Install [Docker Desktop](https://desktop.docker.com/mac/main/amd64/Docker.dmg) for macOS
## Deployment
Next, clone the source code repository into your working directory:
Download the latest [`docker-compose.yml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/master/docker-compose.yml)
file and deploy the system using your command line.
```console
git clone https://github.com/fair-data-austria/dbrepo.git
```
Start building the metadata database container and the remaining containers (in fast, parallel mode utilizing all cores
of your local machine):
=== "Linux"
```console
docker-compose build fda-metadata-database
docker-compose build --parallel
docker-compose up
```
The system is auto-configured for a small, local, test deployment. You only need to start all containers by executing:
=== "Windows"
```console
docker-compose up
```
!!! bug "Some environments need additional configuration"
In some cluster environments, it is necessary to set Docker's MTU to the main interface MTU. Find out by executing
=== "macOS"
```console
nmcli -f GENERAL device show eth0 | grep "MTU"
docker-compose up
```
With the wrong MTU set, the containers cannot download the Maven dependencies and get stuck. This can quickly be
solved by setting the correct MTU (e.g. 1450).
```json title="/etc/docker/daemon.json"
{
"mtu": 1450
}
```
The system is auto-configured for a small, local, test deployment. You only need to start all containers by executing:
## Development
We invite all open-source developers to help us fix bugs and introduce features to the source code. Get involved by
sending a mail to Prof. [Andreas Rauber](mailto:andreas.rauber@tuwien.ac.at)
and Proj.Ass. [Martin Weise](mailto:martin.weise@tuwien.ac.at). Clone the repository and create a feature branch
sending a mail to Prof. Andreas Rauber and Projektass. Martin Weise. Clone the repository and create a feature branch
from `dev` and implement your changes.
## Requirements
### Software
We develop all packages with the following software requirements:
......@@ -102,7 +79,7 @@ We develop all packages with the following software requirements:
4. [Postgres](https://www.postgresql.org/) 12+
5. [MariaDB](https://mariadb.org/) 10+
## Building
### Building
For local development you need to install the entities from the metadata database and the general DTOs that are
exchanged between the services by installing the package:
......
This diff is collapsed.
......@@ -7,24 +7,82 @@ hide:
# System
!!! abstract "Abstract"
Hello
## Architecture
The repository is designed as a microservice architecture to ensure scalability and the utilization of various
technologies. The conceptualized microservices operate the basic database operations, data versioning as well as
*findability*, *accessibility*, *interoperability* and *reusability* (FAIR).
## Database
This container runs a relational database engine that allows data versioning and contains the Query Store, a special
table that stores all queries issued to the Researcher Database along with metadata. We store the queries here and not
in the metadata database level to ensure that they are preserved along with the original database for a regular backup
and archival together with the original database once the container is retired.
### Container
Currently, we only support databases with
the [MariaDB engine](https://hub.docker.com/_/mariadb?tab=tags&page=1&name=10.5&ordering=-name).
DBRepo creates a *root* user for managing the tables, inserting data, etc. and provides a *mariadb* user that is only
granted `select` access to all tables. The default passwords need to be changed in
[`AbstractSeeder.java`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/master/fda-container-service/services/src/main/java/at/tuwien/seeder/impl/AbstractSeeder.java#L39-L51).
### Query Store
The Query Store is a special table (`qs_queries`) that stores all queries issued to the database via the HTTP API. It
stores meta-information about the queries directly in the database container:
<figure markdown>
![Microservice cloud architecture](/images/dia_architecture.png)
<figcaption>Microservice cloud architecture</figcaption>
| Name | Type | Constraint | Default | Comment |
|------------------|--------------|-------------|-------------------------|-------------------------------|
| id | bigint | primary key | nextval(qs_queries_seq) | |
| cid | bigint | | | Column ID |
| dbid | bigint | | | Database ID |
| created | datetime | | now() | |
| created_by | bigint | | | Creator User-ID |
| execution | datetime | | | |
| last_modified | datetime | | | |
| query | text | | | |
| query_normalized | text | | | removing *, randomness |
| query_hash | varchar(255) | | | sha256 hash of `query` field |
| result_hash | varchar(255) | | | sha256 hash of the result set |
| result_number | bigint | | | |
<figcaption>Query Store table <code>qs_queries</code> schema</figcaption>
</figure>
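The `query_hash` column is described above as a SHA-256 hash of the `query` field. A minimal Python sketch of how such a hash could be computed is shown here; the function name `query_hash` and the UTF-8 encoding are assumptions for illustration and are not taken from the DBRepo source.

```python
import hashlib


def query_hash(query: str) -> str:
    """Return the hex SHA-256 digest of a query string.

    Assumption: the query text is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Identical query text always yields the same 64-character digest,
    # which is what makes the hash usable for equality checks.
    print(query_hash("SELECT id, name FROM airquality"))
```

Because the digest depends on the exact text, two queries that differ only in whitespace hash differently; this is presumably why the table also keeps a `query_normalized` column.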
## Services
### Discovery Service
This microservice allows service discovery and registration of containers that provide services.
This microservice allows service discovery and registration of containers that provide services. It configures
a [Spring Cloud Netflix Eureka Server](https://cloud.spring.io/spring-cloud-netflix/reference/html/) to discover
services.
!!! debug "Debug Information"
* Port(s): 9090
* Swagger: not configured
### Gateway Service
Provides a single point of access to the *application programming interface* (API).
Provides a single point of access to the *application programming interface* (API) and configures
the [Spring Cloud Gateway](https://spring.io/projects/spring-cloud-gateway) to route traffic to the services.
!!! debug "Debug Information"
* Port(s): 9095
* Swagger: not configured
<figure markdown>
![Microservice cloud architecture](/images/interaction-gateway.svg)
<figcaption>Microservice cloud architecture</figcaption>
</figure>
### Authentication Service
......@@ -32,14 +90,52 @@ Very specific to the deployment of the organization. In our reference implementa
markup language* (SAML) service provider and use our institutional SAML identity provider for obtaining account data
through an encrypted channel.
The Authentication Service configures [Spring Boot Starter Security](https://spring.io/guides/gs/securing-web/)
with [Java JWT](https://github.com/auth0/java-jwt) for internal authentication once the user details are known in the
metadata database. By default, a token is valid for 24 hours and is used on all HTTP API endpoints.
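Since a token expires after 24 hours by default, a client may want to check its expiry before calling the API. The following Python sketch decodes the payload of a JWT without verifying the signature and compares the standard `exp` claim (RFC 7519) with the current time. That the tokens issued by the Authentication Service carry an `exp` claim is an assumption; the helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT (signature is NOT verified)."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim lies in the past.

    Assumption: `exp` is a Unix timestamp in seconds. A token without
    an `exp` claim is treated as not expired.
    """
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return False
    current = time.time() if now is None else now
    return current >= exp
```

This check is only a client-side convenience for deciding when to request a new token; the service itself remains responsible for validating tokens.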
For the **HTTP API**, obtaining a new token can be done via, e.g. cURL.
```console
$ curl -X POST -d '{"username":"username","password":"password"}' -H "Content-Type: application/json" https://dbrepo.ossdip.at/api/auth
```
Call a secured method by setting the JWT Token as [Bearer Token](https://www.rfc-editor.org/rfc/rfc6750.html) via,
e.g. cURL.
```console
$ curl -X PUT -H "Authorization: Bearer TOKEN" -H "Content-Type: application/json" https://dbrepo.ossdip.at/api/auth
```
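The two cURL calls above can be mirrored in Python with the standard library. This sketch only builds the requests and does not send them; the function names are hypothetical, and the format of the token response is not documented here, so it is deliberately not parsed.

```python
import json
import urllib.request

# Endpoint taken from the cURL examples above.
AUTH_URL = "https://dbrepo.ossdip.at/api/auth"


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request that asks for a new token."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def secured_request(token: str, method: str = "PUT") -> urllib.request.Request:
    """Build a request to a secured endpoint with the JWT as Bearer token."""
    return urllib.request.Request(
        AUTH_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )
```

Either request can be sent with `urllib.request.urlopen(request)` against a running deployment.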
For the **AMQP API**, the Authentication Service also creates a dedicated user at the [Broker Service](#broker-service)
that has permissions for writing and configuring the RabbitMQ queues that feed into the database owned by this user.
!!! debug "Debug Information"
* Port(s): 9097
* Swagger UI: [/swagger-ui/index.html](http://localhost:9097/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/authentication-service](http://localhost:9097/v3/api-docs/authentication-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9097/v3/api-docs.yaml)
### Metadata Database
is the core component of the project. It is a relational database that contains metadata about all researcher databases
It is the core component of the project. It is a relational database that contains metadata about all researcher databases
created in the database repository like column names, check expressions, value enumerations or key/value constraints and
relevant data for citing data sets. Additionally, the concept (e.g. the URI of the unit of measurement) of numerical
columns is stored in the Metadata Database in order to provide semantic knowledge context. We
use [PostgreSQL](https://www.postgresql.org/) for its rich capabilities in the reference implementation.
The default credentials are `postgres:postgres` for the database `fda`. Connect to the database via, e.g. *psql*.
```console
$ psql -d fda -h localhost -p 5432 -U postgres -W
```
!!! debug "Debug Information"
* Port(s): 5432
* Swagger: not configured
### Unit Service
It is designed to map terms in the domain of units of measurement to controlled vocabulary, modelled in
......@@ -47,6 +143,34 @@ the [ontology of units of measure](https://github.com/HajoRijgersberg/OM). This
units and provides a *uniform resource identifier* (URI) to the related concept, which will be stored in the system.
Furthermore, there is a method for auto-completing text and listing a description as well as commonly used unit symbols.
The Unit Service reads units of measurement from [`om-2.ttl`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/master/fda-units-service/onto/om-2.ttl)
and registers a unit. It is used to assign a unit of measurement to a table column.
For the **HTTP API**, the Unit Service assigns a unit of measurement via, e.g. cURL. First the list of concepts can be
queried.
```console
$ curl -X POST -d '{"offset":0,"ustring":"met"}' https://dbrepo.ossdip.at/api/units/suggest
```
Then the concept needs to be saved into the metadata database.
```console
$ curl -X POST -d '{"name":"metre","uri":...}' https://dbrepo.ossdip.at/api/units/saveconcept
```
Then the concept can be assigned to a table column.
```console
$ curl -X POST -d '{"cdbid":1,"tid":1,"cid":1,"uri":...}' https://dbrepo.ossdip.at/api/units/savecolumnsconcept
```
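The three steps above (suggest, save concept, assign to column) can be sketched in Python as a sequence of URL and JSON body pairs. The field names are copied from the cURL examples; the function name `unit_assignment_steps` is hypothetical, and the concept URI is elided in the examples above, so it must be supplied by the caller.

```python
import json

# Base path taken from the cURL examples above.
BASE_URL = "https://dbrepo.ossdip.at/api/units"


def unit_assignment_steps(term, name, uri, cdbid, tid, cid, offset=0):
    """Return the (url, json_body) pairs for the three POST calls, in order.

    `uri` is the concept URI chosen from the suggestions; it is not
    invented here and has to come from the response of the first call.
    """
    return [
        # 1. Query the list of matching concepts.
        (f"{BASE_URL}/suggest", json.dumps({"offset": offset, "ustring": term})),
        # 2. Save the chosen concept into the metadata database.
        (f"{BASE_URL}/saveconcept", json.dumps({"name": name, "uri": uri})),
        # 3. Assign the concept to a table column.
        (
            f"{BASE_URL}/savecolumnsconcept",
            json.dumps({"cdbid": cdbid, "tid": tid, "cid": cid, "uri": uri}),
        ),
    ]
```

In practice the second and third calls depend on the response of the first, so a real client would issue them one at a time rather than precompute all three.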
!!! debug "Debug Information"
* Port(s): 5010
* Swagger UI: [/swagger-ui/](http://localhost:5010/swagger-ui/)
* Swagger API .json: [/api-units.json](http://localhost:5010/api-units.json)
### Identifier Service
This microservice is responsible for creating and resolving a *persistent identifier* (PID) attached to a query to
......@@ -55,40 +179,108 @@ and result set to allow equality checks of the originally obtained result set an
the reference implementation we currently only use a numerical id column and plan to integrate *digital object
identifier* (DOI) through our institutional library soon.
!!! debug "Debug Information"
* Port(s): 9096
* Swagger UI: [/swagger-ui/index.html](http://localhost:9096/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/identifier-service](http://localhost:9096/v3/api-docs/identifier-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9096/v3/api-docs.yaml)
### Search Service
It processes search requests from the Gateway Service for full-text lookups in the Metadata Database. We use
[Elasticsearch](https://www.elastic.co/) in the reference implementation.
The Search Service uses Elasticsearch and creates a retrievable index on all databases, which is updated
with each save operation on databases in the metadata database. The database name can be queried with Elasticsearch
to, e.g., match the term "Airquality":
```console
$ curl "http://localhost:9200/databaseindex/_search?q=name:Airquality"
```
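When the search term comes from user input, the query string should be URL-encoded. The following Python sketch builds the same search URL as the cURL call above; the function name `search_url` and its default arguments are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_url(term, field="name", base="http://localhost:9200", index="databaseindex"):
    """Build a URI search URL for the given index, field and term."""
    # urlencode percent-encodes reserved characters such as ':' and spaces.
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base}/{index}/_search?{query}"


if __name__ == "__main__":
    print(search_url("Airquality"))
```

The colon between field and term is percent-encoded as `%3A`, which the server decodes back before parsing the query.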
!!! debug "Debug Information"
* Port(s): 9200, 9600
* ElasticSearch: [/databaseindex](http://localhost:9200/databaseindex)
### Container Service
It is responsible for Docker container lifecycle operations and updating the local copy of the Docker images.
!!! debug "Debug Information"
* Port(s): 9091
* Swagger UI: [/swagger-ui/index.html](http://localhost:9091/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/container-service](http://localhost:9091/v3/api-docs/container-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9091/v3/api-docs.yaml)
### Database Service
It creates the databases inside a Docker container and the Query Store. Currently we only
support [MariaDB](https://mariadb.org/) images that allow table versioning with low programmatic effort.
!!! debug "Debug Information"
* Port(s): 9092
* Swagger UI: [/swagger-ui/index.html](http://localhost:9092/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/database-service](http://localhost:9092/v3/api-docs/database-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9092/v3/api-docs.yaml)
### Table Service
This microservice handles table operations inside a database that is managed by the Database Service. We
use [Hibernate](https://hibernate.org/orm/) for schema and data ingest operations.
!!! debug "Debug Information"
* Port(s): 9094
* Swagger UI: [/swagger-ui/index.html](http://localhost:9094/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/table-service](http://localhost:9094/v3/api-docs/table-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9094/v3/api-docs.yaml)
### Broker Service
It holds exchanges and topics responsible for holding AMQP messages for later consumption. We
use [RabbitMQ](https://www.rabbitmq.com/) in the reference implementation.
For the **HTTP API**, the Broker Service offers an endpoint to manage the AMQP users and their permissions on exchanges
and queues. This endpoint is reachable via the Gateway Service or directly at port 9098. Internally, this service just
passes commands to [`rabbitmqctl`](https://www.rabbitmq.com/rabbitmqctl.8.html).
For the **AMQP API**, the Broker Service can declare exchanges and queues. The AMQP endpoint listens to port 5672 for
regular declares and offers a management interface at port 15672.
!!! debug "Debug Information"
* Port(s): 9098, 5672, 15672
* Swagger UI: [/swagger-ui/index.html](http://localhost:9098/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/broker-service](http://localhost:9098/v3/api-docs/broker-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9098/v3/api-docs.yaml)
* RabbitMQ Management: [/#](http://localhost:15672/#)
### Query Service
It provides an interface to insert data into the tables created by the Table Service. It also allows for view-only,
paginated and versioned query execution to the raw data and consumes messages in the message queue from the Broker
Service.
### Portal
!!! debug "Debug Information"
* Port(s): 9093
* Swagger UI: [/swagger-ui/index.html](http://localhost:9093/swagger-ui/index.html)
* Swagger API .json: [/v3/api-docs/query-service](http://localhost:9093/v3/api-docs/query-service)
* Swagger API .yaml: [/v3/api-docs.yaml](http://localhost:9093/v3/api-docs.yaml)
### FAIR Portal
It provides a *graphical user interface* (GUI) for a researcher to interact with the database repository's API.
!!! debug "Debug Information"
* Port(s): 3000
* GUI: [/#](http://localhost:3000)
### Analyse Service
It suggests data types for the FAIR Portal when creating a table from a *comma separated values* (CSV) file. It
......@@ -96,14 +288,8 @@ recommends enumerations for columns and returns e.g. a list of potential primary
to confirm these suggestions manually. Moreover, the *Analyze Service* determines basic statistical properties of
numerical columns.
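To illustrate the idea of suggesting data types from a CSV file, the following Python sketch tries progressively looser casts on each column. It is not the algorithm of the Analyse Service; the function names and the type labels `integer`, `decimal` and `text` are hypothetical.

```python
import csv
import io


def suggest_type(values):
    """Suggest 'integer', 'decimal' or 'text' for a list of cell strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the strictest cast first; fall through to looser ones.
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for value in non_empty:
                cast(value)
            return name
        except ValueError:
            continue
    return "text"


def suggest_column_types(csv_text):
    """Map each column of a CSV document (with header row) to a suggested type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    # Missing cells are read as None by DictReader; treat them as empty.
    return {col: suggest_type([(row[col] or "") for row in rows]) for col in rows[0]}
```

As in the service described above, such suggestions are only a starting point that the researcher confirms or corrects manually.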
### User Database
This container runs a relational database engine that allows data versioning and contains the Query Store, a special
table that stores all queries issued to the Researcher Database along with metadata. We store the queries here and not
in the metadata database level to ensure that they are preserved along with the original database for a regular backup
and archival together with the original database once the container is retired.
### Technical
!!! debug "Debug Information"
We use Docker for deployment. The containers package all runtime dependencies, so all necessary files are already
present when they start. For running the infrastructure we use Docker Compose.
* Port(s): 5000
* Swagger UI: [/swagger-ui/](http://localhost:5000/swagger-ui/)
* Swagger API .json: [/api-analyze.json](http://localhost:5000/api-analyze.json)
......@@ -15,12 +15,6 @@ nav:
- Operation:
- operation/index.md
- operation/production.md
- Endpoints:
- operation/endpoints/authentication.md
- operation/endpoints/container.md
- operation/endpoints/database.md
- operation/endpoints/identifier.md
- operation/endpoints/query.md
- publications.md
- contact.md
extra_css:
......@@ -56,5 +50,7 @@ markdown_extensions:
- admonition
- pymdownx.details
- pymdownx.superfences
- pymdownx.tabbed:
alternate_style: true
- toc:
permalink: True
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$">
<excludeFolder url="file://$MODULE_DIR$/venv" />
</content>
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
<component name="PyDocumentationSettings">
<option name="format" value="PLAIN" />
<option name="myDocStringFormat" value="Plain" />
</component>
</module>
\ No newline at end of file
<component name="InspectionProjectProfileManager">
<settings>
<option name="USE_PROJECT_PROFILE" value="false" />
<version value="1.0" />
</settings>
</component>
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6 (api-authentication)" project-jdk-type="Python SDK" />
</project>
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/api-authentication.iml" filepath="$PROJECT_DIR$/.idea/api-authentication.iml" />
</modules>
</component>
</project>
\ No newline at end of file
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="$PROJECT_DIR$/../.." vcs="Git" />
</component>
</project>
\ No newline at end of file