System¶
Abstract
This is the full system description from a technical/developer view.
We invite all open-source developers to help us fix bugs and introduce features to the source code. Get involved by sending a mail to Prof. Andreas Rauber and Projektass. Martin Weise.
Architecture¶
The repository is designed as a microservice architecture to ensure scalability and to allow the use of various technologies. The microservices cover the basic database operations, data versioning, as well as findability, accessibility, interoperability and reusability (FAIR).
Services¶
See the Docker images for the documentation of each service.
Analyse Service¶
Debug Information
- Ports: 5000/tcp
- Prometheus:
http://:5000/metrics
- Swagger UI:
http://:5000/swagger-ui/index.html
view online
It suggests data types for the FAIR Portal when creating a table from a comma-separated values (CSV) file. It recommends enumerations for columns and returns, for example, a list of potential primary key candidates. The researcher is able to confirm these suggestions manually. Moreover, the Analyse Service determines basic statistical properties of numerical columns.
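The kind of suggestion described above can be illustrated with a minimal sketch. It is not the Analyse Service's actual algorithm: the type names (`integer`, `decimal`, `text`), the function names and the choice of the mean as the only statistic are assumptions made for this example.

```python
import csv
import io
from statistics import mean


def infer_type(values):
    """Guess a column type from its string values (hypothetical type names)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the strictest type first and fall back to the next one on failure
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def analyse_csv(text):
    """Return suggested types, primary key candidates and column means."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    result = {"types": {}, "primary_key_candidates": [], "mean": {}}
    for col in columns:
        values = [row[col] for row in rows]
        col_type = infer_type(values)
        result["types"][col] = col_type
        # A column is a key candidate if it has no empty and no repeated values
        if "" not in values and len(set(values)) == len(values):
            result["primary_key_candidates"].append(col)
        if col_type in ("integer", "decimal"):
            result["mean"][col] = mean(float(v) for v in values if v != "")
    return result
```

The result is only a suggestion. As stated above, the researcher still confirms it manually.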
Authentication Service¶
Debug Information
- Ports: 8080/tcp, 8443/tcp
- Admin Console:
http://:8443/
This service is very specific to the deployment of the organization. In our reference implementation we implement a security assertion markup language (SAML) service provider and use our institutional SAML identity provider for obtaining account data through an encrypted channel.
From version 1.2 onwards we use Keycloak for authentication and have deprecated the previous Spring Boot application. Consequently, all authentication goes through Keycloak.
Unsupported Keycloak features
Due to no demand at the time, we currently do not support the following Keycloak features:
- E-Mail verification
- Temporary passwords
By default, the Authentication Service comes with a self-signed certificate valid 3 months from build date. For deployment it is highly encouraged to use your own certificate, properly issued by a trusted PKI, e.g. GÉANT. For local deployments you can use the self-signed certificate. You need to accept the risk in most browsers when visiting the admin panel.
Sign in with the default credentials (username fda, password fda) or the ones you configured during set-up. By default, users are created using the frontend and the sign-up page. It is also possible to create users from Keycloak; they will still act as "self-sign-up" created users. Since we do not support all features of Keycloak, leave out required user actions (including the temporary password), as they will not be enforced.
Each user has attributes associated with them. If you create a user manually in Keycloak, you need to add them in Users > Add user > Attributes:
- theme_dark (boolean, default: false)
- orcid (string)
- affiliation (string)
Groups¶
The authorization scheme follows a group-based access control (GBAC). Users are organized in three distinct (non-overlapping) groups:
- Researchers (default)
- Developers
- Data Stewards
Based on the membership in one of these groups, the user is assigned a set of roles that authorize specific actions. By default, all users are assigned to the researchers group.
Roles¶
We organize the roles into default and escalated composite roles. There are three composite roles, one for each group. Each composite role has a set of other associated composite roles.
There is one role for one specific action in the services. For example: the create-database role authorizes a user to create a database in a Docker container. Therefore, the DatabaseEndpoint.java endpoint requires a JWT access token with this authority.
```java
@PostMapping
@PreAuthorize("hasAuthority('create-database')")
public ResponseEntity<DatabaseBriefDto> create(@NotNull Long containerId,
                                               @Valid @RequestBody DatabaseCreateDto createDto,
                                               @NotNull Principal principal) {
    ...
}
```
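To see which authorities an access token carries, its payload can be inspected on the client side. The following is a minimal sketch under one important assumption: that the roles appear in the `realm_access.roles` claim, which is where Keycloak places realm roles by default. How the services map claims to authorities is not described here, so treat the claim name as hypothetical. The signature is deliberately not verified; this is for debugging only.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWTs strip base64 padding, so restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def has_role(token, role):
    """Check for a role in the (assumed) realm_access.roles claim."""
    roles = jwt_claims(token).get("realm_access", {}).get("roles", [])
    return role in roles
```

A call such as `has_role(access_token, "create-database")` then tells whether the endpoint above would accept the token, provided the assumption about the claim holds.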
Default Container Handling¶
Name | Description |
---|---|
create-container | Can create a container |
find-container | Can find a specific container |
list-containers | Can list all containers |
modify-container-state | Can start and stop the own container |
Default Database Handling¶
Name | Description |
---|---|
check-database-access | Can check the access to a database of a user |
create-database | Can create a database |
create-database-access | Can give a new access to a database of a user |
delete-database-access | Can delete the access to a database of a user |
find-database | Can find a specific database in a container |
list-databases | Can list all databases in a container |
modify-database-visibility | Can modify the database visibility (public, private) |
modify-database-owner | Can modify the database owner |
update-database-access | Can update the access to a database of a user |
Default Table Handling¶
Name | Description |
---|---|
create-table | Can create a table |
find-tables | Can find a specific table in a database |
list-tables | Can list all tables |
modify-table-column-semantics | Can modify the column semantics of a specific column |
Default Query Handling¶
Name | Description |
---|---|
create-database-view | Can create a view in a database |
delete-database-view | Can delete a view in a database |
delete-table-data | Can delete data in a table |
execute-query | Can execute a query statement |
export-query-data | Can export the data that a query has produced |
export-table-data | Can export the data stored in a table |
find-database-view | Can find a specific database view |
find-query | Can find a specific query in the query store |
insert-table-data | Can insert data into a table |
list-database-views | Can list all database views |
list-queries | Can list all queries in the query store |
persist-query | Can persist a query in the query store |
re-execute-query | Can re-execute a query to reproduce a result |
view-database-view-data | Can view the data produced by a database view |
view-table-data | Can view the data in a table |
view-table-history | Can view the data history of a table |
Default Identifier Handling¶
Name | Description |
---|---|
create-identifier | Can create an identifier (subset, database) |
find-identifier | Can find a specific identifier |
list-identifier | Can list all identifiers |
Default User Handling¶
Name | Description |
---|---|
modify-user-theme | Can modify the user theme (light, dark) |
modify-user-information | Can modify the user information |
Default Maintenance Handling¶
Name | Description |
---|---|
create-maintenance-message | Can create a maintenance message banner |
delete-maintenance-message | Can delete a maintenance message banner |
find-maintenance-message | Can find a maintenance message banner |
list-maintenance-messages | Can list all maintenance message banners |
update-maintenance-message | Can update a maintenance message banner |
Default Semantics Handling¶
Name | Description |
---|---|
create-semantic-unit | Can save a previously unknown unit for a table column |
create-semantic-concept | Can save a previously unknown concept for a table column |
execute-semantic-query | Can query remote SPARQL endpoints to get labels and description |
table-semantic-analyse | Can automatically suggest units and concepts for a table |
Escalated User Handling¶
Name | Description |
---|---|
find-user | Can list user information for a specific user |
Escalated Container Handling¶
Name | Description |
---|---|
delete-container | Can delete any container |
modify-foreign-container-state | Can modify any container state (start, stop) |
Escalated Database Handling¶
Name | Description |
---|---|
delete-database | Can delete any database in any container |
Escalated Table Handling¶
Name | Description |
---|---|
delete-table | Can delete any table in any database |
Escalated Query Handling¶
Name | Description |
---|---|
/ |
Escalated Identifier Handling¶
Name | Description |
---|---|
create-foreign-identifier | Can create an identifier to any database or query |
delete-identifier | Can delete any identifier |
modify-identifier-metadata | Can modify any identifier metadata |
Escalated Semantics Handling¶
Name | Description |
---|---|
create-ontology | Can register a new ontology |
delete-ontology | Can unregister an ontology |
list-ontologies | Can list all ontologies |
modify-foreign-table-column-semantics | Can modify any table column concept and unit |
update-ontology | Can update ontology metadata |
update-semantic-concept | Can update own table column concept |
update-semantic-unit | Can update own table column unit |
API¶
Obtain Access Token¶
Access tokens are needed for almost all operations.
```bash
curl -X POST \
  -d "username=foo&password=bar&grant_type=password&client_id=dbrepo-client&scope=openid&client_secret=MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG" \
  http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token
```

```python
import requests

# Request a token with the password grant
auth = requests.post("http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token", data={
    "username": "foo",
    "password": "bar",
    "grant_type": "password",
    "client_id": "dbrepo-client",
    "scope": "openid",
    "client_secret": "MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG"
})
print(auth.json()["access_token"])
```
Refresh Access Token¶
Using the refresh token contained in the response from above, a new access token can be obtained.
```bash
curl -X POST \
  -d "grant_type=refresh_token&client_id=dbrepo-client&refresh_token=THE_REFRESH_TOKEN&client_secret=MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG" \
  http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token
```

```python
import requests

# Exchange the refresh token for a new access token
auth = requests.post("http://localhost/api/auth/realms/dbrepo/protocol/openid-connect/token", data={
    "grant_type": "refresh_token",
    "client_id": "dbrepo-client",
    "client_secret": "MUwRc7yfXSJwX8AdRMWaQC3Nep1VjwgG",
    "refresh_token": "THE_REFRESH_TOKEN"
})
print(auth.json()["access_token"])
```
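A client that makes many requests may want to combine both calls and refresh only when needed. The sketch below is one possible way to do that, not part of the repository: the class `TokenCache`, its parameters and the 10 second safety margin are made up for this example. It assumes the token response contains the standard OAuth 2.0 fields `access_token`, `refresh_token` and `expires_in`, and that the token is sent as a `Bearer` token in the `Authorization` header.

```python
import time


class TokenCache:
    """Keeps an access token and renews it shortly before it expires."""

    def __init__(self, fetch_token, refresh_token, clock=time.monotonic, margin=10):
        self._fetch = fetch_token      # callable() -> token response dict
        self._refresh = refresh_token  # callable(refresh_token) -> token response dict
        self._clock = clock            # injectable clock, eases testing
        self._margin = margin          # safety margin in seconds (arbitrary choice)
        self._response = None
        self._expires_at = 0.0

    def _store(self, response):
        self._response = response
        self._expires_at = self._clock() + response["expires_in"] - self._margin

    def access_token(self):
        if self._response is None:
            # First use: obtain a token, e.g. with the password grant
            self._store(self._fetch())
        elif self._clock() >= self._expires_at:
            # Token is (almost) expired: use the refresh token
            self._store(self._refresh(self._response["refresh_token"]))
        return self._response["access_token"]

    def headers(self):
        """Header dict to pass along with an API request."""
        return {"Authorization": "Bearer " + self.access_token()}
```

The two callables would wrap the `requests.post` calls shown above and return `auth.json()`. The sketch does not handle an expired refresh token; in that case the refresh call fails and a new sign-in is required.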
Broker Service¶
Debug Information
- Ports: 5672/tcp, 15672/tcp
- RabbitMQ Management Plugin:
http://:15672
- RabbitMQ Prometheus Plugin:
http://:15692/metrics
It holds the exchanges and topics that keep AMQP messages for later consumption. We use RabbitMQ in the implementation. The AMQP endpoint listens on port 5672 for regular declares and offers a management interface on port 15672.
The default credentials are:
- Username: fda
- Password: fda
Container Service¶
Debug Information
- Ports: 9091/tcp
- Info:
http://:9091/actuator/info
- Health:
http://:9091/actuator/health
- Prometheus:
http://:9091/actuator/prometheus
- Swagger UI:
http://:9091/swagger-ui/index.html
view online
It is responsible for Docker container lifecycle operations and updating the local copy of the Docker images.
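The Health endpoint listed above, and the equivalent ones of the other services on this page, can be polled from a script. The following is a minimal sketch that assumes the standard Spring Boot Actuator response format (a JSON object with a `status` field that is `UP` for a healthy service). The host name `localhost` is an assumption for a local deployment, since the URLs on this page omit the host; the function names are made up for this example.

```python
import json
import urllib.request

# Ports taken from the Debug Information blocks on this page
SERVICE_PORTS = {"container": 9091, "database": 9092, "query": 9093, "table": 9094}


def health_url(service, host="localhost"):
    """Build the health URL; `host` is an assumption for a local deployment."""
    return f"http://{host}:{SERVICE_PORTS[service]}/actuator/health"


def is_up(body):
    """Interpret a health response body such as '{"status": "UP"}'."""
    return json.loads(body).get("status") == "UP"


def check(service, host="localhost", timeout=5):
    """Query one service; returns False when it is unreachable or not UP."""
    try:
        with urllib.request.urlopen(health_url(service, host), timeout=timeout) as resp:
            return is_up(resp.read())
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors, ValueError a non-JSON body
        return False
```

For example, `{name: check(name) for name in SERVICE_PORTS}` gives a quick overview of which services respond.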
Database Service¶
Debug Information
- Ports: 9092/tcp
- Info:
http://:9092/actuator/info
- Health:
http://:9092/actuator/health
- Prometheus:
http://:9092/actuator/prometheus
- Swagger UI:
http://:9092/swagger-ui/index.html
view online
It creates the databases inside a Docker container and the Query Store. Currently, we only support MariaDB images that allow table versioning with low programmatic effort.
Gateway Service¶
Debug Information
- Ports: 9095/tcp
- Info:
http://:9095/actuator/info
- Health:
http://:9095/actuator/health
- Prometheus:
http://:9095/actuator/prometheus
It provides a single point of access to the application programming interface (API) and configures a standard NGINX reverse proxy for load balancing and SSL/TLS configuration.
Identifier Service¶
Debug Information
- Ports: 9096/tcp
- Info:
http://:9096/actuator/info
- Health:
http://:9096/actuator/health
- Prometheus:
http://:9096/actuator/prometheus
- Swagger UI:
http://:9096/swagger-ui/index.html
view online
This microservice is responsible for creating and resolving a persistent identifier (PID) attached to a query, in order to obtain the metadata attached to it and to allow re-execution of the query. We store the query as well as hashes of the query and of the result set, which allows checking whether the originally obtained result set equals the currently obtained one. In the reference implementation we currently only use a numerical id column and plan to integrate digital object identifiers (DOIs) through our institutional library soon.
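The equality check via hashes can be illustrated as follows. This is a sketch of the general idea only: the hash function (SHA-256), the row serialization via `repr()` and the dictionary keys `query_hash` and `result_hash` are assumptions for this example and may differ from what the service actually stores.

```python
import hashlib


def sha256_hex(text):
    """Hash a query string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def result_hash(rows):
    """Hash a result set row by row; row order matters in this sketch."""
    digest = hashlib.sha256()
    for row in rows:
        # repr() gives a deterministic text form for tuples of simple values
        digest.update(repr(tuple(row)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def is_reproduced(stored, query, rows):
    """Compare stored hashes with those of a re-executed query."""
    return (stored["query_hash"] == sha256_hex(query)
            and stored["result_hash"] == result_hash(rows))
```

Because the rows are hashed in sequence, a re-execution that returns the same rows in a different order counts as a different result in this sketch.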
Metadata Database¶
Debug Information
- Ports: 3306/tcp, 9100/tcp
- Prometheus:
http://:9100/metrics
It is the core component of the project. It is a relational database that contains metadata about all researcher databases created in the database repository, such as column names, check expressions, value enumerations or key/value constraints, and relevant data for citing data sets. Additionally, the concept (e.g. the URI of the unit of measurement) of numerical columns is stored in the Metadata Database in order to provide semantic context. We use MariaDB for its rich capabilities in the reference implementation.
The default credentials are root:dbrepo for the database fda. Connect to the database via the JDBC connector on port 3306.
Metadata Service¶
Debug Information
- Ports: 9099/tcp
- Info:
http://:9099/actuator/info
- Health:
http://:9099/actuator/health
- Prometheus:
http://:9099/actuator/prometheus
- Swagger UI:
http://:9099/swagger-ui/index.html
view online
This service provides an OAI-PMH endpoint for metadata crawlers.
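OAI-PMH requests are plain HTTP requests with a `verb` parameter, so a crawler request can be sketched in a few lines. The verbs `Identify` and `ListRecords` and the metadata prefix `oai_dc` come from the OAI-PMH protocol itself. The base URL below is hypothetical, because the path of the endpoint is not stated on this page.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real path of the endpoint is not stated here
BASE_URL = "http://localhost:9099/api/oai"


def oai_request(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)
```

For example, `oai_request("ListRecords", metadataPrefix="oai_dc")` builds the URL a harvester would fetch to list all records in Dublin Core format.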
Query Service¶
Debug Information
- Ports: 9093/tcp
- Info:
http://:9093/actuator/info
- Health:
http://:9093/actuator/health
- Prometheus:
http://:9093/actuator/prometheus
- Swagger UI:
http://:9093/swagger-ui/index.html
view online
It provides an interface to insert data into the tables created by the Table Service. It also allows view-only, paginated and versioned query execution on the raw data and consumes messages in the message queue from the Broker Service.
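What a paginated and versioned query could look like on the database level is sketched below. It assumes that the tables are MariaDB system-versioned tables, which support the `FOR SYSTEM_TIME AS OF` clause; whether the Query Service builds its statements this way is not stated here. The function name and parameters are made up for this example.

```python
from datetime import datetime


def versioned_page_sql(table, as_of, page, size):
    """Build a paginated SELECT on the table state at the time `as_of`."""
    page, size = int(page), int(size)
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    # Backtick-quote the identifier; embedded backticks are doubled
    quoted = "`" + table.replace("`", "``") + "`"
    # Formatting a datetime object yields a safe timestamp literal
    literal = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return (f"SELECT * FROM {quoted} "
            f"FOR SYSTEM_TIME AS OF TIMESTAMP '{literal}' "
            f"LIMIT {size} OFFSET {page * size}")
```

A real implementation would also add an `ORDER BY` clause, since pages are only stable when the row order is deterministic.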
Search Database¶
Debug Information
- Ports: 9200/tcp
- Indexes:
http://:9200/_all
- Health:
http://:9200/_cluster/health/
It processes search requests from the Gateway Service for full-text lookups in the metadata database. We use Elasticsearch in the reference implementation. It creates a retrievable index on all databases, which is updated with each save operation on databases in the metadata database.
All requests need to be authenticated. By default, the credentials elastic:elastic are used.
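An authenticated full-text request against the search database could look like the following sketch. It uses HTTP basic authentication with the default credentials and the Elasticsearch `_search` endpoint with a `simple_query_string` query. The host `localhost`, the choice of the `_all` index pattern and the function names are assumptions for this example.

```python
import base64
import json
import urllib.request


def build_search_request(term, host="localhost", index="_all",
                         user="elastic", password="elastic"):
    """Prepare an authenticated full-text search request (not yet sent)."""
    body = {"query": {"simple_query_string": {"query": term}}}
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:9200/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + credentials},
        method="POST",
    )


def search(term, **kwargs):
    """Send the request and return the list of matching documents."""
    with urllib.request.urlopen(build_search_request(term, **kwargs)) as resp:
        return json.loads(resp.read())["hits"]["hits"]
```

In a deployment, replace the default credentials with the ones configured for your instance.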
Semantics Service¶
Debug Information
- Ports: 9097/tcp
- Info:
http://:9097/actuator/info
- Health:
http://:9097/actuator/health
- Prometheus:
http://:9097/actuator/prometheus
- Swagger UI:
http://:9097/swagger-ui/index.html
view online
It is designed to map terms in the domain of units of measurement to a controlled vocabulary, modelled in the ontology of units of measure. This service validates researcher-provided units and provides a uniform resource identifier (URI) to the related concept, which will be stored in the system. Furthermore, there is a method for auto-completing text and listing a description as well as commonly used unit symbols.
Table Service¶
Debug Information
- Ports: 9094/tcp
- Info:
http://:9094/actuator/info
- Health:
http://:9094/actuator/health
- Prometheus:
http://:9094/actuator/prometheus
- Swagger UI:
http://:9094/swagger-ui/index.html
view online
This microservice handles table operations inside a database that is managed by the Database Service. We use Hibernate for schema and data ingest operations.
UI¶
Debug Information
- Ports: 3000/tcp, 9100/tcp
- Prometheus:
http://:9100/metrics
- UI:
http://:3000/
It provides a graphical user interface (GUI) for a researcher to interact with the database repository's API.
User Service¶
Debug Information
- Ports: 9098/tcp
- Info:
http://:9098/actuator/info
- Health:
http://:9098/actuator/health
- Prometheus:
http://:9098/actuator/prometheus
- Swagger UI:
http://:9098/swagger-ui/index.html
view online
This microservice handles user information.