diff --git a/.docs/api/analyse-service.md b/.docs/api/analyse-service.md index 484271bbfe75062897c6a7a2a4497e084337f3e1..fe45e9492c4a7c53c024603690132e2dfa5aeec9 100644 --- a/.docs/api/analyse-service.md +++ b/.docs/api/analyse-service.md @@ -6,7 +6,7 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`dbrepo/analyse-service:__APPVERSION__`](https://hub.docker.com/r/dbrepo/analyse-service) + Image: [`registry.datalab.tuwien.ac.at/dbrepo/analyse-service:1.4.4`](https://hub.docker.com/r/dbrepo/analyse-service) * Ports: 5000/tcp * Prometheus: `http://<hostname>:5000/metrics` @@ -15,37 +15,37 @@ author: Martin Weise ## Overview -It suggests data types for the [User Interface](./system-other-ui) when creating a table from a -*comma separated values* (CSV) -file. It recommends enumerations for columns and returns e.g. a list of potential +It suggests data types for the [User Interface](../ui) when creating a table from a +*comma separated values* (CSV) -file. It recommends enumerations for columns and returns e.g. a list of potential primary key candidates. The researcher is able to confirm these suggestions manually. Moreover, the Analyse Service determines basic statistical properties of numerical columns. ### Analysis -After [uploading](./system-services-storage/#buckets) the CSV-file into the `dbrepo-upload` bucket of -the [Storage Service](./system-services-storage), analysis for data types and primary keys follows the flow: - -1. Retrieve the CSV-file from the `dbrepo-upload` bucket of the Storage Service as data stream (=nothing is stored in +After [uploading](../storage-service/#buckets) the CSV-file into the `dbrepo-upload` bucket of +the [Storage Service](../storage-service), analysis for data types and primary keys follows the flow: + +1. 
Retrieve the CSV-file from the `dbrepo-upload` bucket of the Storage Service as data stream (=nothing is stored in the service) with the [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) client. -2. When no separator is known, the Analyse Service tries to guess the separator from the first line +2. When no separator is known, the Analyse Service tries to guess the separator from the first line with [`csv.Sniff().sniff(...)`](https://docs.python.org/3/library/csv.html#csv.Sniffer). This step is optional when the separator was provided via HTTP-payload: `{"separator": ";", ...}` -3. With the separator known (either from step 2 or via HTTP-payload), - the [`messytables.CSVTableSet(...)`](https://messytables.readthedocs.io/en/latest/#csv-support) guesses the headers - and column types and enums, if the HTTP-payload contains `{"enum": true, ...}`. +3. With the separator known (either from step 2 or via HTTP-payload), the [`Pandas`](https://pypi.org/project/pandas/) + guesses the headers and column types and enums, if the HTTP-payload contains `{"enum": true, ...}`. The data type + is guessed by a combination of Pandas and heuristics. ### Examples -See the [usage page](./usage-analyse/) for examples. +See the [usage page](..) for examples. ## Limitations !!! question "Do you miss functionality? Do these limitations affect you?" We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! ## Security -1. Credentials for the [Storage Service](./system-services-storage) are stored in plaintext environment variables. +1. Credentials for the [Storage Service](../storage-service) are stored in plaintext environment variables. 
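The separator-guessing step in the analysis flow above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Analyse Service's actual code: the function names `guess_separator` and `primary_key_candidates`, the candidate delimiter set, and the "unique and non-empty" heuristic for primary key candidates are all assumptions made for the example.

```python
import csv
import io
from typing import List, Optional


def guess_separator(first_line: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter from the first line with csv.Sniffer.

    csv.Sniffer raises csv.Error when none of the candidates fits.
    The candidate set is an assumption for this sketch.
    """
    return csv.Sniffer().sniff(first_line, delimiters=candidates).delimiter


def primary_key_candidates(csv_text: str, separator: Optional[str] = None) -> List[str]:
    """Return columns whose values are all non-empty and unique.

    Hypothetical heuristic: the real service may apply different rules.
    """
    if separator is None:
        # Sniffing is only needed when the caller did not provide a separator.
        separator = guess_separator(csv_text.splitlines()[0])
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=separator))
    if not rows:
        return []
    candidates = []
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        if all(values) and len(set(values)) == len(values):
            candidates.append(column)
    return candidates
```

Passing `separator=";"` explicitly skips the sniffing step, mirroring the `{"separator": ";", ...}` payload option.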
diff --git a/.docs/api/auth-service.md b/.docs/api/auth-service.md index 5d3e0f42b2bb19b28451c8a8c8e40d937ffe9fab..35c715fc1b6b6e16734a6059a9eaf388e53bb3ad 100644 --- a/.docs/api/auth-service.md +++ b/.docs/api/auth-service.md @@ -6,17 +6,23 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`dbrepo/authentication-service:__APPVERSION__`](https://hub.docker.com/r/dbrepo/authentication-service) + Image: [`registry.datalab.tuwien.ac.at/dbrepo/authentication-service:1.4.4`](https://hub.docker.com/r/dbrepo/authentication-service) * Ports: 8080/tcp - * UI: `http://<hostname>/api/auth/admin/` + * UI: `http://<hostname>/api/auth/` ## Overview -By default, users are created using the [User Interface](../system-other-ui) and the sign-up page in the User Interface. -This creates a new user in the [Authentication Database](../system-databases-authentication), the user identity is then -managed by the -Authentication Service. +By default, users are created using the [User Interface](../ui) and the sign-up page in the User Interface. +This creates a new user in Keycloak. The user identity is then managed by the Auth Service. Only a very small subset +of immutable properties (id, username) is mirrored in the [Metadata Database](../metadata-db) for faster access. + +## Identities + +:octicons-tag-16:{ title="Minimum version" } 1.4.4 + +Identities can also be added in Keycloak directly. When requesting a JWT token from the `/api/user` endpoint, the +immutable properties mentioned in the [Overview](#overview) are copied on first login, transparently to the user. ## Groups @@ -41,163 +47,16 @@ Each of the composite role has a set of other associated composite roles. </figure> There is one role for one specific action in the services. For example: the `create-database` role authorizes a user to -create a database in a Docker container. 
Therefore, -the [`DatabaseEndpoint.java`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/a5bdd1e2169bae6497e2f7eee82dad8b9b059850/fda-database-service/rest-service/src/main/java/at/tuwien/endpoints/DatabaseEndpoint.java#L78) -endpoint requires a JWT access token with this authority. - -```java -@PostMapping -@PreAuthorize("hasAuthority('create-database')") -public ResponseEntity<DatabaseBriefDto> create(@NotNull Long containerId, - @Valid @RequestBody DatabaseCreateDto createDto, - @NotNull Principal principal) { -... -} -``` - -### Default Container Handling - -| Name | Description | -|-------------------|-------------------------------| -| `find-container` | Can find a specific container | -| `list-containers` | Can list all containers | - -### Default Database Handling - -| Name | Description | -|------------------------------|------------------------------------------------------| -| `check-database-access` | Can check the access to a database of a user | -| `create-database` | Can create a database | -| `create-database-access` | Can give a new access to a database of a user | -| `delete-database-access` | Can delete the access to a database of a user | -| `find-database` | Can find a specific database in a container | -| `list-databases` | Can list all databases in a container | -| `modify-database-image` | Can update the database image | -| `modify-database-owner` | Can modify the database owner | -| `modify-database-visibility` | Can modify the database visibility (public, private) | -| `update-database-access` | Can update the access to a database of a user | - -### Default Table Handling - -| Name | Description | -|---------------------------------|------------------------------------------------------| -| `create-table` | Can create a table | -| `find-tables` | Can list a specific table in a database | -| `list-tables` | Can list all tables | -| `modify-table-column-semantics` | Can modify the column semantics of a 
specific column | -| `delete-table` | Can delete tables owned by the user in a database | - -### Default Query Handling - -| Name | Description | -|---------------------------|-----------------------------------------------| -| `create-database-view` | Can create a view in a database | -| `delete-database-view` | Can delete a view in a database | -| `delete-table-data` | Can delete data in a table | -| `execute-query` | Can execute a query statement | -| `export-query-data` | Can export the data that a query has produced | -| `export-table-data` | Can export the data stored in a table | -| `find-database-view` | Can find a specific database view | -| `find-query` | Can find a specific query in the query store | -| `insert-table-data` | Can insert data into a table | -| `list-database-views` | Can list all database views | -| `list-queries` | Can list all queries in the query store | -| `persist-query` | Can persist a query in the query store | -| `re-execute-query` | Can re-execute a query to reproduce a result | -| `view-database-view-data` | Can view the data produced by a database view | -| `view-table-data` | Can view the data in a table | -| `view-table-history` | Can view the data history of a table | - -### Default Identifier Handling - -| Name | Description | -|---------------------|---------------------------------------------| -| `create-identifier` | Can create an identifier (subset, database) | -| `find-identifier` | Can find a specific identifier | -| `list-identifier` | Can list all identifiers | - -### Default User Handling - -| Name | Description | -|---------------------------|-----------------------------------------| -| `modify-user-theme` | Can modify the user theme (light, dark) | -| `modify-user-information` | Can modify the user information | - -### Default Maintenance Handling - -| Name | Description | -|------------------------------|------------------------------------------| -| `create-maintenance-message` | Can create a maintenance 
message banner | -| `delete-maintenance-message` | Can delete a maintenance message banner | -| `find-maintenance-message` | Can find a maintenance message banner | -| `list-maintenance-messages` | Can list all maintenance message banners | -| `update-maintenance-message` | Can update a maintenance message banner | - -### Default Semantics Handling - -| Name | Description | -|---------------------------|-----------------------------------------------------------------| -| `create-semantic-unit` | Can save a previously unknown unit for a table column | -| `create-semantic-concept` | Can save a previously unknown concept for a table column | -| `execute-semantic-query` | Can query remote SPARQL endpoints to get labels and description | -| `table-semantic-analyse` | Can automatically suggest units and concepts for a table | - -### Escalated User Handling - -| Name | Description | -|-------------|-----------------------------------------------| -| `find-user` | Can list user information for a specific user | - -### Escalated Container Handling - -| Name | Description | -|--------------------|--------------------------| -| `create-container` | Can create a container | -| `delete-container` | Can delete any container | - -### Escalated Database Handling - -| Name | Description | -|-------------------|------------------------------------------| -| `delete-database` | Can delete any database in any container | - -### Escalated Table Handling - -| Name | Description | -|------------------------|--------------------------------------| -| `delete-foreign-table` | Can delete any table in any database | - -### Escalated Query Handling - -| Name | Description | -|------|-------------| -| / | | - -### Escalated Identifier Handling - -| Name | Description | -|------------------------------|---------------------------------------------------| -| `create-foreign-identifier` | Can create an identifier to any database or query | -| `delete-identifier` | Can delete any identifier | -| 
`modify-identifier-metadata` | Can modify any identifier metadata | - -### Escalated Semantics Handling - -| Name | Description | -|-----------------------------------------|----------------------------------------------| -| `create-ontology` | Can register a new ontology | -| `delete-ontology` | Can unregister an ontology | -| `list-ontologies` | Can list all ontologies | -| `modify-foreign-table-column-semantics` | Can modify any table column concept and unit | -| `update-ontology` | Can update ontology metadata | -| `update-semantic-concept` | Can update own table column concept | -| `update-semantic-unit` | Can update own table column unit | +create a database. + +A full list of available roles can be obtained +from [`dbrepo-realm.json`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/fb8d14ba02ee32b9a69a30905437b5c9e28adc21/dbrepo-auth-service/dbrepo-realm.json#L46) +which is imported into Keycloak on startup. ## Limitations * No support for sending e-mails through Keycloak by default. * No support for temporary passwords. -* No support for adding identifies in Keycloak directly. * No support for multi-factor authentication. !!! question "Do you miss functionality? Do these limitations affect you?" @@ -208,5 +67,5 @@ public ResponseEntity<DatabaseBriefDto> create(@NotNull Long containerId, ## Security -1. Mount your TLS certificate / private key pair into `/app/tls.crt` and `/app/tls.key` and - set `KC_HTTPS_CERTIFICATE_FILE=/app/tls.crt` and set `KC_HTTPS_CERTIFICATE_KEY_FILE=/app/tls.key`. +1. Keycloak should be configured to use TLS certificates, follow + the [official documentation](https://www.keycloak.org/server/enabletls). diff --git a/.docs/api/data-db.md b/.docs/api/data-db.md index c91d230be7776ecaa904513396895e4d6725781c..3b2738f981eefd2749b95a40a347951ea9a0a39c 100644 --- a/.docs/api/data-db.md +++ b/.docs/api/data-db.md @@ -4,7 +4,7 @@ author: Martin Weise !!! 
debug "Debug Information" - Image: [`bitnami/mariadb-galera:11.2.2-debian-11-r0`](https://hub.docker.com/r/bitnami/mariadb-galera) + Image: [`docker.io/bitnami/mariadb:11.1.3-debian-11-r6`](https://hub.docker.com/r/bitnami/mariadb) * Ports: 3306/tcp * JDBC: `jdbc://mariadb:<hostname>:3306` @@ -17,20 +17,26 @@ author: Martin Weise ## Overview -By default, only one Data Database is deployed. You can deploy multiple (different) Data Database instances and make -them available in the repository as follows: +The Data Database contains the research data. In the default configuration, only one database of this type is deployed. +Any number of MariaDB data databases can be integrated into DBRepo, even non-empty databases. The database needs to be +registered in the Metadata Database to be visible in the [User Interface](../ui) and usable from e.g. the Python +Library. -=== "Terminal" +## Architecture - ```shell - curl \ - -sSL \ - http://<hostname>/api/container \ - -X POST \ - -d '{"name": "Data Database 2", "imageId": 1, "host": "example.com", "port": 3306, "privilegedUsername": "root", "privilegedPassword": "s3cr3t" }' - ``` +### Sidecar + +We deploy a sidecar that handles the CSV-file upload/download operations between +the [Storage Service](../system-services-storage) and the Data Database using a Python Flask application and +the [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) client until MariaDB supports S3 +natively. + +<figure markdown> + +<figcaption>Sidecar that handles the CSV-file upload/download.</figcaption> +</figure> -### Settings +## Data The procedures require the user-generated databases to have the same collation (because of comparison operations). 
Ensure that the Data Database has the character set `utf8mb4` and collation `utf8mb4_general_ci` in your `my.cfg`: @@ -51,18 +57,6 @@ mariadb-galera: extraFlags: "--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci" ``` -### Sidecar - -We deploy a sidecar that handles the CSV-file upload/download operations between -the [Storage Service](../system-services-storage) and the Data Database using a Python Flask application and -the [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) client until MariaDB supports S3 -natively. - -<figure markdown> - -<figcaption>Sidecar that handles the CSV-file upload/download.</figcaption> -</figure> - ### Backup Export all databases with `--skip-lock-tables` option for MariaDB Galera clusters as it is not supported currently by diff --git a/.docs/api/data-service.md b/.docs/api/data-service.md index 41efb2151420a4507d6d0e15e6df5e8be7486984..ab64c50d31dd29597f7c4bff956bee3cfeb66814 100644 --- a/.docs/api/data-service.md +++ b/.docs/api/data-service.md @@ -6,7 +6,7 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`dbrepo/data-service:__APPVERSION__`](https://hub.docker.com/r/dbrepo/data-service) + Image: [`registry.datalab.tuwien.ac.at/dbrepo/data-service:1.4.4`](https://hub.docker.com/r/dbrepo/data-service) * Ports: 9093/tcp * Info: `http://<hostname>:9093/actuator/info` @@ -27,7 +27,7 @@ Data Service up. !!! question "Do you miss functionality? Do these limitations affect you?" We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! 
## Security diff --git a/.docs/api/gateway-service.md b/.docs/api/gateway-service.md index cd3be4f73dd8f4891513615f7b901c055f71fed5..923b95a9f30ac9af06bb029682bd67bc7f2f0961 100644 --- a/.docs/api/gateway-service.md +++ b/.docs/api/gateway-service.md @@ -6,21 +6,21 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`nginx:1.25-alpine-slim`](https://hub.docker.com/r/nginx) + Image: [`docker.io/nginx:1.27.0-alpine3.19-slim`](https://hub.docker.com/r/nginx) * Ports: 80/tcp ## Overview Provides a single point of access to the *application programming interface* (API) and configures a -standard [NGINX](https://www.nginx.com/) reverse proxy for load balancing. This component is optional if you already have a load balancer -or reverse proxy running. +standard [NGINX](https://www.nginx.com/) reverse proxy for load balancing. This component is optional if you already +have a load balancer or reverse proxy running. ## Settings ### SSL/TLS Security -To setup SSL/TLS encryption, mount your TLS certificate and TLS private key into the container directly into the +To setup SSL/TLS encryption, mount your TLS certificate and TLS private key into the container directly into the `/etc/nginx/` directory. ```yaml title="docker-compose.yml" @@ -41,14 +41,14 @@ If your TLS private key as a password, you need to specify it in the `dbrepo.con ### User Interface -To serve the [User Interface](./system-other-ui/) under different port than `80`, change the port mapping in +To serve the [User Interface](../ui/) under different port than `80`, change the port mapping in the `docker-compose.yml` to e.g. port `8000`: ```yaml title="docker-compose.yml" services: ... dbrepo-gateway-service: - image: docker.io/nginx:1.25-alpine-slim + image: docker.io/nginx:1.27.0-alpine3.19-slim ports: - "8000:80" ... @@ -61,13 +61,12 @@ services: !!! question "Do you miss functionality? Do these limitations affect you?" 
We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! - ## Security -1. Enable TLS encryption by downloading +1. Enable TLS encryption by downloading the [`dbrepo.conf`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/master/dbrepo-gateway-service/dbrepo.conf) and editing the *server* block to include your TLS certificate (with trust chain) `fullchain.pem` and TLS private key `privkey.pem` (PEM-encoded). diff --git a/.docs/api/metadata-db.md b/.docs/api/metadata-db.md index 38cbe3f127f0cbddaac7c8eda6f9a448a549d8f6..f5cc4d84ccefa303bce99af21d3a8116895946ba 100644 --- a/.docs/api/metadata-db.md +++ b/.docs/api/metadata-db.md @@ -4,7 +4,7 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`bitnami/mariadb-galera:11.2.2-debian-11-r0`](https://hub.docker.com/r/bitnami/mariadb-galera) + Image: [`docker.io/bitnami/mariadb:11.1.3-debian-11-r6`](https://hub.docker.com/r/bitnami/mariadb) * Ports: 3306/tcp * JDBC: `jdbc://mariadb:<hostname>:3306` @@ -23,11 +23,12 @@ services: dbrepo-metadata-db: ... volumes: - - /path/to/setup-some-data.sql:/docker-entrypoint-initdb.d/setup-some-data.sql + - /path/to/setup-schema.sql:/docker-entrypoint-initdb.d/1_setup-schema.sql + - /path/to/setup-data.sql:/docker-entrypoint-initdb.d/2_setup-data.sql ... ``` !!! warning "Alphabetic Filename Sorting" Beware that the init script provided by Bitnami executes files in alphabetic order! For example: the file - `setup-schema.sql` is executed **after** the file `setup-data.sql`! \ No newline at end of file + `setup-schema.sql` is executed **after** the file `setup-data.sql`! Therefore, a sorting prefix 1-9 is recommended! 
\ No newline at end of file diff --git a/.docs/api/metadata-service.md b/.docs/api/metadata-service.md index 362a9c36bcf6a32ba8262d002716e108d998b2be..fa365219cc6d2c528197fc39354f5410e828166c 100644 --- a/.docs/api/metadata-service.md +++ b/.docs/api/metadata-service.md @@ -6,7 +6,7 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`dbrepo/metadata-service:__APPVERSION__`](https://hub.docker.com/r/dbrepo/metadata-service) + Image: [`registry.datalab.tuwien.ac.at/dbrepo/metadata-service:1.4.4`](https://hub.docker.com/r/dbrepo/metadata-service) * Ports: 9099/tcp * Info: `http://<hostname>:9099/actuator/info` @@ -14,45 +14,39 @@ author: Martin Weise - Readiness: `http://<hostname>:9099/actuator/health/readiness` - Liveness: `http://<hostname>:9099/actuator/health/liveness` * Prometheus: `http://<hostname>:9099/actuator/prometheus` - * Swagger UI: `http://<hostname>:9099/swagger-ui/index.html` <a href="./swagger/metadata" target="_blank">:fontawesome-solid-square-up-right: view online</a> + * Swagger UI: `http://<hostname>:9099/swagger-ui/index.html` ## Overview -This service manages the following topics: +The metadata service manages metadata of identities, the [Broker Service](../broker-service) (i.e. obtaining queue +types), semantic concepts (i.e. ontologies) and relational metadata (databases, tables, queries, views) and identifiers. -* Databases -* Identifiers (DataCite, OAI-PMH) -* Queries -* Semantics (Ontologies) -* Tables -* Users -* Views +## Generation -### Databases +Most of the metadata available in DBRepo is generated automatically, leveraging the available information and taking +the burden away from researchers, data stewards, etc. For example, the schema (names, constraints, data length) of +generated tables and views is obtained from the `information_schema` database maintained by MariaDB internally. -The service handles table operations inside a database. 
We use [Hibernate](https://hibernate.org/orm/) for schema and -data ingest operations. - -### Identifiers +## Identifiers The service is responsible for creating and resolving a *persistent identifier* (PID) attached to a database, subset, table or view to obtain the metadata attached to it and allow reproduction of the exact same result. -This service also provides an OAI-PMH endpoint for metadata aggregators +This service also provides an OAI-PMH endpoint for metadata aggregators (e.g. [OpenAIRE Graph](https://graph.openaire.eu/)). Through the User Interface, it also exposes metadata through JSON-LD to metadata aggregators (e.g. [Google Datasets](https://datasetsearch.research.google.com/)). PID metadata is always exposed, even for private databases. -The service generates internal PIDs, essentially representing internal URIs in -the [DataCite Metadata Schema 4.4](https://doi.org/10.14454/3w3z-sa82). This can be enhanced with activating the -external DataCite Fabrica system to generate DOIs, this is disabled by default. +The service generates internal PIDs, essentially representing internal URIs in +the [DataCite Metadata Schema 4.4](https://doi.org/10.14454/3w3z-sa82). This can be enhanced with activating the +external DataCite Fabrica system to generate DOIs, this is disabled by default. To activate DOI minting, pass your DataCite Fabrica credentials in the environment variables: ```yaml title="docker-compose.yml" services: dbrepo-metadata-service: - image: docker.io/dbrepo/metadata-service:1.4.0 + image: registry.datalab.tuwien.ac.at/dbrepo/metadata-service:1.4.4 environment: spring_profiles_active: doi DATACITE_URL: https://api.datacite.org @@ -62,72 +56,13 @@ services: ... ``` -### Queries - -It provides an interface to insert data into the tables. It also allows for view-only, paginated and versioned query -execution to the raw data. 
Any stale queries (query that have been executed by users in DBRepo but were not saved) are -periodically being deleted from the query store based on the `DELETE_STALE_QUERIES_RATE` environment variable (defaults -to 60 seconds). - -Executing SQL queries through the Query Endpoint must fulfill some restrictions: +## Semantics -* The SQL query does not contain at semicolon `;` - -### Semantics - -The service provides metadata to the table columns in the [Metadata Database](./system-databases-metadata) from -registered ontologies like Wikidata [`wd:`](https://wikidata.org), Ontology of Units of +The service provides metadata to the table columns in the [Metadata Database](../metadata-db) from registered ontologies +like Wikidata [`wd:`](https://wikidata.org), Ontology of Units of Measurement [`om2:`](https://www.ontology-of-units-of-measure.org/resource/om-2), Friend of a Friend [`foaf:`](http://xmlns.com/foaf/0.1/), the [`prov:`](http://www.w3.org/ns/prov#) namespace, etc. -### Tables - -The service manages tables in the [Data Database](./system-databases-data) and manages the metadata of these tables -in the [Metadata Database](./system-databases-metadata). Any tables that are created outside of DBRepo (e.g. directly via the JDBC API) are -periodically fetched by this service (based on the `OBTAIN_METADATA_RATE` environment variable, default interval is 60 -seconds). - -### Users - -The service manages users in the [Data Database](./system-databases-data) -and [Metadata Database](./system-databases-metadata), as well as in the [Broker Service](./system-services-broker) -and the [Authentication Service](./system-services-authentication). 
- -The default configuration grants the users only very basic permissions on the databases: - -* `SELECT` -* `CREATE` -* `CREATE VIEW` -* `CREATE ROUTINE` -* `CREATE TEMPORARY TABLES` -* `LOCK TABLES` -* `INDEX` -* `TRIGGER` -* `INSERT` -* `UPDATE` -* `DELETE` - -This configuration is passed as environment variable `GRANT_PRIVILEGES` to the service as comma-separated string. You -can add/remove grants by setting this environment variable, e.g. allow the users to only select data and create -temporary tables: - -```yaml title="docker-compose.yml" -services: - dbrepo-metadata-service: - environment: - GRANT_PRIVILEGES=SELECT,CREATE TEMPORARY TABLES - ... -``` - -A list of all grants is available in the MariaDB documentation for [`GRANT`](https://mariadb.com/kb/en/grant/) - -### Views - -The service manages views in the [Data Database](./system-databases-data) -and [Metadata Database](./system-databases-metadata). Any views that are created outside of DBRepo (e.g. directly via -the JDBC API) are periodically fetched by this service (based on the `OBTAIN_METADATA_RATE` environment variable, -default interval is 60 seconds). - ## Limitations * No support for other databases than [MariaDB](https://mariadb.org/) because of system-versioning capabilities missing @@ -136,7 +71,7 @@ default interval is 60 seconds). !!! question "Do you miss functionality? Do these limitations affect you?" We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! 
## Security diff --git a/.docs/api/python.md b/.docs/api/python.md index a48fb53d8d474b9643d92f5e17449a92d2d1cd93..ab6b2b69a6b6ca9606232eca534686bc6c0b71ab 100644 --- a/.docs/api/python.md +++ b/.docs/api/python.md @@ -8,6 +8,13 @@ author: Martin Weise [:fontawesome-solid-cube: View Docs](../../python){ .md-button .md-button--primary } +## Overview + +The DBRepo Python library uses some of the most popular and well-maintained Python packages for Data Scientists under the +hood. For example: [`requests`](https://requests.readthedocs.io/) to interact with the HTTP API +endpoints, [`pandas`](https://pandas.pydata.org/) for data operations and [`pydantic`](https://docs.pydantic.dev/) for +information representation from/to the HTTP API. + ## Installing :octicons-tag-16:{ title="Minimum version" } 1.4.2 diff --git a/.docs/api/search-service.md b/.docs/api/search-service.md index fff317d6f8adc093cdf6f725bab31f5bbdb424e7..b48be919d6acec14bbfb3783c6e085f2bcf3e1e9 100644 --- a/.docs/api/search-service.md +++ b/.docs/api/search-service.md @@ -6,18 +6,17 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`dbrepo/search-service:__APPVERSION__`](https://hub.docker.com/r/dbrepo/search-service) + Image: [`registry.datalab.tuwien.ac.at/dbrepo/search-service:1.4.4`](https://hub.docker.com/r/dbrepo/search-service) * Ports: 4000/tcp * Health: `http://<hostname>:4000/api/search/health` * Prometheus: `http://<hostname>:4000/metrics` - * Swagger UI: `http://<hostname>:4000/swagger-ui/` <a href="../swagger/search" target="_blank">:fontawesome-solid-square-up-right: view online</a> + * Swagger UI: `http://<hostname>:4000/swagger-ui/` ## Overview -This service communicates between the [Search Database](../system-databases-search) and -the [User Interface](../system-other-ui) to allow structured search of databases, tables, columns, users, identifiers, -views, semantic concepts & units of measurements used in databases. 
+This service communicates between the Search Database and the [User Interface](../ui) to allow structured search of +databases, tables, columns, users, identifiers, views, semantic concepts & units of measurements used in databases. <figure markdown> { .img-border } @@ -26,9 +25,9 @@ views, semantic concepts & units of measurements used in databases. ## Index -There is only one +There is only one index [`database`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/dev/dbrepo-search-db/init/indices/database.json) -that holds all the metadata information which is mirrored from the [Metadata Database](../system-databases-metadata). +that holds all the metadata information which is mirrored from the [Metadata Database](../metadata-db). <figure markdown>  @@ -37,44 +36,16 @@ that holds all the metadata information which is mirrored from the [Metadata Dat ## Faceted Browsing -This service enables the frontend to search the `database` index with eight different *types* of desired results +This service enables the frontend to search the `database` index with eight different *types* of desired results (database, table, column, view, identifier, user, concept, unit) and their *facets*. -For example, the [User Interface](../system-other-ui) allows for the search of databases that contain a certain -semantic concept (provided as URI, e.g. -temperature [http://www.wikidata.org/entity/Q11466](http://www.wikidata.org/entity/Q11466)) and unit of measurement -(provided as URI, e.g. degree +For example, the [User Interface](../ui) allows for the search of databases that contain a certain +semantic concept (provided as URI, e.g. +temperature [http://www.wikidata.org/entity/Q11466](http://www.wikidata.org/entity/Q11466)) and unit of measurement +(provided as URI, e.g. degree Celsius [http://www.ontology-of-units-of-measure.org/resource/om-2/degreeCelsius](http://www.ontology-of-units-of-measure.org/resource/om-2/degreeCelsius)). 
-An example on faceted browsing is found in the [usage examples](../usage-search). - -## Unit Independent Search - -Since the repository automatically collects statistical properties (min, max, mean, median, std.dev) in both the -[Metadata Database](../system-databases-metadata) and the [Search Database](../system-databases-search), a special -search can be performed when at least two columns have the same semantic concept (e.g. temperature) annotated and -the units of measurements can be transformed. - -<figure markdown> - -<figcaption>Figure 3: Two tables with compatible semantic concepts and units of measurement</figcaption> -</figure> - -In short, the search service transforms the statistical properties not in the target unit of measurements is transformed -by using the [`omlib`](https://github.com/dieudonneWillems/OMLib) package. - -For example: a user wants to find datasets that contain *"temperature measurements between 0 - 10 °C"*. Then the -search service transforms the query to the dataset on the right from °F to contain *"temperature measurements -between 32 - 50 °F"* instead. - -<figure markdown> - -<figcaption>Figure 4: Unit independent search query transformation</figcaption> -</figure> - -## Examples - -View [usage examples](../usage-search/). +An example on faceted browsing is found in the [usage examples](..). ## Limitations @@ -86,4 +57,4 @@ View [usage examples](../usage-search/). ## Security -(nothing) +(none) diff --git a/.docs/api/storage-service.md b/.docs/api/storage-service.md index bf40ca83c8cfde0951a3796df0fbb06e0e486478..a8da4f0721f50cdbc515ca9b0ed240418529f269 100644 --- a/.docs/api/storage-service.md +++ b/.docs/api/storage-service.md @@ -6,7 +6,7 @@ author: Martin Weise !!! 
debug "Debug Information" - Image: [`chrislusf/seaweedfs:3.59`](https://hub.docker.com/r/chrislusf/seaweedfs) + Image: [`docker.io/chrislusf/seaweedfs:3.59`](https://hub.docker.com/r/chrislusf/seaweedfs) * Ports: 9000/tcp * Prometheus: `http://<hostname>:9091/metrics` @@ -36,7 +36,7 @@ The default configuration creates two buckets `dbrepo-upload`, `dbrepo-download` !!! question "Do you miss functionality? Do these limitations affect you?" We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! ## Security diff --git a/.docs/api/ui.md b/.docs/api/ui.md index 2acc439097baa4374288c841e2976bb70e592e5c..d187772ce0d0e4c79e20565dd2e9732d051d0382 100644 --- a/.docs/api/ui.md +++ b/.docs/api/ui.md @@ -2,6 +2,14 @@ author: Martin Weise --- +## tl;dr + +!!! debug "Debug Information" + + Image: [`registry.datalab.tuwien.ac.at/dbrepo/ui:1.4.4`](https://hub.docker.com/r/dbrepo/ui) + + * Ports: 3000/tcp + The User Interface is configured in the `runtimeConfig` section of the `nuxt.config.ts` file during build time. For the runtime, you need to override those values through environment variables or by mounting a `.env` file. As a small example, you can configure the logo :material-numeric-1-circle-outline: in Figure 2. Make sure you mount the logo as @@ -27,7 +35,7 @@ if you use a Kubernetes deployment via ConfigMap and Volumes). ```yaml title="docker-compose.yml" services: dbrepo-ui: - image: docker.io/dbrepo/ui:__APPVERSION__ + image: registry.datalab.tuwien.ac.at/dbrepo/ui:1.4.4 volumes: - ./my_logo.png:/app/.output/public/my_logo.png ... 
diff --git a/.docs/api/upload-service.md b/.docs/api/upload-service.md index 88812d308ba856a1d7c77e65a1bf97298cb2e968..f8ad58ebcb1f626aa6064db682d3fffcd958c81a 100644 --- a/.docs/api/upload-service.md +++ b/.docs/api/upload-service.md @@ -6,46 +6,31 @@ author: Martin Weise !!! debug "Debug Information" - Image: [`tusproject/tusd:v1.12`](https://hub.docker.com/r/tusproject/tusd) + Image: [`docker.io/tusproject/tusd:v1.12`](https://hub.docker.com/r/tusproject/tusd) * Ports: 1080/tcp * Prometheus: `http://<hostname>:1080/api/upload/metrics` * API: `http://<hostname>:1080/api/upload` - * Swagger UI: <a href="../swagger/upload" target="_blank">:fontawesome-solid-square-up-right: view online</a> ## Overview -We use the [TUS](https://tus.io/) open protocol for resumable file uploads which based entirely on HTTP. Even though +We use the [TUS](https://tus.io/) open protocol for resume-able file uploads which based entirely on HTTP. Even though the Upload Service is part of the standard installation, it is an entirely optional component and can be replaced with any S3-compatible Blob Storage. -### Settings - -The Upload Service is responsible for uploading files (mainly CSV-files) into a Blob Storage that can be accesses trough -the S3 protocol (e.g. our [Storage Service](../system-services-storage)). Make sure that the Upload Service can be -accessed from the Gateway Service and set the url in the User Interface configuration file. - -```json title="dbrepo.config.json" -{ - "upload": { - "url": "example.com", - "useSsl": true - }, - ... -} -``` - -If your deployment is secured with SSL/TLS (recommended) set the `useSsl` variable to `true`. - ### Architecture -The Upload Service communicates internally with the [Storage Service](../system-services-storage) (c.f. [Figure 1](#fig1)). +The Upload Service communicates internally with the [Storage Service](../storage-service) (c.f. [Figure 1](#fig1)). 
<figure id="fig1" markdown>  <figcaption>Figure 1: Architecture of the Upload Service</figcaption> </figure> +The Upload Service is responsible for uploading files (mainly CSV-files) into a Blob Storage that can be accesses trough +the S3 protocol (e.g. our [Storage Service](../storage-service)). Make sure that the Upload Service can be +accessed from the Gateway Service. + ## Limitations * No support for authentication. diff --git a/.docs/concepts/search.md b/.docs/concepts/search.md index 8ae41c80ba2566dce997eed8ac4052b0cfa23631..8731100f91246b2f56091af373962d30d76796e6 100644 --- a/.docs/concepts/search.md +++ b/.docs/concepts/search.md @@ -4,10 +4,34 @@ author: Martin Weise ## Index -TBD +tbd ## Document TBD -## Query \ No newline at end of file +## Query + +## Unit Independent Search + +Since the repository automatically collects statistical properties (min, max, mean, median, std.dev) in both the +[Metadata Database](../system-databases-metadata) and the [Search Database](../system-databases-search), a special +search can be performed when at least two columns have the same semantic concept (e.g. temperature) annotated and +the units of measurements can be transformed. + +<figure markdown> + +<figcaption>Figure 3: Two tables with compatible semantic concepts and units of measurement</figcaption> +</figure> + +In short, the search service transforms the statistical properties not in the target unit of measurements is transformed +by using the [`omlib`](https://github.com/dieudonneWillems/OMLib) package. + +For example: a user wants to find datasets that contain *"temperature measurements between 0 - 10 °C"*. Then the +search service transforms the query to the dataset on the right from °F to contain *"temperature measurements +between 32 - 50 °F"* instead. 
+ +<figure markdown> + +<figcaption>Figure 4: Unit independent search query transformation</figcaption> +</figure> \ No newline at end of file diff --git a/.docs/deployment-docker-compose.md b/.docs/deployment-docker-compose.md index 7b6d9922561ddeafee790da540e301b65e09f74c..870ea0c142a07af408176625dae5225f7e4a637d 100644 --- a/.docs/deployment-docker-compose.md +++ b/.docs/deployment-docker-compose.md @@ -9,7 +9,7 @@ author: Martin Weise If you have [Docker](https://docs.docker.com/engine/install/) already installed on your system, you can install DBRepo with: ```shell -curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/release-__APPVERSION__/install.sh | bash +curl -sSL https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/release-1.4.4/install.sh | bash ``` ## Requirements @@ -34,7 +34,7 @@ the following settings. ### Software We only test the Docker Compose deployment with the -official [Docker engine](https://docs.docker.com/engine/install/debian/) installed on +official [Docker Engine](https://docs.docker.com/engine/install/debian/) installed on a [Debian](https://www.debian.org/)-based operating system. Other software deployments (e.g. Docker Desktop on Windows) are *not* recommended and not tested. @@ -53,7 +53,7 @@ technologies. The conceptualized microservices operate the basic database operat ### Notes -Please note that we only save the state of the databases as well as the [Broker Service](./system-services-broker) +Please note that we only save the state of the databases as well as the [Broker Service](../broker-service) since RabbitMQ maintains state inside the container. ## Deployment @@ -147,50 +147,11 @@ Please be warned that the default configuration is not intended for public deplo running system within minutes to play around within the system and explore features. It is strongly advised to change the default `.env` environment variables. 
-Next, create a [user account](./usage-overview/#create-user-account) and -then [create a database](./usage-overview/#create-database) to [import a dataset](./usage-overview/#import-dataset). - -## Security - -!!! warning "Known security issues with the default configuration" - - The system is auto-configured for a small, local, test deployment and is *not* secure! You need to make modifications - in various places to make it secure: - - * **Authentication Service**: - - a. You need to use your own instance or configure a secure instance using a (self-signed) certificate. - Additionally, when serving from a non-default Authentication Service, you need to put it into the - `JWT_ISSUER` environment variable (`.env`). - - b. You need to change the default admin user `fda` password in Realm - master > Users > fda > Credentials > Reset password. - - c. You need to change the client secrets for the clients `dbrepo-client` and `broker-client`. Do this in Realm - dbrepo > Clients > dbrepo-client > Credentials > Client secret > Regenerate. Do the same for the - broker-client. - - d. You need to regenerate the public key of the `RS256` algorithm which is shared with all services to verify - the signature of JWT tokens. Add your securely generated private key in Realm - dbrepo > Realm settings > Keys > Providers > Add provider > rsa. - - * **Broker Service**: by default, this service is configured with an administrative user that has major privileges. - You need to change the password of the user *fda* in Admin > Update this user > Password. We found this - [simple guide](https://onlinehelp.coveo.com/en/ces/7.0/administrator/changing_the_rabbitmq_administrator_password.htm) - to be very useful. - - * **Search Database**: by default, this service is configured to require authentication with an administrative user - that is allowed to write into the indizes. 
Following - this [simple guide](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/reset-password.html), this can be - achieved using the command line. - - * **Gateway Service**: by default, no HTTPS is used that protects the services behind. You need to provide a trusted - SSL/TLS certificate in the configuration file or use your own proxy in front of the Gateway Service. See this - [simple guide](http://nginx.org/en/docs/http/configuring_https_servers.html) on how to install a SSL/TLS - certificate on NGINX. +Next, create a [user account](../api/#create-user-account) and +then [create a database](../api/#create-database) to [import a dataset](../api/#import-dataset). ## Limitations !!! info "Alternative Deployments" - Alternatively, you can also deploy DBRepo with [Helm](./deployment-helm/) in your virtual machine instead. + Alternatively, you can also deploy DBRepo with [Kubernetes](../deployment-helm) in your virtual machine instead. diff --git a/.docs/deployment-helm.md b/.docs/deployment-helm.md index 5b0be43553584e6c6be4f582615bc9afcffd918a..86f1257f1dc01c8c3f2fc9bd307afdec24c05a7f 100644 --- a/.docs/deployment-helm.md +++ b/.docs/deployment-helm.md @@ -7,15 +7,15 @@ author: Martin Weise ## TL;DR To install DBRepo in your existing cluster, download the -sample [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-deployment/-/raw/master/charts/dbrepo-core/values.yaml?inline=false) +sample [`values.yaml`](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/blob/release-1.4.4/helm/dbrepo/values.yaml) for your deployment and update the variables, especially `hostname`. ```shell helm upgrade --install dbrepo \ -n dbrepo \ - "oci://s210.dl.hpc.tuwien.ac.at/dbrepo/helm/dbrepo" \ + "oci://registry.datalab.tuwien.ac.at/dbrepo/helm/dbrepo" \ --values ./values.yaml \ - --version "__CHARTVERSION__" \ + --version "1.4.4" \ --create-namespace \ --cleanup-on-fail ``` @@ -32,12 +32,12 @@ about values, etc. 
## Limitations 1. MariaDB Galera does not (yet) support XA-transactions required by the authentication service (=Keycloak). Therefore - only a single MariaDB pod can be deployed at once for the [auth database](./system-databases-authentication). + only a single MariaDB pod can be deployed at once for the Auth database. 2. The entire Helm deployment is rootless (=`runAsNonRoot=true`) except for - the [Storage Service](./system-services-storage/) which still requires a root user. + the [Storage Service](../api/storage-service) which still requires a root user. !!! question "Do you miss functionality? Do these limitations affect you?" We strongly encourage you to help us implement it as we are welcoming contributors to open-source software and get - in [contact](./contact) with us, we happily answer requests for collaboration with attached CV and your programming + in [contact](../../contact) with us, we happily answer requests for collaboration with attached CV and your programming experience! diff --git a/.docs/index.md b/.docs/index.md index a6cfdea09b3a8908b58d1ee0d6af38f5cece99e4..4b869b4d8fac9e7cf09ae7cd0e57dacaf077b411 100644 --- a/.docs/index.md +++ b/.docs/index.md @@ -5,6 +5,7 @@ author: Martin Weise [](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services){ tabindex=-1 } [](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services){ tabindex=-1 } [](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services){ tabindex=-1 } +[](https://hub.docker.com/u/dbrepo){ tabindex=-1 } [](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services){ tabindex=-1 } Documentation for version: [v1.4.4](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/releases). 
diff --git a/dbrepo-analyse-service/lib/dbrepo-1.4.4-py3-none-any.whl b/dbrepo-analyse-service/lib/dbrepo-1.4.4-py3-none-any.whl index cf6f48a659b0d2ec2caad06f5df06e7f047ab5d5..7e8fd7fca5aa6158bf57952f7f1050a08b331402 100644 Binary files a/dbrepo-analyse-service/lib/dbrepo-1.4.4-py3-none-any.whl and b/dbrepo-analyse-service/lib/dbrepo-1.4.4-py3-none-any.whl differ diff --git a/dbrepo-analyse-service/lib/dbrepo-1.4.4.tar.gz b/dbrepo-analyse-service/lib/dbrepo-1.4.4.tar.gz index aa416987d2965540f8b1d3dd777b72b758fc6940..3e45d4513a31a1fda334ed9ad2c5cbad3803199a 100644 Binary files a/dbrepo-analyse-service/lib/dbrepo-1.4.4.tar.gz and b/dbrepo-analyse-service/lib/dbrepo-1.4.4.tar.gz differ diff --git a/dbrepo-metadata-db/Dockerfile b/dbrepo-metadata-db/Dockerfile deleted file mode 100644 index dab74c702c6cab912ed060e9cc92a3d74b1e66c8..0000000000000000000000000000000000000000 --- a/dbrepo-metadata-db/Dockerfile +++ /dev/null @@ -1,6 +0,0 @@ -FROM bitnami/mariadb:11.2.2-debian-11-r0 as runtime - -ENV MARIADB_DATABASE=fda -ENV MARIADB_ROOT_PASSWORD=dbrepo - -COPY ./setup-schema.sql /docker-entrypoint-initdb.d/setup-schema.sql \ No newline at end of file diff --git a/dbrepo-metadata-db/migrate_1.4.0-1.4.1.sql b/dbrepo-metadata-db/migrate_1.4.0-1.4.1.sql deleted file mode 100644 index a849d52476bae19b896c710432f511efafd4ebf6..0000000000000000000000000000000000000000 --- a/dbrepo-metadata-db/migrate_1.4.0-1.4.1.sql +++ /dev/null @@ -1,19 +0,0 @@ -ALTER TABLE mdb_databases DROP SYSTEM VERSIONING; -ALTER TABLE mdb_databases ADD COLUMN image longblob; -ALTER TABLE mdb_databases ADD SYSTEM VERSIONING; -ALTER TABLE mdb_tables DROP SYSTEM VERSIONING; -ALTER TABLE mdb_tables ADD COLUMN processed_constraints BOOLEAN NOT NULL DEFAULT false; -ALTER TABLE mdb_tables ADD SYSTEM VERSIONING; -ALTER TABLE mdb_columns DROP SYSTEM VERSIONING; -ALTER TABLE mdb_columns DROP COLUMN alias; -ALTER TABLE mdb_columns ADD SYSTEM VERSIONING; -ALTER TABLE mdb_constraints_foreign_key DROP SYSTEM 
VERSIONING; -ALTER TABLE mdb_constraints_foreign_key ADD COLUMN name VARCHAR(255) NOT NULL; -ALTER TABLE mdb_constraints_foreign_key ADD SYSTEM VERSIONING; -ALTER TABLE mdb_constraints_unique DROP SYSTEM VERSIONING; -ALTER TABLE mdb_constraints_unique ADD COLUMN name VARCHAR(255) NOT NULL; -ALTER TABLE mdb_constraints_unique ADD SYSTEM VERSIONING; -ALTER TABLE mdb_view_columns DROP SYSTEM VERSIONING; -ALTER TABLE mdb_view_columns ADD COLUMN alias VARCHAR(100); -ALTER TABLE mdb_view_columns CHANGE COLUMN position ordinal_position INTEGER; -ALTER TABLE mdb_view_columns ADD SYSTEM VERSIONING; \ No newline at end of file diff --git a/dbrepo-search-service/lib/dbrepo-1.4.4-py3-none-any.whl b/dbrepo-search-service/lib/dbrepo-1.4.4-py3-none-any.whl index cf6f48a659b0d2ec2caad06f5df06e7f047ab5d5..7e8fd7fca5aa6158bf57952f7f1050a08b331402 100644 Binary files a/dbrepo-search-service/lib/dbrepo-1.4.4-py3-none-any.whl and b/dbrepo-search-service/lib/dbrepo-1.4.4-py3-none-any.whl differ diff --git a/dbrepo-search-service/lib/dbrepo-1.4.4.tar.gz b/dbrepo-search-service/lib/dbrepo-1.4.4.tar.gz index aa416987d2965540f8b1d3dd777b72b758fc6940..3e45d4513a31a1fda334ed9ad2c5cbad3803199a 100644 Binary files a/dbrepo-search-service/lib/dbrepo-1.4.4.tar.gz and b/dbrepo-search-service/lib/dbrepo-1.4.4.tar.gz differ diff --git a/docker-compose.yml b/docker-compose.yml index 48d62373b213f20fcaf4443ebbb12437d4bf2f02..65d23f7e45b36a82df4cad73cccf6424b9413e61 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -14,13 +14,11 @@ services: restart: "no" container_name: dbrepo-metadata-db hostname: metadata-db - image: dbrepo-metadata-db:latest - build: - context: ./dbrepo-metadata-db - network: host + image: docker.io/bitnami/mariadb:11.1.3-debian-11-r6 volumes: - metadata-db-data:/bitnami/mariadb - - ./dbrepo-metadata-db/setup-data.sql:/docker-entrypoint-initdb.d/setup-schema_local.sql + - ./dbrepo-metadata-db/setup-schema.sql:/docker-entrypoint-initdb.d/1_setup-schema.sql + - 
./dbrepo-metadata-db/setup-data.sql:/docker-entrypoint-initdb.d/2_setup-data.sql ports: - "3306:3306" environment: @@ -38,7 +36,7 @@ services: restart: "no" container_name: dbrepo-data-db hostname: data-db - image: docker.io/bitnami/mariadb-galera:11.2.2-debian-11-r0 + image: docker.io/bitnami/mariadb:11.1.3-debian-11-r6 volumes: - data-db-data:/bitnami/mariadb - "${SHARED_VOLUME:-/tmp}:/tmp" @@ -46,7 +44,6 @@ services: - "3307:3306" environment: MARIADB_ROOT_PASSWORD: "${USER_DB_PASSWORD:-dbrepo}" - MARIADB_GALERA_MARIABACKUP_PASSWORD: "${USER_DB_BACKUP_PASSWORD:-dbrepo}" healthcheck: test: mysqladmin ping --user="${USER_DB_USERNAME:-root}" --password="${USER_DB_PASSWORD:-dbrepo}" --silent interval: 10s @@ -59,7 +56,7 @@ services: restart: "no" container_name: dbrepo-auth-db hostname: auth-db - image: docker.io/bitnami/mariadb:11.2.2-debian-11-r0 + image: docker.io/bitnami/mariadb:11.1.3-debian-11-r6 volumes: - auth-db-data:/bitnami/mariadb ports: @@ -331,7 +328,7 @@ services: restart: "no" container_name: dbrepo-gateway-service hostname: gateway-service - image: docker.io/nginx:1.25-alpine-slim + image: docker.io/nginx:1.27.0-alpine3.19-slim ports: - "80:80" - "443:443" diff --git a/helm/dbrepo/Chart.lock b/helm/dbrepo/Chart.lock index e7fbf0ea099942650d029a991cf3ca490ef1f11b..dd42ade0c3ffaa28c6d562fc46f30dafc81a7ecf 100644 --- a/helm/dbrepo/Chart.lock +++ b/helm/dbrepo/Chart.lock @@ -1,16 +1,16 @@ dependencies: - name: opensearch - repository: https://opensearch-project.github.io/helm-charts/ - version: 2.15.0 + repository: https://charts.bitnami.com/bitnami + version: 1.2.2 - name: keycloak repository: https://charts.bitnami.com/bitnami version: 17.3.3 -- name: mariadb-galera +- name: mariadb repository: https://charts.bitnami.com/bitnami - version: 11.0.1 -- name: mariadb-galera + version: 14.1.4 +- name: mariadb repository: https://charts.bitnami.com/bitnami - version: 11.0.1 + version: 14.1.4 - name: rabbitmq repository: https://charts.bitnami.com/bitnami 
version: 14.0.0 @@ -20,5 +20,5 @@ dependencies: - name: tusd repository: https://charts.sagikazarmark.dev version: 0.1.2 -digest: sha256:f724e33944ae5284b9417a3424a4af9cd67eb8bea0baa0ebeddc76f4c0c9c63a -generated: "2024-05-17T21:25:35.919266246+02:00" +digest: sha256:867a4a60bbccfaeb880d000eeb634db20554ef91523aa3b1331c53bdf48e8db4 +generated: "2024-06-14T15:12:25.44560113+02:00" diff --git a/helm/dbrepo/Chart.yaml b/helm/dbrepo/Chart.yaml index 24e580a29731861c53e63192aa5346531260c135..4838a04ed0bbb9327b899097c22e01960760fee2 100644 --- a/helm/dbrepo/Chart.yaml +++ b/helm/dbrepo/Chart.yaml @@ -10,28 +10,28 @@ keywords: - dbrepo maintainers: - name: Martin Weise - email: martin.weise@tuwien.ac.at + email: martin.weise@tuwien.ac.a home: https://www.ifs.tuwien.ac.at/infrastructures/dbrepo/ icon: https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/raw/master/dbrepo-ui/public/favicon.png dependencies: - name: opensearch alias: searchdb - version: 2.15.0 - repository: https://opensearch-project.github.io/helm-charts/ + version: 1.2.2 + repository: https://charts.bitnami.com/bitnami condition: searchdb.enabled - name: keycloak alias: authservice version: 17.3.3 repository: https://charts.bitnami.com/bitnami condition: authservice.enabled - - name: mariadb-galera + - name: mariadb alias: datadb - version: 11.0.1 + version: 14.1.4 repository: https://charts.bitnami.com/bitnami condition: datadb.enabled - - name: mariadb-galera + - name: mariadb alias: metadatadb - version: 11.0.1 + version: 14.1.4 repository: https://charts.bitnami.com/bitnami condition: metadatadb.enabled - name: rabbitmq diff --git a/helm/dbrepo/charts/mariadb-14.1.4.tgz b/helm/dbrepo/charts/mariadb-14.1.4.tgz new file mode 100644 index 0000000000000000000000000000000000000000..83f470bdcade4fdfc13b0d1f4f46095b877e3bcd Binary files /dev/null and b/helm/dbrepo/charts/mariadb-14.1.4.tgz differ diff --git a/helm/dbrepo/charts/mariadb-galera-11.0.1.tgz 
b/helm/dbrepo/charts/mariadb-galera-11.0.1.tgz deleted file mode 100644 index 75966763de12ffca164d475cccac327a338857df..0000000000000000000000000000000000000000 Binary files a/helm/dbrepo/charts/mariadb-galera-11.0.1.tgz and /dev/null differ diff --git a/helm/dbrepo/charts/opensearch-1.2.2.tgz b/helm/dbrepo/charts/opensearch-1.2.2.tgz new file mode 100644 index 0000000000000000000000000000000000000000..0393bfc1aa2fa964c68e66af6da6f356ea84e29f Binary files /dev/null and b/helm/dbrepo/charts/opensearch-1.2.2.tgz differ diff --git a/helm/dbrepo/charts/opensearch-2.15.0.tgz b/helm/dbrepo/charts/opensearch-2.15.0.tgz deleted file mode 100644 index 7d2f6efb43a2d44e8dfffde4e0265d302af2b2a6..0000000000000000000000000000000000000000 Binary files a/helm/dbrepo/charts/opensearch-2.15.0.tgz and /dev/null differ diff --git a/helm/dbrepo/templates/metadata-configmap.yaml b/helm/dbrepo/templates/metadata-configmap.yaml index 4bb2eb136b557c0a44f3a9fb77d8d6a023ab67ea..7965f0a3855c991291a30dcef43294226971aeeb 100644 --- a/helm/dbrepo/templates/metadata-configmap.yaml +++ b/helm/dbrepo/templates/metadata-configmap.yaml @@ -12,7 +12,7 @@ data: 02-setup-data.sql: | BEGIN; INSERT INTO `mdb_containers` (name, internal_name, image_id, host, port, sidecar_host, sidecar_port, privileged_username, privileged_password) - VALUES ('MariaDB Galera 11.1.3', 'mariadb_11_1_3', 1, 'data-db', 3306, 'data-db', 80, 'root', 'dbrepo'); + VALUES ('MariaDB 11.1.3', 'mariadb_11_1_3', 1, 'data-db', 3306, 'data-db', 80, 'root', 'dbrepo'); COMMIT; 01-setup-schema.sql: | BEGIN; diff --git a/helm/dbrepo/templates/metadata-secret.yaml b/helm/dbrepo/templates/metadata-secret.yaml index 3beda17fc57fe12ecb06021e3b81fe47e6a067f0..fe48d381ea4495ff1bd4b0c927190ef53275919c 100644 --- a/helm/dbrepo/templates/metadata-secret.yaml +++ b/helm/dbrepo/templates/metadata-secret.yaml @@ -15,7 +15,7 @@ stringData: AUTH_SERVICE_CLIENT: "{{ .Values.authservice.client.id }}" AUTH_SERVICE_CLIENT_SECRET: "{{ 
.Values.authservice.client.secret }}" AUTH_SERVICE_ENDPOINT: "{{ .Values.authservice.endpoint }}" - BASE_URL: "{{ .Values.hostname }}" + BASE_URL: "{{ .Values.gateway }}" BROKER_EXCHANGE_NAME: "{{ .Values.brokerservice.exchangeName }}" BROKER_HOST: "{{ .Values.brokerservice.host }}" BROKER_QUEUE_NAME: "{{ .Values.brokerservice.queueName }}" @@ -33,11 +33,11 @@ stringData: GRANULARITY: "{{ .Values.metadataservice.granularity }}" JWT_PUBKEY: "{{ .Values.authservice.jwt.pubkey }}" LOG_LEVEL: "{{ ternary "trace" "info" .Values.metadataservice.image.debug }}" - METADATA_DB: "{{ .Values.metadatadb.db.name }}" + METADATA_DB: "{{ .Values.metadatadb.auth.database }}" METADATA_HOST: "{{ .Values.metadatadb.host }}" METADATA_JDBC_EXTRA_ARGS: "{{ .Values.metadatadb.jdbcExtraArgs }}" - METADATA_USERNAME: "{{ .Values.metadatadb.rootUser.user }}" - METADATA_PASSWORD: "{{ .Values.metadatadb.rootUser.password }}" + METADATA_USERNAME: "{{ .Values.metadatadb.auth.root }}" + METADATA_PASSWORD: "{{ .Values.metadatadb.auth.rootPassword }}" PID_BASE: "{{ $pidBase }}" REPOSITORY_NAME: "{{ .Values.metadataservice.repositoryName }}" SEARCH_SERVICE_ENDPOINT: "{{ .Values.searchservice.endpoint }}" diff --git a/helm/dbrepo/values.yaml b/helm/dbrepo/values.yaml index c9d8a260bf7b1c611e462c25f44f47ef7a8bc6ff..eba2674db01a2f239b8df0393494b584af4b1aea 100644 --- a/helm/dbrepo/values.yaml +++ b/helm/dbrepo/values.yaml @@ -53,18 +53,15 @@ metadatadb: image: debug: false host: metadata-db - rootUser: - user: root - password: dbrepo + auth: + root: root + rootPassword: dbrepo + database: dbrepo + replicationUser: replication + replicationPassword: replication jdbcExtraArgs: "" - db: - name: fda metrics: enabled: false - galera: - mariabackup: - user: mariabackup - password: mariabackup initdbScriptsConfigMap: metadata-db-setup extraInitDbScripts: { } # 03-additional-data.sql: | @@ -72,14 +69,8 @@ metadatadb: # INSERT INTO `mdb_containers` (name, internal_name, image_id, host, port, sidecar_host, 
sidecar_port, privileged_username, privileged_password) # VALUES ('MariaDB Galera TEST', 'mariadb_11_1_3', 1, 'data-db', 3306, 'data-db', 80, 'root', 'dbrepo'); # COMMIT; - service: - type: ClusterIP - annotations: { } - loadBalancerIP: "" - loadBalancerSourceRanges: [ ] - persistence: - enabled: false - replicaCount: 3 + secondary: + replicaCount: 2 ## @section Auth Service @@ -161,70 +152,69 @@ datadb: image: debug: false extraFlags: "--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci" - rootUser: - user: root - password: dbrepo + auth: + rootPassword: dbrepo + replicationUser: replication + replicationPassword: replication metrics: enabled: true - galera: - mariabackup: - user: mariabackup - password: mariabackup - service: - extraPorts: - - name: "sidecar" - port: 8080 - targetPort: 8080 - protocol: TCP - sidecars: - - name: sidecar - image: registry.datalab.tuwien.ac.at/dbrepo/data-db-sidecar:1.4.4 - imagePullPolicy: Always - securityContext: - runAsUser: 1001 - runAsGroup: 0 - runAsNonRoot: true - allowPrivilegeEscalation: false - seccompProfile: - type: RuntimeDefault - capabilities: - drop: - - ALL - ports: + primary: + service: + extraPorts: - name: "sidecar" - containerPort: 8080 + port: 8080 + targetPort: 8080 protocol: TCP - envFrom: - - secretRef: - name: data-service-secret - livenessProbe: - exec: - command: - - /bin/bash - - -ec - - "curl -sSL localhost:8080/health | grep 'UP' || exit 1" - initialDelaySeconds: 120 - periodSeconds: 30 - readinessProbe: - exec: - command: - - /bin/bash - - -ec - - "curl -sSL localhost:8080/health | grep 'UP' || exit 1" - initialDelaySeconds: 30 - periodSeconds: 30 - volumeMounts: - - name: s3 - mountPath: /s3 - extraVolumeMounts: - - name: s3 - mountPath: /s3 - extraVolumes: - - name: s3 - emptyDir: { } - persistence: - enabled: false - replicaCount: 3 + sidecars: + - name: sidecar + image: registry.datalab.tuwien.ac.at/dbrepo/data-db-sidecar:1.4.4 + imagePullPolicy: Always + securityContext: + 
runAsUser: 1001 + runAsGroup: 0 + runAsNonRoot: true + allowPrivilegeEscalation: false + seccompProfile: + type: RuntimeDefault + capabilities: + drop: + - ALL + ports: + - name: "sidecar" + containerPort: 8080 + protocol: TCP + envFrom: + - secretRef: + name: data-service-secret + livenessProbe: + exec: + command: + - /bin/bash + - -ec + - "curl -sSL localhost:8080/health | grep 'UP' || exit 1" + initialDelaySeconds: 120 + periodSeconds: 30 + readinessProbe: + exec: + command: + - /bin/bash + - -ec + - "curl -sSL localhost:8080/health | grep 'UP' || exit 1" + initialDelaySeconds: 30 + periodSeconds: 30 + volumeMounts: + - name: s3 + mountPath: /s3 + extraVolumeMounts: + - name: s3 + mountPath: /s3 + extraVolumes: + - name: s3 + emptyDir: { } + persistence: + enabled: false + secondary: + replicaCount: 2 ## @section Search Database @@ -232,7 +222,6 @@ datadb: ## @skip searchdb.fullnameOverride ## @param searchdb.host The hostname for the microservices. ## @param searchdb.port The port for the microservices. -## @skip searchdb.protocol ## @param searchdb.username The admin username. ## @param searchdb.password The admin user password. 
## @skip searchdb.clusterName @@ -249,77 +238,13 @@ datadb: searchdb: enabled: true fullnameOverride: search-db + servicenameOverride: search-db host: search-db port: 9200 - protocol: http - username: admin - password: admin - clusterName: search-db - masterService: search-db - replicas: 3 - sysctlInit: - enabled: true - persistence: + security: enabled: false - service: - type: ClusterIP - annotations: { } - loadBalancerSourceRanges: [ ] - extraEnvs: - - name: DISABLE_INSTALL_DEMO_CONFIG - value: "true" - extraVolumeMounts: - - name: node-cert - mountPath: /usr/share/opensearch/config/tls - readOnly: true - extraVolumes: - - name: node-cert - secret: - secretName: search-db-secret - config: - opensearch.yml: | - cluster.name: search-db - network.host: 0.0.0.0 - plugins: - security: - ssl: - transport: - pemcert_filepath: tls/tls.crt - pemkey_filepath: tls/tls.key - pemtrustedcas_filepath: tls/ca.crt - enforce_hostname_verification: false - http: - #enabled: true # uncomment to force ssl connections - pemcert_filepath: tls/tls.crt - pemkey_filepath: tls/tls.key - pemtrustedcas_filepath: tls/ca.crt - allow_unsafe_democertificates: false - allow_default_init_securityindex: true - authcz: - admin_dn: - - CN=search-db - nodes_dn: - - CN=search-db - audit.type: internal_opensearch - enable_snapshot_restore_privilege: true - check_snapshot_restore_write_privileges: true - restapi: - roles_enabled: [ "all_access", "security_rest_api_access" ] - system_indices: - enabled: true - indices: - [ - ".opendistro-alerting-config", - ".opendistro-alerting-alert*", - ".opendistro-anomaly-results*", - ".opendistro-anomaly-detector*", - ".opendistro-anomaly-checkpoints", - ".opendistro-anomaly-detection-state", - ".opendistro-reports-*", - ".opendistro-notifications-*", - ".opendistro-notebooks", - ".opendistro-asynchronous-search-response*", - ] + adminPassword: admin + clusterName: search-db ## @section Upload Service @@ -435,7 +360,7 @@ brokerservice: type: ClusterIP 
managerPortEnabled: true # loadBalancerIP: - replicaCount: 2 + replicaCount: 1 ## @section Analyse Service diff --git a/lib/python/dbrepo/RestClient.py b/lib/python/dbrepo/RestClient.py index e256934d8a91d9659f51c7d7225c49f79cc1fb10..ae956dd07277830235c6d3ca991190766f8fd7de 100644 --- a/lib/python/dbrepo/RestClient.py +++ b/lib/python/dbrepo/RestClient.py @@ -1567,7 +1567,8 @@ class RestClient: url = f'/api/database/{database_id}/subset' if page is not None and size is not None: url += f'?page={page}&size={size}' - response = self._wrapper(method="post", url=url, force_auth=True, payload=ExecuteQuery(statement=query)) + response = self._wrapper(method="post", url=url, force_auth=True, headers={"Accept": "application/json"}, + payload=ExecuteQuery(statement=query)) if response.status_code == 201: body = response.json() return Result.model_validate(body)
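Review note on the `RestClient.py` hunk above: the change adds an `Accept: application/json` header to the subset request. The request it produces can be sketched as below. This is a minimal illustration, not the library's code: `build_subset_request` is a hypothetical helper, and the `{'statement': ...}` body is an assumption about how `ExecuteQuery(statement=query)` is serialised.

```python
from typing import Optional


def build_subset_request(database_id: int, query: str,
                         page: Optional[int] = None,
                         size: Optional[int] = None) -> dict:
    """Assemble the pieces of the subset request from the hunk above.

    Hypothetical helper for illustration; it is not part of the dbrepo library.
    """
    url = f'/api/database/{database_id}/subset'
    # Pagination is appended only when both values are given, as in the hunk.
    if page is not None and size is not None:
        url += f'?page={page}&size={size}'
    return {
        'method': 'post',
        'url': url,
        # The header this change adds: ask the API for a JSON response body.
        'headers': {'Accept': 'application/json'},
        # Assumed wire format of ExecuteQuery(statement=query).
        'json': {'statement': query},
    }


if __name__ == '__main__':
    print(build_subset_request(1, 'SELECT 1', page=0, size=10))
```

Passing only one of `page` or `size` leaves the URL without a query string, which mirrors the condition in the hunk and may be worth documenting for callers.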