Commit 68f3206e authored by Martin Weise's avatar Martin Weise
Dev

parent 81f592c5
2 merge requests: !259 Master, !258 Dev
Showing 870 additions and 1860 deletions
......@@ -8,7 +8,7 @@ author: Martin Weise
!!! debug "Debug Information"
Image: [`bitnami/rabbitmq:3.10`](https://hub.docker.com/r/bitnami/rabbitmq)
Image: [`bitnami/rabbitmq:3.12.13-debian-12-r2`](https://hub.docker.com/r/bitnami/rabbitmq)
* Ports: 5672/tcp, 15672/tcp, 15692/tcp
* AMQP: `amqp://<hostname>:5672`
......
---
author: Martin Weise
---
# Analyse Service
Given a [CSV file](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-datasets/-/raw/master/gps.csv)
containing GPS data, already uploaded to the `dbrepo-upload` bucket of the Storage Service with key `gps.csv`:
=== "Terminal"
```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"filename":"gps.csv","separator":","}' \
  http://<hostname>/api/analyse/determinedt
```
This results in the response:
```json
{
"columns": {
"ID": "bigint",
"KEY": "varchar",
"OBJECTID": "bigint",
"LBEZEICHNUNG": "varchar",
"LTYP": "bigint",
"LTYPTXT": "varchar",
"LAT": "decimal",
"LNG": "decimal"
},
"separator": ","
}
```
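A client can consume this response directly; for example, grouping the suggested database types by column before creating a table. The following is a minimal sketch using only the example response above and the standard library:

```python
import json

# The response body returned by the determinedt endpoint above
response_body = """
{
  "columns": {
    "ID": "bigint", "KEY": "varchar", "OBJECTID": "bigint",
    "LBEZEICHNUNG": "varchar", "LTYP": "bigint", "LTYPTXT": "varchar",
    "LAT": "decimal", "LNG": "decimal"
  },
  "separator": ","
}
"""
analysis = json.loads(response_body)

# Group column names by their suggested database type
by_type: dict[str, list[str]] = {}
for column, dtype in analysis["columns"].items():
    by_type.setdefault(dtype, []).append(column)

print(by_type["decimal"])  # -> ['LAT', 'LNG']
```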
---
author: Martin Weise
---
# Broker Service
## Preliminary
The RabbitMQ client can be authenticated through Basic Authentication (username, password) or Bearer Authentication.
!!! example "Bearer Authentication"
Note that the encoded/signed `ACCESS_TOKEN` already contains a `client_id` field holding the username, so the
username is optional in `PlainCredentials` when using Bearer Authentication; if provided, it must match the
username encoded in the token.
=== "Bearer Authentication"
```python
import pika

# Configure client: the access token is passed as the password
credentials = pika.credentials.PlainCredentials("", "ACCESS_TOKEN")
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)

# Channel
channel = connection.channel()
channel.basic_publish(exchange='dbrepo',
                      routing_key='dbrepo.database_name.table_name',
                      body=b'Hello World!')
print(" [x] Sent 'Hello World!'")
connection.close()
```
=== "Basic Authentication"
```python
import pika

# Configure client with username and password
credentials = pika.credentials.PlainCredentials("username", "password")
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)

# Channel
channel = connection.channel()
channel.basic_publish(exchange='dbrepo',
                      routing_key='dbrepo.database_name.table_name',
                      body=b'Hello World!')
print(" [x] Sent 'Hello World!'")
connection.close()
```
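The note above says the access token already carries the username in its `client_id` claim. A client can inspect that claim by base64-decoding the token payload; this is a minimal sketch using only the standard library (it does not verify the signature, which a real client must do):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Return the (unverified) claims of a JWT access token.

    Sketch only: this decodes the payload without verifying the
    signature, so it may be used to inspect claims locally but
    never to trust the token.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# e.g. jwt_claims(ACCESS_TOKEN).get("client_id") should match the username
```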
---
author: Martin Weise
---
# Metadata Service
## Preliminary
<!-- md:version 1.4.1 -->
!!! example "Basic Authentication"
The use of **Basic Authentication** (username, password) instead of *Bearer Authentication* may be useful for
applications that cannot refresh tokens at intervals (e.g. single-threaded applications). It is not recommended
for any other application, as **Basic Authentication** transmits the user password with every request.
It also degrades performance: every **Basic Authentication** request triggers an additional round-trip to the
[Authentication Service](../system-services-authentication/), where authorization is requested before the
Metadata Service handles the call. Avoid this whenever possible and use **Bearer Authentication** instead; see
how to [obtain an access token](../usage-authentication/#obtain-access-token).
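With a token at hand, each request then carries it in the standard `Authorization` header instead of re-sending the password. A minimal sketch of the two header shapes (these follow the generic HTTP Bearer/Basic schemes, RFC 6750 and RFC 7617, not anything DBRepo-specific):

```python
import base64

def bearer_headers(access_token: str) -> dict:
    """Build request headers for Bearer Authentication (RFC 6750)."""
    return {"Authorization": f"Bearer {access_token}"}

def basic_headers(username: str, password: str) -> dict:
    """Build request headers for Basic Authentication (RFC 7617).

    Note: this re-transmits the password with every request.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}
```

The bearer variant avoids sending the password and the extra round-trip to the Authentication Service on every call.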
---
author: Martin Weise
---
# Python Library
## tl;dr
!!! debug "Debug Information"
PyPI: [`dbrepo`](https://pypi.org/project/dbrepo/)
* Full module documentation <a href="../sphinx" target="_blank">:fontawesome-solid-square-up-right: view online</a>
## Installing
:octicons-tag-16:{ title="Minimum version" } 1.4.2
```console
$ python -m pip install dbrepo
```
To use DBRepo in your Jupyter notebook, install the `dbrepo` library directly from a code cell:
```jupyter
!pip install dbrepo
```
This package supports Python 3.11+.
## Quickstart
Create a table and import a .csv file from your computer.
```python
from dbrepo.RestClient import RestClient
from dbrepo.api.dto import CreateTableColumn, ColumnType, CreateTableConstraints

client = RestClient(endpoint='https://test.dbrepo.tuwien.ac.at', username="foo",
                    password="bar")

# analyse csv
analysis = client.analyse_datatypes(file_path="sensor.csv", separator=",")
print(f"Analysis result: {analysis}")
# -> columns=(date=date, precipitation=decimal, lat=decimal, lng=decimal), separator=,
#    line_termination=\n

# create table
table = client.create_table(database_id=1,
                            name="Sensor Data",
                            constraints=CreateTableConstraints(checks=['precipitation >= 0'],
                                                               uniques=[['precipitation']]),
                            columns=[CreateTableColumn(name="date",
                                                       type=ColumnType.DATE,
                                                       dfid=3,  # YYYY-MM-dd
                                                       primary_key=True,
                                                       null_allowed=False),
                                     CreateTableColumn(name="precipitation",
                                                       type=ColumnType.DECIMAL,
                                                       size=10,
                                                       d=4,
                                                       primary_key=False,
                                                       null_allowed=True),
                                     CreateTableColumn(name="lat",
                                                       type=ColumnType.DECIMAL,
                                                       size=10,
                                                       d=4,
                                                       primary_key=False,
                                                       null_allowed=True),
                                     CreateTableColumn(name="lng",
                                                       type=ColumnType.DECIMAL,
                                                       size=10,
                                                       d=4,
                                                       primary_key=False,
                                                       null_allowed=True)])
print(f"Create table result {table}")
# -> (id=1, internal_name=sensor_data, ...)

# import the csv into the new table
client.import_table_data(database_id=1, table_id=1, file_path="sensor.csv", separator=",",
                         skip_lines=1, line_encoding="\n")
print("Finished.")
```
The library is well-documented, please see the [full documentation](../sphinx) or
the [PyPI page](https://pypi.org/project/dbrepo/).
## Supported Features & Best-Practices
- Manage user account ([docs](../usage-overview/#create-user-account))
- Manage databases ([docs](../usage-overview/#create-database))
- Manage database access & visibility ([docs](../usage-overview/#private-database-access))
- Import dataset ([docs](../usage-overview/#private-database-access))
- Create persistent identifiers ([docs](../usage-overview/#assign-database-pid))
- Execute queries ([docs](../usage-overview/#export-subset))
- Get data from tables/views/subsets
## Secrets
It is not recommended to store credentials directly in the notebook as they will be versioned with git, etc. Use
environment variables instead:
```properties title=".env"
DBREPO_ENDPOINT=https://test.dbrepo.tuwien.ac.at
DBREPO_USERNAME=foo
DBREPO_PASSWORD=bar
DBREPO_SECURE=True
```
Then use the default constructor of the `RestClient`, e.g. to analyse a CSV; your credentials are read from the environment automatically:
```python title="analysis.py"
from dbrepo.RestClient import RestClient
client = RestClient()
analysis = client.analyse_datatypes(file_path="sensor.csv", separator=",")
```
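For a quick local experiment you can also provide the same variables in-process before constructing the client (the variable names are taken from the `.env` example above; in notebooks, a package such as `python-dotenv` can load the file instead):

```python
import os

# Provide the variables from the .env example above; the RestClient
# default constructor reads them from the environment.
os.environ.setdefault("DBREPO_ENDPOINT", "https://test.dbrepo.tuwien.ac.at")
os.environ.setdefault("DBREPO_USERNAME", "foo")
os.environ.setdefault("DBREPO_PASSWORD", "bar")
os.environ.setdefault("DBREPO_SECURE", "True")
```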
## Future
- Searching
## Links
This information is also mirrored on [PyPI](https://pypi.org/project/dbrepo/).
---
author: Martin Weise
---
# Search Service
The Search Service connects to the [Search Database](../system-databases-search/).
!!! note "This section will be expanded"
......@@ -18,12 +18,22 @@ cache:
- .m2/
stages:
- lint
- build
- test
- docs
- release
- scan
lint-yaml:
image: bash:5.2-alpine3.19
stage: build
script:
- "apk add yq"
- "yq '.services.[] | .environment' docker-compose.yml > ./doc.txt"
- "yq '.services.[] | .environment' docker-compose.prod.yml > ./other.txt"
- "cmp --silent ./doc.txt ./other.txt"
build-metadata-service:
image: maven:3-openjdk-17
stage: build
......@@ -45,6 +55,18 @@ build-analyse-service:
- "pip install pipenv"
- "pipenv install gunicorn && pipenv install --dev --system --deploy"
build-lib:
image: python:3.11-slim
stage: build
except:
refs:
- /^release-.*/
variables:
PIPENV_PIPFILE: "./lib/python/Pipfile"
script:
- "pip install pipenv"
- "pipenv install gunicorn && pipenv install --dev --system --deploy"
build-data-service:
image: maven:3-openjdk-17
stage: build
......@@ -174,6 +196,31 @@ test-analyse-service:
junit: ./dbrepo-analyse-service/report.xml
coverage: '/TOTAL.*?([0-9]{1,3})%/'
test-lib:
image: python:3.11-slim
stage: test
except:
refs:
- /^release-.*/
variables:
PIPENV_PIPFILE: "./lib/python/Pipfile"
needs:
- build-lib
script:
- "pip install pipenv"
- "pipenv install gunicorn && pipenv install --dev --system --deploy"
- cd ./lib/python/ && coverage run -m pytest tests/test_database.py --junitxml=report.xml && coverage html --omit="test/*" && coverage report --omit="test/*" > ./coverage.txt
- "cat ./coverage.txt | grep -o 'TOTAL[^%]*%'"
artifacts:
when: always
paths:
- ./lib/python/report.xml
- ./lib/python/coverage.txt
expire_in: 1 day
reports:
junit: ./lib/python/report.xml
coverage: '/TOTAL.*?([0-9]{1,3})%/'
scan-analyse-service:
image: bitnami/trivy:latest
stage: scan
......@@ -514,3 +561,22 @@ release-docs:
- tar czfv ./final.tar.gz ./final
- "scp -oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa final.tar.gz $CI_DOC_USER@$CI_DOC_IP:final.tar.gz"
- "ssh -oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa $CI_DOC_USER@$CI_DOC_IP 'rm -rf /system/user/ifs/infrastructures/public_html/dbrepo/*; tar xzfv ./final.tar.gz; rm -f ./final.tar.gz; cp -r ./final/* /system/user/ifs/infrastructures/public_html/dbrepo; rm -rf ./final'"
release-libs:
stage: lint
image: docker.io/python:3.11-alpine
only:
refs:
- /^release-[0-9]+.*/
variables:
PIPENV_PIPFILE: "./dbrepo-analyse-service/Pipfile"
script:
- apk add sed
- pip install pipenv
- pipenv install gunicorn && pipenv install --dev --system --deploy
- pip install twine build
- 'sed -i -e "s/__APPVERSION__/${APP_VERSION}rc18/g" ./lib/python/pyproject.toml ./lib/python/setup.py ./lib/python/README.md'
- python -m build --sdist ./lib/python
- python -m build --wheel ./lib/python
- printf "[distutils]\nindex-servers =\n pypi\n\n[pypi]\nusername = __token__\npassword = ${CI_PIPY_TOKEN}\nrepository = https://upload.pypi.org/legacy/" > .pypirc
- python -m twine upload --config-file .pypirc --verbose --repository pypi ./lib/python/dist/dbrepo-*
......@@ -26,6 +26,9 @@ build-metadata-service:
build-analyse-service:
bash ./dbrepo-analyse-service/build.sh
build-lib-python:
bash ./lib/python/build.sh
build-docker:
bash ./bin/build-docker.sh
......@@ -131,7 +134,7 @@ release-storage-service-init: tag-storage-service-init
docker push "${REPOSITORY_1_URL}/storage-service-init:${TAG}"
docker push "${REPOSITORY_2_URL}/storage-service-init:${TAG}"
test-backend: test-metadata-service test-analyse-service test-data-service
test-backend: test-metadata-service test-analyse-service test-data-service test-lib-python
test-data-service: build-data-service
mvn -f ./dbrepo-data-service/pom.xml clean test verify
......@@ -142,6 +145,9 @@ test-metadata-service: build-metadata-service
test-analyse-service: build-analyse-service
bash ./dbrepo-analyse-service/test.sh
test-lib-python: build-lib-python
bash ./lib/python/test.sh
scan: scan-analyse-service scan-authentication-service scan-broker-service scan-gateway-service scan-metadata-db scan-metadata-service scan-search-db scan-ui scan-data-service scan-data-db scan-search-dashboard scan-search-service
scan-analyse-service:
......@@ -227,3 +233,4 @@ build-api:
docs:
bash .docs/build-website.sh
bash ./lib/python/build-website.sh
......@@ -6,7 +6,6 @@ __pycache__
.DS_Store
# Environment
.env
.flaskenv
*.pyc
*.pyo
......
......@@ -20,6 +20,8 @@ minio = "*"
flask-sqlalchemy = "*"
opensearch-py = "*"
pymysql = "*"
dataclasses = "*"
dataclasses-json = "*"
[dev-packages]
coverage = "*"
......
import dataclasses
import json
import logging
from _csv import Error
......@@ -13,6 +15,8 @@ from gevent.pywsgi import WSGIServer
from opensearchpy import OpenSearch
from prometheus_flask_exporter import PrometheusMetrics
from botocore.exceptions import ClientError
from determine_dt import determine_datatypes
from determine_pk import determine_pk
from determine_stats import determine_stats
......@@ -61,7 +65,6 @@ opensearch_client = OpenSearch(
use_ssl=False,
)
swagger_config = {
"headers": [],
"specs": [
......@@ -114,98 +117,82 @@ def health():
return Response(res, mimetype="application/json"), 200
@app.route("/api/analyse/determinedt", methods=["POST"], endpoint="analyze_determinedt")
@swag_from("as-yml/determinedt.yml")
def determinedt():
logging.debug("endpoint determine datatype, body=%s", request)
input_json = request.get_json()
@app.route("/api/analyse/datatypes", methods=["GET"], endpoint="analyze_analyse_datatypes")
@swag_from("as-yml/analyse_datatypes.yml")
def analyse_datatypes():
filename: str = request.args.get('filename')
separator: str = request.args.get('separator')
enum: bool = request.args.get('enum', False)
enum_tol: float = request.args.get('enum_tol')
if filename is None or separator is None:
return Response(
json.dumps({'success': False, 'message': "Missing required query parameters 'filename' and 'separator'"}),
mimetype="application/json"), 400
try:
filename = str(input_json["filename"])
enum = False
if "enum" in input_json:
enum = bool(input_json["enum"])
logging.info("Enum is present in payload and set to %s", enum)
enum_tol = 0.001
if "enum_tol" in input_json:
enum_tol = float(input_json["enum_tol"])
logging.info(
"Enum toleration is present in payload and set to %s", enum_tol
)
separator = None
if "separator" in input_json:
separator = str(input_json["separator"])
logging.info("Seperator is present in payload and set to %s", separator)
res = determine_datatypes(filename, enum, enum_tol, separator)
logging.debug("determine datatype resulted in datatypes %s", res)
return Response(res, mimetype="application/json"), 200
return Response(res, mimetype="application/json"), 202
except OSError as e:
logging.error("Failed to determine data types: %s", e)
logging.error(f"Failed to determine data types: {e}")
res = dumps({"success": False, "message": str(e)})
return Response(res, mimetype="application/json"), 409
except Error as e:
logging.error("Failed to determine separator %s", e)
return Response(res, mimetype="application/json"), 400
except ClientError as e:
logging.error(f"Failed to determine separator: {e}")
res = dumps({"success": False, "message": str(e)})
return Response(res, mimetype="application/json"), 422
return Response(res, mimetype="application/json"), 404
except Exception as e:
logging.error("Failed to determine data types: %s", e)
logging.error(f"Failed to determine data types: {e}")
res = dumps({"success": False, "message": str(e)})
return Response(res, mimetype="application/json"), 500
@app.route("/api/analyse/determinepk", methods=["POST"], endpoint="analyze_determinepk")
@swag_from("as-yml/determinepk.yml")
def determinepk():
logging.debug("endpoint determine primary key, body=%s", request)
input_json = request.get_json()
@app.route("/api/analyse/keys", methods=["GET"], endpoint="analyze_analyse_keys")
@swag_from("as-yml/analyse_keys.yml")
def analyse_keys():
filename: str = request.args.get("filename")
separator: str = request.args.get('separator')
if filename is None or separator is None:
return Response(
json.dumps({'success': False, 'message': "Missing required query parameters 'filename' and 'separator'"}),
400)
try:
filepath = str(input_json["filepath"])
seperator = ","
if "seperator" in input_json:
seperator = str(input_json["seperator"])
res = determine_pk(filepath, seperator)
logging.debug("determined list of primary keys: %s", res)
return Response(res, mimetype="application/json"), 200
res = {
'keys': determine_pk(filename, separator)
}
logging.info(f"Determined list of primary keys: {res}")
return Response(dumps(res), mimetype="application/json"), 202
except OSError as e:
logging.error(f"Failed to determine primary key: {e}")
res = dumps({"success": False, "message": str(e)})
return Response(res, mimetype="application/json"), 404
except Exception as e:
logging.error("Failed to determine primary key: %s", e)
logging.error(f"Failed to determine primary key: {e}")
res = dumps({"success": False, "message": str(e)})
return Response(res, mimetype="application/json"), 500
@app.route("/api/analyse/determinestats", methods=["POST"], endpoint="analyse_determinestats")
@swag_from("as-yml/determine_stats.yml")
def determinestats():
logging.debug(
"endpoint to determine the statistical properties, body = %s", request
)
input_json = request.get_json()
if "filepath" not in input_json:
return {"message": "Missing 'filepath'", "status": 400}, 400
filepath = str(input_json["filepath"])
separator = str(input_json.get("separator", ","))
return determine_stats(filepath, separator)
@app.route("/api/analyse/determinestat", methods=["POST"], endpoint="analyse_determinestat")
@swag_from("as-yml/determine_stat.yml")
def determinestat():
input_json = request.get_json()
if "database_id" not in input_json:
return {"message": "Missing 'database_id'", "status": 400}, 400
if "table_id" not in input_json:
return {"message": "Missing 'table_id'", "status": 400}, 400
res = determine_stats(
db,
opensearch_client,
database_id=input_json["database_id"],
table_id=input_json["table_id"],
)
if res:
return {"message": "Analysed statistical properties.", "status": 200}
else:
return {"message": "Database or table does not exist.", "status": 400}, 400
@app.route("/api/analyse/database/<database_id>/table/<table_id>/statistics", methods=["GET"],
endpoint="analyse_analyse_table_stat")
@swag_from("as-yml/analyse_table_stat.yml")
def analyse_table_stat(database_id: int = None, table_id: int = None):
if database_id is None:
return Response(dumps({"message": "Missing path variable 'database_id'", "status": 400}),
mimetype="application/json"), 400
if table_id is None:
return Response(dumps({"message": "Missing path variable 'table_id'", "status": 400}),
mimetype="application/json"), 400
try:
res = determine_stats(db, opensearch_client, database_id=database_id, table_id=table_id)
logging.info(f"Analysed table statistics: {res}")
return Response(json.dumps(dataclasses.asdict(res)), mimetype="application/json"), 202
except OSError:
return Response(dumps({"message": "Database or table does not exist.", "status": 404}),
mimetype="application/json"), 404
rest_server_port = 5000
......
......@@ -7,25 +7,61 @@ consumes:
produces:
- "application/json"
parameters:
- in: "body"
name: "body"
description: "to-do description"
- name: filename
in: query
required: true
example: filename_s3_key
schema:
type: "object"
$ref: '#/components/schemas/DetermineDataTypesDto'
type: string
- name: separator
in: query
required: true
example: ","
schema:
type: string
- name: enum
in: query
required: false
example: "false"
schema:
type: boolean
- name: enum_tol
in: query
required: false
example: "2.5"
schema:
type: number
format: float
responses:
200:
202:
description: Determined data types successfully
content:
application/json:
schema:
$ref: '#/components/schemas/DataTypesDto'
405:
description: "Invalid input"
400:
description: "Failed to determine data types"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
404:
description: "Failed to find file in Storage Service"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
500:
description: "Unexpected system error"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
components:
schemas:
DetermineDataTypesDto:
required:
- filename
- separator
type: object
properties:
enum:
......@@ -44,8 +80,6 @@ components:
type: object
properties:
columns:
type: array
items:
$ref: '#/components/schemas/SuggestedColumnDto'
line_termination:
type: string
......@@ -58,3 +92,12 @@ components:
properties:
column_name:
type: string
ErrorDto:
type: object
properties:
success:
type: boolean
example: false
message:
type: string
example: Message
tags:
- analyse-endpoint
summary: "Determine primary keys"
description: "Determines the primary key candidates and their ranking for a CSV file in the Storage Service"
consumes:
- "application/json"
produces:
- "application/json"
parameters:
- name: filename
in: query
required: true
example: filename_s3_key
schema:
type: string
- name: separator
in: query
required: true
example: ","
schema:
type: string
responses:
202:
description: Determined keys successfully
content:
application/json:
schema:
$ref: '#/components/schemas/KeysDto'
400:
description: "Failed to determine keys"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
404:
description: "Failed to find file in Storage Service or is empty"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
500:
description: "Unexpected system error"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
components:
schemas:
KeysDto:
required:
- keys
type: object
properties:
keys:
type: array
items:
properties:
column_name:
type: integer
format: int64
DataTypesDto:
type: object
properties:
columns:
$ref: '#/components/schemas/SuggestedColumnDto'
line_termination:
type: string
example: "\r\n"
separator:
type: string
example: ","
SuggestedColumnDto:
type: object
properties:
column_name:
type: string
ErrorDto:
type: object
properties:
success:
type: boolean
example: false
message:
type: string
example: Message
tags:
- analyse-endpoint
summary: Determine table statistics
operationId: determine_table_stat
parameters:
- name: database_id
in: path
required: true
example: 1
schema:
type: integer
format: int64
- name: table_id
in: path
required: true
example: 1
schema:
type: integer
format: int64
responses:
202:
description: Determined statistics
content:
application/json:
schema:
$ref: '#/components/schemas/TableStats'
400:
description: "Missing parameters"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
404:
description: "Table not found"
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorDto'
components:
schemas:
TableStats:
required:
- columns
type: object
properties:
columns:
type: object
properties:
column_name:
$ref: '#/components/schemas/Stats'
Stats:
type: object
properties:
val_min:
type: number
format: float
example: 0.0
val_max:
type: number
format: float
example: 1.0
mean:
type: number
format: float
example: 0.3
median:
type: number
format: float
example: 0.45
std_dev:
type: number
format: float
example: 0.12
ErrorDto:
type: object
properties:
success:
type: boolean
example: false
message:
type: string
example: Message
tags:
- analyse-endpoint
summary: Determine statistics
operationId: determinestat
requestBody:
content:
application/json:
schema:
required:
- database_id
- table_id
type: object
properties:
database_id:
type: "integer"
example: 1
table_id:
type: "integer"
example: 1
responses:
"200":
description: Determined statistics
content:
application/json:
schema:
required:
- message
- status
type: object
properties:
message:
type: "string"
example: "Analysed statistical properties"
status:
type: "integer"
example: "200"
400:
description: "Invalid input"
content:
application/json:
schema:
required:
- message
- status
type: object
properties:
message:
type: "string"
example: "Analysed statistical properties"
status:
type: "integer"
example: "200"
tags:
- analyse-endpoint
summary: Determine statistics
operationId: determinestats
requestBody:
content:
application/json:
schema:
required:
- filepath
- separator
type: object
properties:
filepath:
type: "string"
example: "file.csv"
separator:
type: "string"
example: ","
responses:
"200":
description: Determined statistics
"400":
description: "Invalid input"