Commit ada94dd3 authored by Martin Weise

Merge branch '492-replace-upload-service-with-s3' into 'dev'

Implemented the upload endpoint

See merge request !383
parents ec342a6b 3b44815b
Related merge requests: !387 Wrong model, !384 Wrong model, !383 Implemented the upload endpoint
Showing with 2949 additions and 2201 deletions
......@@ -3,7 +3,6 @@ volumes:
data-db-data:
auth-db-data:
broker-service-data:
upload-service-data:
search-db-data:
identity-service-data:
metric-db-data:
......@@ -305,7 +304,6 @@ services:
environment:
NUXT_PUBLIC_API_CLIENT: "${BASE_URL:-http://localhost}"
NUXT_PUBLIC_API_SERVER: "${BASE_URL:-http://gateway-service}"
NUXT_PUBLIC_UPLOAD_CLIENT: "${BASE_URL:-http://localhost}/api/upload/files"
NUXT_OIDC_PROVIDERS_KEYCLOAK_AUTHORIZATION_URL: "${BASE_URL:-http://localhost}/realms/dbrepo/protocol/openid-connect/auth"
NUXT_OIDC_PROVIDERS_KEYCLOAK_BASE_URL: "${BASE_URL:-http://localhost}/realms/dbrepo"
NUXT_OIDC_PROVIDERS_KEYCLOAK_CLIENT_ID: "${AUTH_SERVICE_CLIENT:-dbrepo-client}"
......@@ -318,8 +316,6 @@ services:
depends_on:
dbrepo-search-service:
condition: service_healthy
dbrepo-upload-service:
condition: service_healthy
healthcheck:
test: curl -fsSL http://127.0.0.1:3000 && curl -fsSL http://127.0.0.1:3000/health
interval: 10s
......@@ -477,36 +473,6 @@ services:
logging:
driver: json-file
dbrepo-upload-service:
restart: "no"
container_name: dbrepo-upload-service
hostname: upload-service
image: docker.io/tusproject/tusd:v2.4.0
volumes:
- "./config/pre-create.sh:/srv/tusd-hooks/pre-create:ro"
command:
- "-behind-proxy"
- "-max-size=2000000000"
- "-base-path=/api/upload/files/"
- "-hooks-dir=/srv/tusd-hooks/"
- "-s3-endpoint=${STORAGE_ENDPOINT:-http://storage-service:9000}"
- "-s3-bucket=dbrepo"
environment:
AWS_ACCESS_KEY_ID: "${S3_ACCESS_KEY_ID:-seaweedfsadmin}"
AWS_SECRET_ACCESS_KEY: "${S3_SECRET_ACCESS_KEY:-seaweedfsadmin}"
AWS_REGION: "${STORAGE_REGION_NAME:-default}"
METADATA_SERVICE_ENDPOINT: "${METADATA_SERVICE_ENDPOINT:-http://metadata-service:8080}"
depends_on:
dbrepo-storage-service:
condition: service_healthy
healthcheck:
test: wget -qO- localhost:8080/metrics | grep "tusd" || exit 1
interval: 10s
timeout: 5s
retries: 12
logging:
driver: json-file
dbrepo-data-service:
restart: "no"
container_name: dbrepo-data-service
......
This diff is collapsed.
......@@ -25,16 +25,20 @@ deployments that by default support replication and monitoring. No graphical user
interface is included; administrators can access the S3 storage via S3-compatible clients,
e.g. [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/) (see below).
### Users
The default configuration creates admin credentials `seaweedfsadmin:seaweedfsadmin`. By default, one bucket `dbrepo` is
created that holds uploads temporarily. It is recommended to delete the contents regularly.
The default configuration creates one user `seaweedfsadmin` with password `seaweedfsadmin`.
The S3 endpoint of the Storage Service is available on port `9000`.
### Buckets
### Filer UI
The default configuration creates two buckets `dbrepo-upload`, `dbrepo-download`:
The storage service comes with a simple UI that can be used to explore the uploaded files, rename them and delete them.
Please note that the Filer UI is not intended for production and should be turned off for security purposes.
* `dbrepo-upload` for CSV-file upload (for import of data, analysis, etc.) from the User Interface
* `dbrepo-download` for CSV-file download (exporting data, metadata, etc.)
<figure markdown>
![Filer UI with a list of uploaded files in the bucket dbrepo](../images/screenshots/storage-service-filer.png)
<figcaption>Figure 1: Filer UI</figcaption>
</figure>
## Limitations
......
---
author: Martin Weise
---
## tl;dr
!!! debug "Debug Information"
Image: [`docker.io/tusproject/tusd:v1.12`](https://hub.docker.com/r/tusproject/tusd)
* Ports: 1080/tcp
* Prometheus: `http://<hostname>:1080/api/upload/metrics`
* API: `http://<hostname>:1080/api/upload`
To access the service directly in Kubernetes (e.g. for debugging), forward the svc port to your local machine:
```shell
kubectl [-n namespace] port-forward svc/upload-service 1080:80
```
## Overview
We use the [TUS](https://tus.io/) open protocol for resumable file uploads, which is based entirely on HTTP. Even though
the Upload Service is part of the standard installation, it is an entirely optional component and can be replaced with
any S3-compatible Blob Storage.
### Architecture
The Upload Service communicates internally with the [Storage Service](../storage-service) (cf. [Figure 1](#fig1)).
<figure id="fig1" markdown>
![Architecture of the Upload Service](../images/architecture-upload-service.svg)
<figcaption>Figure 1: Architecture of the Upload Service</figcaption>
</figure>
The Upload Service is responsible for uploading files (mainly CSV files) into a Blob Storage that can be accessed through
the S3 protocol (e.g. our [Storage Service](../storage-service)). Make sure that the Upload Service can be
accessed from the Gateway Service.
## Limitations
* No support for authentication.
!!! question "Do you miss functionality? Do these limitations affect you?"
We strongly encourage you to help us implement it, as we welcome contributors to our open-source software. Get
in [contact](../contact) with us; we happily answer requests for collaboration with attached CV and your programming
experience!
## Security
1. We strongly encourage you to limit the clients allowed to upload by adding your subnet, e.g. `128.130.0.0/16`
   (= TU Wien subnet), to the [Gateway Service](../system-services-gateway) configuration file like this:
```nginx title="dbrepo.conf"
location /api/upload {
allow 128.130.0.0/16;
deny all;
...
}
```
......@@ -42,6 +42,11 @@ author: Martin Weise
* Replaced sequential numerical ids with non-guessable random ids in the Metadata Database
in [#491](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/issues/491).
#### Removals
* Removed the Upload Service in favor of an internal stable upload endpoint in the Data Service
in [#492](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/issues/492).
## v1.6.5 (2025-02-18)
[:simple-gitlab: GitLab Release](https://gitlab.phaidra.org/fair-data-austria-db-repository/fda-services/-/tags/v1.6.5)
......
.docs/images/screenshots/storage-service-filer.png
122 KiB
......@@ -425,7 +425,7 @@
},
"dbrepo": {
"hashes": [
"sha256:7ba35243c4ead72be2bf2a2d00a3fbbae4a9c7dabb872cca8ed1b1ce77720b5d"
"sha256:2bf2f28f048108191f8e86992004e8727ee4bdeed88076891e66034bcd32d9b0"
],
"path": "./lib/dbrepo-1.7.0.tar.gz"
},
......
No preview for this file type
This diff is collapsed.
package at.tuwien.endpoints;
import at.tuwien.api.database.ViewDto;
import at.tuwien.api.error.ApiErrorDto;
import at.tuwien.api.file.UploadResponseDto;
import at.tuwien.exception.*;
import at.tuwien.service.StorageService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.media.Content;
import io.swagger.v3.oas.annotations.media.Schema;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.responses.ApiResponses;
import io.swagger.v3.oas.annotations.security.SecurityRequirement;
import jakarta.validation.constraints.NotNull;
import lombok.extern.log4j.Log4j2;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
@Log4j2
@RestController
@CrossOrigin(origins = "*")
@RequestMapping(path = "/api/upload")
public class UploadEndpoint extends RestEndpoint {
private final StorageService storageService;
@Autowired
public UploadEndpoint(StorageService storageService) {
this.storageService = storageService;
}
@PostMapping
@PreAuthorize("hasAuthority('upload-file')")
@Operation(summary = "Uploads a multipart file",
description = "Uploads a multipart file to the Storage Service. Requires role `upload-file`.",
security = {@SecurityRequirement(name = "basicAuth"), @SecurityRequirement(name = "bearerAuth")})
@ApiResponses(value = {
@ApiResponse(responseCode = "201",
description = "Uploaded the file",
content = {@Content(
mediaType = "application/json",
schema = @Schema(implementation = UploadResponseDto.class))}),
@ApiResponse(responseCode = "503",
description = "Failed to establish connection with the storage service",
content = {@Content(
mediaType = "application/json",
schema = @Schema(implementation = ApiErrorDto.class))}),
})
public ResponseEntity<UploadResponseDto> create(@NotNull @RequestParam("file") MultipartFile file) throws DatabaseUnavailableException,
DatabaseNotFoundException, RemoteUnavailableException, ViewMalformedException, MetadataServiceException {
log.debug("endpoint upload file, file.originalFilename={}", file.getOriginalFilename());
try {
final String key = storageService.putObject(file.getBytes());
return ResponseEntity.status(HttpStatus.CREATED)
.body(UploadResponseDto.builder()
.s3Key(key)
.build());
} catch (IOException e) {
log.error("Failed to read the uploaded file contents: {}", e.getMessage());
throw new DatabaseUnavailableException("Failed to read the uploaded file contents: " + e.getMessage(), e);
}
}
}
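The control flow of the endpoint above can be sketched with an in-memory stand-in for the storage service. This is a sketch under assumptions: the real service writes to S3 via the AWS SDK and Spring handles the multipart plumbing, and `FakeStorageService` is an illustrative name, not part of the codebase.

```python
import secrets

class FakeStorageService:
    """Dict-backed stand-in for StorageService.putObject."""
    def __init__(self):
        self.objects = {}

    def put_object(self, content: bytes) -> str:
        # Generate a random, non-guessable key (the real impl uses
        # "dbr_" plus 96 lowercase alphanumerics).
        key = "dbr_" + secrets.token_hex(48)
        self.objects[key] = content
        return key

def upload(storage: FakeStorageService, file_bytes: bytes) -> dict:
    """Mirrors UploadEndpoint.create: store the bytes, return the S3 key."""
    key = storage.put_object(file_bytes)
    return {"status": 201, "body": {"s3_key": key}}

storage = FakeStorageService()
response = upload(storage, b"id,name\n1,test\n")
```

The endpoint itself stays thin: it only delegates to the storage service and wraps the generated key in a `201 Created` response.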
......@@ -126,9 +126,12 @@ public class MetadataServiceGatewayUnitTest extends AbstractUnitTest {
public void getDatabaseById_succeeds() throws RemoteUnavailableException, MetadataServiceException,
DatabaseNotFoundException {
final HttpHeaders headers = new HttpHeaders();
headers.set("X-Host", CONTAINER_1_HOST);
headers.set("X-Port", "" + CONTAINER_1_PORT);
headers.set("X-Username", CONTAINER_1_PRIVILEGED_USERNAME);
headers.set("X-Password", CONTAINER_1_PRIVILEGED_PASSWORD);
headers.set("X-Jdbc-Method", IMAGE_1_JDBC);
headers.set("Access-Control-Expose-Headers", "X-Username X-Password X-Jdbc-Method X-Host X-Port");
/* mock */
when(internalRestTemplate.exchange(anyString(), eq(HttpMethod.GET), eq(HttpEntity.EMPTY), eq(DatabaseDto.class)))
......@@ -225,9 +228,12 @@ public class MetadataServiceGatewayUnitTest extends AbstractUnitTest {
@Test
public void getContainerById_succeeds() throws RemoteUnavailableException, ContainerNotFoundException, MetadataServiceException {
final HttpHeaders headers = new HttpHeaders();
headers.set("X-Host", CONTAINER_1_HOST);
headers.set("X-Port", "" + CONTAINER_1_PORT);
headers.set("X-Username", CONTAINER_1_PRIVILEGED_USERNAME);
headers.set("X-Password", CONTAINER_1_PRIVILEGED_PASSWORD);
headers.set("X-Jdbc-Method", IMAGE_1_JDBC);
headers.set("Access-Control-Expose-Headers", "X-Username X-Password X-Jdbc-Method X-Host X-Port");
/* mock */
when(internalRestTemplate.exchange(anyString(), eq(HttpMethod.GET), eq(HttpEntity.EMPTY), eq(ContainerDto.class)))
......
......@@ -63,7 +63,7 @@ public class MetadataServiceGatewayImpl implements MetadataServiceGateway {
log.error("Failed to find container with id {}: service responded unsuccessful: {}", containerId, response.getStatusCode());
throw new MetadataServiceException("Failed to find container: service responded unsuccessful: " + response.getStatusCode());
}
final List<String> expectedHeaders = List.of("X-Username", "X-Password", "X-Jdbc-Method");
final List<String> expectedHeaders = List.of("X-Username", "X-Password", "X-Jdbc-Method", "X-Host", "X-Port");
if (!response.getHeaders().keySet().containsAll(expectedHeaders)) {
log.error("Failed to find all container headers");
log.debug("expected headers: {}", expectedHeaders);
......@@ -75,6 +75,8 @@ public class MetadataServiceGatewayImpl implements MetadataServiceGateway {
throw new MetadataServiceException("Failed to find container with id " + containerId + ": body is empty");
}
final ContainerDto container = metadataMapper.containerDtoToContainerDto(response.getBody());
container.setHost(response.getHeaders().get("X-Host").get(0));
container.setPort(Integer.parseInt(response.getHeaders().get("X-Port").get(0)));
container.setUsername(response.getHeaders().get("X-Username").get(0));
container.setPassword(response.getHeaders().get("X-Password").get(0));
container.getImage().setJdbcMethod(response.getHeaders().get("X-Jdbc-Method").get(0));
......@@ -101,7 +103,7 @@ public class MetadataServiceGatewayImpl implements MetadataServiceGateway {
log.error("Failed to find database with id {}: service responded unsuccessful: {}", id, response.getStatusCode());
throw new MetadataServiceException("Failed to find database: service responded unsuccessful: " + response.getStatusCode());
}
final List<String> expectedHeaders = List.of("X-Username", "X-Password", "X-Jdbc-Method");
final List<String> expectedHeaders = List.of("X-Username", "X-Password", "X-Jdbc-Method", "X-Host", "X-Port");
if (!response.getHeaders().keySet().containsAll(expectedHeaders)) {
log.error("Failed to find all database headers");
log.debug("expected headers: {}", expectedHeaders);
......@@ -113,6 +115,8 @@ public class MetadataServiceGatewayImpl implements MetadataServiceGateway {
throw new MetadataServiceException("Failed to find database with id " + id + ": body is empty");
}
final DatabaseDto database = response.getBody();
database.getContainer().setHost(response.getHeaders().get("X-Host").get(0));
database.getContainer().setPort(Integer.parseInt(response.getHeaders().get("X-Port").get(0)));
database.getContainer().setUsername(response.getHeaders().get("X-Username").get(0));
database.getContainer().setPassword(response.getHeaders().get("X-Password").get(0));
database.getContainer().getImage().setJdbcMethod(response.getHeaders().get("X-Jdbc-Method").get(0));
......
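The gateway change above adds `X-Host` and `X-Port` to the set of required response headers and parses the port as an integer. A minimal Python sketch of that validation and parsing logic (function and field names are illustrative, not the actual DTOs):

```python
EXPECTED_HEADERS = ["X-Username", "X-Password", "X-Jdbc-Method", "X-Host", "X-Port"]

def parse_container_headers(headers: dict) -> dict:
    """Require all expected headers, then parse X-Port as an integer."""
    missing = [h for h in EXPECTED_HEADERS if h not in headers]
    if missing:
        # Mirrors the MetadataServiceException raised when headers are absent
        raise ValueError(f"missing headers: {missing}")
    return {
        "host": headers["X-Host"],
        "port": int(headers["X-Port"]),  # mirrors Integer.parseInt(...)
        "username": headers["X-Username"],
        "password": headers["X-Password"],
        "jdbc_method": headers["X-Jdbc-Method"],
    }

info = parse_container_headers({
    "X-Host": "data-db", "X-Port": "3306",
    "X-Username": "root", "X-Password": "secret", "X-Jdbc-Method": "mariadb",
})
```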
......@@ -13,6 +13,8 @@ import java.util.List;
public interface StorageService {
String putObject(byte[] content);
/**
* Loads an object of a bucket from the Storage Service into an input stream.
*
......
......@@ -8,15 +8,18 @@ import at.tuwien.exception.StorageUnavailableException;
import at.tuwien.exception.TableMalformedException;
import at.tuwien.service.StorageService;
import lombok.extern.log4j.Log4j2;
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.spark.sql.*;
import org.apache.spark.sql.catalyst.ExtendedAnalysisException;
import org.apache.spark.sql.types.StructField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.NoSuchKeyException;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Exception;
import java.io.*;
......@@ -43,6 +46,18 @@ public class StorageServiceS3Impl implements StorageService {
this.sparkSession = sparkSession;
}
@Override
public String putObject(byte[] content) {
final String key = "dbr_" + RandomStringUtils.randomAlphanumeric(96)
.toLowerCase();
s3Client.putObject(PutObjectRequest.builder()
.key(key)
.bucket(s3Config.getS3Bucket())
.build(), RequestBody.fromBytes(content));
log.debug("put object in S3 bucket {} with key: {}", s3Config.getS3Bucket(), key);
return key;
}
@Override
public InputStream getObject(String bucket, String key) throws StorageNotFoundException,
StorageUnavailableException {
......
......@@ -36,10 +36,6 @@ upstream ui {
server ui:3000;
}
upstream upload {
server upload-service:8080;
}
upstream dashboard-service {
server dashboard-service:3000;
}
......@@ -105,14 +101,12 @@ server {
}
location /api/upload {
# allow 128.130.0.0/16;
# deny all;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_pass http://upload;
proxy_pass http://data;
proxy_read_timeout 90;
# Disable request and response buffering
proxy_request_buffering off;
......
package at.tuwien.api.file;
import com.fasterxml.jackson.annotation.JsonProperty;
import jakarta.validation.constraints.NotBlank;
import lombok.*;
@Getter
@Setter
@ToString
@Builder
@EqualsAndHashCode
@AllArgsConstructor
@NoArgsConstructor
public class UploadResponseDto {
@NotBlank
@JsonProperty("s3_key")
String s3Key;
}