Thematic Services#
EOSC-Synergy supports ten thematic services in four scientific areas: Earth Observation, Biomedicine, Astrophysics, and Climate Change. Each service corresponds to a separate community with its own requirements, software tools, and access patterns. All of these thematic services require access to infrastructure services such as CPU, storage, and network. How the infrastructure is allocated, accessed, and used, however, differs from one thematic service to the next.
Many services provide access to research data and benefit from wider availability. As such, they need to act as clients and servers simultaneously. Often, the underlying user management, the Authentication and Authorisation Infrastructure (AAI), is shared by both roles.
The thematic services serve as examples because they had to address a larger number of issues than many other services. This gives us the chance either to prove that EOSC-Synergy is ready to access federated datasets in clusters distributed across Europe, or to develop additional tools for the ecosystem where they are still missing.
During the project lifetime, the communities have progressed towards best practices for adopting common EOSC guidelines, tools, interfaces, and services. This includes helping the communities to increase the capacity, performance, reliability, and/or functionality of these thematic services through their integration in EOSC. This was especially important for substantially increasing the number of users of these thematic services.
EOSC-Synergy’s Work Package 4 (WP4) will publish a detailed report that describes the requirements and solutions of the thematic services. [WP4-IS]
Thematic Services Challenges#
Here we provide a short overview of the specific challenges faced by each thematic service, focusing on access to computing and storage. For more information about an individual thematic service, we provide references to a publication that describes the service in more detail. The following information was originally collected in the paper “A survey of the European Open Science Cloud services for expanding the capacity and capabilities of multidisciplinary scientific applications” by Ignacio Blanquer et al. [WP4-IS]
Thematic Service | Limitations and needs |
---|---|
WORSICA | - Improve download speed and number of concurrent downloads of satellite images.<br>- Increase storage for the images needed by the algorithm.<br>- Increase computational resources: GPU and RAM to speed up the image processing.<br>- Seamless authentication and authorization for end users. |
SAPS | - Need for a larger-scale deployment: computing, storage and data access.<br>- Scalability and standardisation of services.<br>- Integrated and widely supported AAI. |
GCore | - Overcome limited access to the data repository due to network bandwidth restrictions.<br>- Infrastructure resources for processing and reprocessing large data sets.<br>- Data delivery volume: increasing size of the files to be delivered to users. |
SCIPION | - Insufficient cloud resources for the workflow: GPUs, CPUs and RAM.<br>- Need for resource management able to optimize the use of cloud resources.<br>- Storage limitations and data transfer performance: 1-3 TB of raw data.<br>- Distributed and shared file system. |
OpenEBench | - Need to work on heterogeneous systems to reach Life Sciences communities.<br>- Need to efficiently store processed data and workflows in a FAIR manner. |
LAGO | - Limitations on data preprocessing.<br>- Needs data storage that copes with FAIR, curation and harvesting.<br>- Need for computing power for simulations, together with optimal scheduling. |
SDS-WAS | - Lack of services needed for data storage and curation.<br>- Lack of computing power for on-demand data analysis.<br>- Lack of reliability of data sources, especially for observations. |
UMSA | - Long-term data storage is required, together with appropriate data curation.<br>- Tracking provenance of the secondary (derived) datasets.<br>- Need for reimplementing UMSA algorithms to deal with sparse data. |
MSWSS | - Needs data protection measures because confidential data is used.<br>- The data has to be stored in private storage only.<br>- Implement security policies to protect VMs. |
O3AS | - Requires larger storage resources, especially to improve data availability.<br>- Fast handling of big data. |
Table 1: The thematic services and the challenges to be addressed in EOSC-Synergy
Thematic service technology choices#
To address the identified challenges, WP4 of EOSC-Synergy analysed the services offered via the EOSC Marketplace, where more than 320 services are available. In [WP4-IS] these services are organised into six categories, of which “Access physical & eInfrastructures” is the one of interest here. Table 2 shows the services that each thematic service chose to address its needs within the different categories.
More details are available in the corresponding WP4 Deliverable [D4.3].
Thematic Service | AAI | Workload Management | Resource Management | Data Storage |
---|---|---|---|---|
WORSICA | EGI Check-in | ARC CE, Batch (SLURM) | IM (TOSCA) | Nextcloud, Dataverse |
G-Core | CAS User/pwd & EGI Check-in | GCore + K8s | IM / EC3 | ElasticSearch |
SAPS | EGI Check-in | K8s | IM / EC3 | OpenStack Swift |
Scipion | EGI Check-in | Batch (SLURM) | IM / EC3 | Local + EGI DataHub |
OpenEBench | Life Sciences AAI | WfExS + Nextflow | OpenNebula | Local + B2SHARE |
LAGO | eduTEAMS + EGI Check-in | Batch (SLURM) | Local clusters + IM / EC3 | EGI DataHub (Onedata) |
SDS-WAS | B2ACCESS | Batch (SLURM) | Local clusters | B2HANDLE / B2SAFE |
UMSA | EGI Check-in & Life Sciences AAI | Batch (SLURM) in IM / EC3 (in Galaxy) | IM / EC3 | Local + S3 |
MSWSS | EGI Check-in | Batch (SLURM) in EC3 (in Galaxy) | IM / EC3 | Local + Dataverse |
O3AS | EGI Check-in | Batch (SLURM) & K8s | cluster | Local + WebDAV |
Table 2: The solutions used by thematic services in the different domains. (from [D4.3])
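To make the overlap between the thematic services easier to see, the AAI and workload management columns of Table 2 can be encoded as a small data structure and inverted by tool. The following is only a minimal sketch: the dictionary layout, the field names `aai` and `workload`, and the helper `services_by` are illustrative assumptions and not part of any EOSC-Synergy software. Compound entries such as "Batch (SLURM) in EC3 (in Galaxy)" are reduced to the underlying tool name.

```python
from collections import defaultdict

# Rows transcribed from Table 2 (AAI and workload management columns only).
# The keys "aai" and "workload" are hypothetical names chosen for this sketch.
TABLE_2 = {
    "WORSICA":    {"aai": ["EGI Check-in"], "workload": ["ARC CE", "SLURM"]},
    "G-Core":     {"aai": ["CAS", "EGI Check-in"], "workload": ["GCore", "K8s"]},
    "SAPS":       {"aai": ["EGI Check-in"], "workload": ["K8s"]},
    "Scipion":    {"aai": ["EGI Check-in"], "workload": ["SLURM"]},
    "OpenEBench": {"aai": ["Life Sciences AAI"], "workload": ["WfExS", "Nextflow"]},
    "LAGO":       {"aai": ["eduTEAMS", "EGI Check-in"], "workload": ["SLURM"]},
    "SDS-WAS":    {"aai": ["B2ACCESS"], "workload": ["SLURM"]},
    "UMSA":       {"aai": ["EGI Check-in", "Life Sciences AAI"], "workload": ["SLURM"]},
    "MSWSS":      {"aai": ["EGI Check-in"], "workload": ["SLURM"]},
    "O3AS":       {"aai": ["EGI Check-in"], "workload": ["SLURM", "K8s"]},
}


def services_by(column):
    """Invert one column of TABLE_2: map each tool to the services using it."""
    index = defaultdict(list)
    for service, row in TABLE_2.items():
        for tool in row[column]:
            index[tool].append(service)
    return dict(index)


if __name__ == "__main__":
    # Print each tool with the number of thematic services that chose it.
    for column in ("aai", "workload"):
        print(column)
        for tool, services in sorted(services_by(column).items()):
            print(f"  {tool}: {len(services)} ({', '.join(services)})")
```

Read this way, the table shows that EGI Check-in and SLURM-based batch systems are the most common choices, while several services combine more than one AAI or workload manager.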