EOSC Synergy Services and Tools#
Within EOSC Synergy, we have used, integrated and developed several tools. Some tools existed already, others were extended or developed from scratch. All of the tools have in common that they proved to be useful for the integration of the thematic services supported by the project. This chapter presents these useful tools.
Since all services are open source, their interfaces are open and may be used with several different tools. The tools presented here are a subset of the possible solutions and are not meant to be exclusive. Any of them may be combined with other tools.
Overview#
The following table gives an overview of the tools presented in this handbook.
Tool | HPC | Cloud Compute | Storage | AAI | Q/A | Training | Generic |
---|---|---|---|---|---|---|---|
udocker | ✓ | ✓ | |||||
SLURM | ✓ | ✓ | |||||
Infrastructure Manager | ✓ | ✓ | |||||
Fedcloud Client | ✓ | ✓ | ✓ | ||||
Dynamic DNS | ✓ | ✓ | ✓ | ||||
EOSC Performance | ✓ | ✓ | ✓ | ||||
Cinder | ✓ | ||||||
Swift | ✓ | ||||||
RClone | ✓ | ||||||
EGI DataHub | ✓ | ||||||
B2Share | ✓ | ||||||
Core AAI/IAM | ✓ | ||||||
oidc-agent | ✓ | ||||||
mytoken | ✓ | ||||||
ssh-oidc | ✓ | ||||||
flaat | ✓ | ||||||
Vault | ✓ | ||||||
Pipeline as a Service | ✓ | ||||||
Quality Badges | ✓ | ||||||
Learn@Synergy | ✓ | ||||||
Online Training Platform | ✓ | ||||||
Video Conferencing Tool | ✓ | ||||||
Cloud Storage | ✓ | ||||||
Training Infra Mgmt | ✓ | ||||||
Jupyter Notebooks | ✓ | ||||||
Hackathon as a Service | ✓ | ||||||
Service Management | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Monitoring | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Accounting | ✓ | ✓ | ✓ | ✓ | ✓ |
Services and tools for Cloud computation on EGI Federated Cloud#
Infrastructure Manager#
Infrastructure Manager is a tool that eases access to and use of cloud infrastructures by automating Virtual Machine Image (VMI) selection, deployment, configuration, software installation, monitoring and update of virtual appliances. It supports the APIs of a large number of virtual platforms, making user applications cloud-agnostic.
Infrastructure Manager is used intensively by the thematic services in EOSC-Synergy. The service was significantly improved during the project based on feedback from users. In addition, several new recipes for EOSC-Synergy services were developed and added to the dashboard.
OpenStack Dashboard#
OpenStack Horizon is a web-based graphical interface for managing OpenStack compute, storage and networking services. Service administrators can use this dashboard to launch virtual machine instances, create storage volumes or manage their networks.
On top of this, EOSC-Synergy developed a dashboard that gives access to all participating sites from a single place. It serves as the central web-based GUI for managing resources on all OpenStack sites in the project.
FedCloud client#
The FedCloud client is a command-line client designed for interacting with OpenStack services in the EGI infrastructure. The client can access various EGI services and perform many tasks for users, including managing access tokens, listing services, and executing commands on OpenStack services located in the EGI Cloud infrastructure. The client was developed during the EOSC-Synergy project and has become the official client for the EGI Cloud infrastructure.
The FedCloud client is designed for use in shell scripts or Python programs, which enables sophisticated ways to automate tasks that interface with the cloud infrastructure. Complex tasks, such as listing all virtual machines owned by a user on all OpenStack sites in the EGI Cloud infrastructure, can be completed with simple scripts using the FedCloud client.
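As a rough illustration of such scripting, the sketch below wraps the `fedcloud` command from Python. It assumes that the client is installed, that a valid access token is already available to it, and that `fedcloud openstack server list` accepts `--site` and `--vo` as described in the client's documentation. The site and VO names are placeholders, and the function names are made up for this example.

```python
import subprocess


def build_server_list_cmd(site: str, vo: str) -> list[str]:
    """Build the fedcloud call that lists the caller's VMs on one site."""
    # Assumption: option names as documented for the FedCloud client.
    return ["fedcloud", "openstack", "server", "list", "--site", site, "--vo", vo]


def list_servers(sites: list[str], vo: str, runner=subprocess.run) -> dict[str, str]:
    """Run the listing on every site and collect the raw output per site."""
    results = {}
    for site in sites:
        proc = runner(
            build_server_list_cmd(site, vo),
            capture_output=True, text=True, check=False,
        )
        # Keep the error text when a site fails, so that one broken site
        # does not hide the results of the others.
        results[site] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

Calling `list_servers(["SITE-A", "SITE-B"], "vo.example.org")` would then return one text block per site. The `runner` parameter only exists so that the function can be tried without a real cloud behind it.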
Dynamic DNS#
The Dynamic DNS service provides a dynamic Domain Name System (DNS) service for the EGI Cloud infrastructure. Users can register their own meaningful and memorable hostnames under a list of provided domains (e.g. fedcloud.eu, eosc-synergy.eu) and assign them to the public IPs of their servers hosted in the EGI Federated Cloud. A simple login using EGI Check-in is all that is needed to register your own hostnames.
By using Dynamic DNS you can host services in EGI under meaningful service names and freely move the underlying virtual machines (VMs) between sites without modifying configurations (federated approach). The hostnames also enable the services to obtain SSL certificates, improving security and privacy.
Some domains dedicated to EOSC-Synergy were added to the service to support the thematic services: o3as.fedcloud.eu, repository.fedcloud.eu, vm.fedcloud.eosc-synergy.eu, worsica.fedcloud.eosc-synergy.eu. The Dynamic DNS service is also integrated with Infrastructure Manager, so that thematic services can be deployed with hostnames registered in Dynamic DNS.
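Once a hostname is registered, a VM can point it at its own public IP with a single authenticated HTTP request. The sketch below only builds that request. It assumes that the service exposes an nsupdate.info-style update endpoint at `https://nsupdate.fedcloud.eu/nic/update` and that the hostname and the secret generated at registration are used as Basic-auth credentials. Both points should be checked against the service documentation, and the hostname and secret shown are placeholders.

```python
import base64
import urllib.request

# Assumption: update endpoint of the Dynamic DNS service (not stated in this handbook).
UPDATE_URL = "https://nsupdate.fedcloud.eu/nic/update"


def build_update_request(hostname: str, secret: str) -> urllib.request.Request:
    """Build the request that assigns the caller's public IP to `hostname`."""
    # The hostname acts as the user name, the per-host secret as the password.
    credentials = base64.b64encode(f"{hostname}:{secret}".encode()).decode()
    return urllib.request.Request(
        UPDATE_URL, headers={"Authorization": f"Basic {credentials}"}
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_update_request("myhost.fedcloud.eu", "SECRET"))`, typically from a boot script on the VM.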
EOSC Performance#
EOSC-Performance is a search-and-compare platform where you can upload and search through results from multiple benchmarks. By comparing the data acquired from benchmarks, you can evaluate and decide which computing infrastructure provider would give the best performance for your applications. The service is developed and supported by the EOSC-Synergy project.
Computing infrastructure providers can also submit new entries so that users can find their services. The platform can be accessed through a web-based Graphical User Interface (GUI) or, if you want to automate tasks or integrate the data with your project, through an API.
Services and tools for authentication#
AAI / IAM Services#
The cloud infrastructure relies on services that provide the Authentication and Authorisation Infrastructure (AAI). More specifically, so-called “Community AAIs” as defined in the AARC Blueprint Architecture (BPA) are required for users to log into the EOSC services. This is fully in line with the EOSC Architecture. The EGI Federated Cloud uses EGI Check-in (https://aai.egi.eu) as its infrastructure proxy. This enables a large number of communities to use the Cloud. Within EOSC Synergy, we have successfully used EGI Check-in, GEANT eduTEAMS (https://eduteams.org), and EUDAT B2ACCESS (https://b2access.eudat.eu) for our users.
oidc-agent#
oidc-agent is a set of tools to manage OpenID Connect (OIDC) tokens and make them easily usable from the command line. It follows the ssh design, so users can handle OIDC tokens in a similar way as they would ssh keys. If you are using or designing an API that relies on OIDC authentication, such as accessing OpenStack sites with the FedCloud client mentioned above, these tools will come in very handy.
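From a script, the simplest way to use the agent is to call its `oidc-token` command, which prints an access token for a configured account. The following sketch assumes that oidc-agent is running and that an account configuration with the placeholder short name `egi` has already been created and loaded. The `runner` parameter is only there to make the function easy to try out.

```python
import subprocess


def get_access_token(account: str, runner=subprocess.run) -> str:
    """Ask the running oidc-agent for an access token of `account`."""
    proc = runner(
        ["oidc-token", account],
        capture_output=True, text=True, check=True,
    )
    # oidc-token prints the token followed by a newline.
    return proc.stdout.strip()
```

The returned string can be placed in an `Authorization: Bearer ...` header or handed to other tools that expect an OIDC access token.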
mytoken#
OIDC tokens are a very handy and secure way to handle user identification and authorisation between systems, especially in a federated environment such as EOSC. However, their short lifetime is a problem for tasks whose execution time can exceed the expiration time of the token. Such tasks are not rare in a scientific community such as EOSC. To solve this issue, mytoken was developed to provide OIDC access tokens to, for example, long-running compute jobs.
Mytoken is a web service for obtaining OpenID Connect access tokens in an easy but secure way over extended periods of time and across multiple devices. Mytoken focuses on integration with the command line through a command-line client, but also offers a web interface for users who prefer managing their tokens with a browser. If you like oidc-agent and you need to run long-lived tasks on cloud or HPC resources, this tool is definitely for you.
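To judge whether a job will run into the expiry problem described above, it can help to look at the lifetime of the access token you already hold. The sketch below is not part of mytoken. It assumes that the provider issues access tokens as JWTs (many do, but this is not guaranteed) and it reads the `exp` claim without verifying the signature, which is fine for a local check but not for authorisation decisions.

```python
import base64
import json
import time
from typing import Optional


def token_expiry(access_token: str) -> int:
    """Return the 'exp' claim (seconds since the epoch) of a JWT, unverified."""
    payload_b64 = access_token.split(".")[1]
    # JWT segments are base64url-encoded without padding, so restore it first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def outlives_job(access_token: str, job_seconds: int, now: Optional[float] = None) -> bool:
    """True if the token is still valid when a job of the given length ends."""
    now = time.time() if now is None else now
    return token_expiry(access_token) >= now + job_seconds
```

If `outlives_job(token, 6 * 3600)` returns `False` for a six-hour job, the job needs a way to obtain fresh access tokens while it runs, which is the use case mytoken addresses.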
ssh-oidc#
ssh-oidc consists of a set of tools that allows (you guessed it) ssh with OIDC. It lets you authenticate and log in to remote machines using the credentials of your institution (or any other organisation) instead of a secret key or password.
Focused on usability, ssh-oidc is split into several tools and libraries that mimic a subset of the popular ssh capabilities. The two main components are mccli, an SSH client wrapper designed to run on client computers, and motley-cue, a service that maps OIDC identities to local identities (to be run on the server that users will log in to).
Python API for AAI in services: flaat#
Flaat is a simple Python library that allows a straightforward implementation of REST interfaces that are well integrated with the AAI.
By using decorators, individual functions can be protected so that they may only be accessed by authorised users. Authorisation may be restricted by VO and group membership as well as by the assurance of a user's identity.
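The decorator idea can be illustrated without any framework. The sketch below is deliberately not flaat's API: `requires_group`, `AccessDenied` and the `user_info` dictionary are hypothetical names that stand in for what a library like flaat derives from the user's access token.

```python
import functools


class AccessDenied(Exception):
    """Raised when the caller lacks the required group membership."""


def requires_group(group: str):
    """Decorator factory: allow the call only for members of `group`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user_info: dict, *args, **kwargs):
            # `user_info` stands in for the claims the AAI returns about a user.
            if group not in user_info.get("groups", []):
                raise AccessDenied(f"membership in {group!r} required")
            return func(user_info, *args, **kwargs)
        return wrapper
    return decorator


@requires_group("eosc-synergy.eu")
def list_datasets(user_info: dict) -> list:
    """A protected function: only group members get the list."""
    return ["dataset-a", "dataset-b"]
```

The protected function itself contains no authorisation logic. The policy lives in the decorator and can be changed or reused without touching the function body, which is what makes this pattern attractive for REST endpoints.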
Vault Secrets Manager#
Applications in the EGI infrastructure may need different secrets (credentials, tokens, passwords, etc.) during deployment and operation. These secrets are often stored in clear text in configuration files or code repositories, which creates security risks. Furthermore, secrets stored in files are static and difficult to change or rotate. The secret management service for the EGI infrastructure was developed to solve these issues.
Services and tools for Cloud storage#
OpenStack Cinder#
If your cloud is hosted on an infrastructure managed by OpenStack, this type of storage will be the easiest to access and use. It is available via the OpenStack dashboard and will look just like a hard drive in your VM. Note, however, that with this solution only the users and services with access to your VM can access the storage.
If you need to provide access to data to external users but you do not want to provide VM access, you probably have to look for another alternative. However, you can still use Cinder to extend your VM storage or combine it with other solutions providing the interface you like the most (e.g. Nextcloud).
OpenStack Swift#
Another solution by OpenStack. If you need fast data access with practically unlimited scalability (no need to reshape volumes), you should probably look into object storage technology. Swift is OpenStack's object store, an alternative to Cinder, and probably one of your better options for working with object storage.
There are multiple ways to access Swift storage, but they are not always intuitive. If you decide to use rclone to synchronise your filesystem with Swift, the tool EGI Swift Finder can handle the discovery and configuration for you.
RClone#
Rclone is a command line program to manage files on cloud storage. It is a feature rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
Rclone mounts any local, cloud or virtual filesystem as a disk on Windows, macOS, Linux and FreeBSD, and also serves these over SFTP, HTTP, WebDAV, FTP and DLNA. Authentication and authorisation depend on the protocol you choose.
Rclone was developed in a different context. To facilitate its use, the utility program “EGI Swift Finder” was developed, which sets up the environment for the EOSC context.
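A typical use is to mirror a local results directory into an object storage container. The sketch below builds and runs an `rclone sync` call. It assumes that rclone is installed and that a remote has already been configured. The remote name `egiswift` and the container name are placeholders. Since `sync` makes the destination identical to the source, which can delete files there, the sketch defaults to rclone's `--dry-run` mode.

```python
import subprocess


def build_sync_cmd(local_dir: str, remote: str, container: str,
                   dry_run: bool = True) -> list[str]:
    """Build an rclone call that mirrors `local_dir` into `remote:container`."""
    cmd = ["rclone", "sync", local_dir, f"{remote}:{container}"]
    if dry_run:
        # Only report what would be copied or deleted, without doing it.
        cmd.append("--dry-run")
    return cmd


def sync(local_dir: str, remote: str, container: str, dry_run: bool = True) -> int:
    """Run the synchronisation and return rclone's exit code."""
    return subprocess.run(
        build_sync_cmd(local_dir, remote, container, dry_run), check=False
    ).returncode
```

Run it once with the default `dry_run=True`, check the report, and only then call `sync("/data/results", "egiswift", "results", dry_run=False)`.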
Nextcloud#
Nextcloud is a suite of client-server software for creating and using file hosting services. The software is free and open source, so anyone can install and operate it on their own private servers. You can manage and access your files knowing that your data is in your data centre, on a server managed by you or your team, rather than floating somewhere in the cloud. It is simple to install and deploy, for example on one of your hosts in the Federated Cloud.
Nextcloud is designed to be accessed via the web interface and WebDAV. Authentication via EGI Check-in currently only works for the web interface. To use WebDAV and other protocols, passwords or OAuth2 tokens currently have to be created in the web interface before they can be used on the command line.
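With such a password in hand, a file can be uploaded over WebDAV using nothing but the Python standard library. The sketch below only builds the request. It assumes Nextcloud's usual WebDAV path `remote.php/dav/files/<user>/` and a password created beforehand in the web interface. The server URL, user name and file path are placeholders.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_upload_request(base_url: str, user: str, app_password: str,
                         remote_path: str, data: bytes) -> urllib.request.Request:
    """Build a WebDAV PUT request that stores `data` at `remote_path`."""
    url = (
        f"{base_url.rstrip('/')}/remote.php/dav/files/"
        f"{quote(user)}/{quote(remote_path.lstrip('/'))}"
    )
    # WebDAV access uses HTTP Basic authentication with the created password.
    credentials = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        url, data=data, method="PUT",
        headers={"Authorization": f"Basic {credentials}"},
    )
```

Passing the result to `urllib.request.urlopen()` performs the upload. The same URL scheme works with any WebDAV client, including rclone.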
EGI DataHub#
EGI DataHub is a data management solution that aims to provide high-performance, unified data access across globally distributed environments. If your services need to access data across a widely distributed cluster, this is probably an option for you.
Data organisation and sharing are similar to a filesystem: users organise their data in virtual volumes called spaces and share access between groups. To access your data you have multiple options, such as a web interface, a command-line interface (CLI) or an API. Authentication and authorisation are based on OpenID Connect and SAML, and tokens are also supported at the API level.
B2Share#
B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store, publish and share research data in a FAIR way. B2SHARE is a solution that facilitates research data storage, guarantees long-term persistence of data and allows data, results or ideas to be shared worldwide. B2SHARE supports community domains with metadata extensions, access rules and publishing workflows. EUDAT offers communities and organisations customised instances and/or access to repositories supporting large datasets.
You can manage your data through a web interface or an HTTP API. Authentication and authorisation are based on passwords or OIDC, with access tokens used for the API. Note that EUDAT encourages the FAIR principles, so double-check the privacy of your data (e.g. metadata is always publicly available).
Services and tools for HPC#
udocker#
udocker is a basic user tool to run simple docker containers in user space without requiring root privileges. It supports download and execution of docker containers by non-privileged users in Linux systems where docker is not available. It can be used to pull and execute docker containers in Linux batch systems and interactive clusters that are managed by other entities such as HPC and Grid infrastructures or externally managed batch or interactive systems.
udocker does not require any type of privileges nor the deployment of services by system administrators. It can be downloaded and executed entirely by the end user. The limited root functionality provided by some of the udocker execution modes is either simulated or provided via user namespaces. udocker is a wrapper around several tools and libraries to mimic a subset of the docker capabilities including pulling images and running containers with minimal functionality.
Depending on the execution mode, udocker can outperform most other container execution engines.
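The typical workflow of pulling an image, creating a container from it and running a command inside can be scripted, for example from within a batch job. The sketch below assumes that udocker is installed in the user's account. The image, container name and command are placeholders, and the helper names are made up for this example.

```python
import subprocess


def udocker_commands(image: str, name: str, command: list[str]) -> list[list[str]]:
    """Return the udocker calls to pull an image, create a container and run it."""
    return [
        ["udocker", "pull", image],
        ["udocker", "create", f"--name={name}", image],
        ["udocker", "run", name, *command],
    ]


def run_in_container(image: str, name: str, command: list[str]) -> None:
    """Execute the three steps in order, stopping at the first failure."""
    for cmd in udocker_commands(image, name, command):
        subprocess.run(cmd, check=True)
```

For example, `run_in_container("ubuntu:22.04", "demo", ["cat", "/etc/os-release"])` would print the release information of the container image, all without root privileges.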
SLURM#
The Slurm Workload Manager is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.
It provides three key functions:
- Allocating access to resources to users for specified durations of time so they can perform work,
- Providing a framework for starting, executing, and monitoring work, typically a parallel job such as a Message Passing Interface (MPI) program, on a set of allocated nodes, and
- Arbitrating contention for resources by managing a queue of pending jobs.
Slurm is the workload manager on about 60% of the TOP500 supercomputers. It was used by the thematic services to distribute their workloads, either on HPC machines or on clusters composed of virtual machines on the Federated Cloud.
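The three functions map directly onto a batch script: `#SBATCH` directives request the allocation, `srun` starts the work on the allocated resources, and `sbatch` places the job in the queue. The sketch below generates such a script from Python. The job name, task count, time limit and program are placeholders, and real clusters usually require further site-specific options such as a partition or an account.

```python
import subprocess


def render_batch_script(job_name: str, ntasks: int, time_limit: str, command: str) -> str:
    """Render a minimal Slurm batch script as text."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",    # resources to allocate
        f"#SBATCH --time={time_limit}",  # duration of the allocation, e.g. 01:00:00
        f"srun {command}",               # start the (parallel) work on the allocation
    ]
    return "\n".join(lines) + "\n"


def submit(script_path: str) -> str:
    """Queue the script with sbatch and return sbatch's message."""
    proc = subprocess.run(
        ["sbatch", script_path], capture_output=True, text=True, check=True
    )
    return proc.stdout.strip()
```

Writing the output of `render_batch_script("demo", 4, "00:10:00", "./my_mpi_app")` to a file and passing that file to `submit()` queues the job. Slurm then starts it once the requested resources become free.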
Platform for Software and Services for Quality Assurance#
The adoption of quality-based practices is a common challenge when developing software, especially in research environments. The Software Quality Assurance as a Service (SQAaaS) platform offers researchers a variety of modules, all targeted at assessing and improving the quality of software.
The SQAaaS platform is based on abstract quality criteria for software (SQA) on the one hand and for services (SerQA) on the other. Consequently, the platform implements the tools and pipelines that allow such criteria to be verified.
Here we briefly outline a subset of the modules of the SQAaaS platform. The full details are available in [D3.2] and [D3.4].
Pipeline as a Service#
Pipeline as a Service is provided via the tool JePL (https://github.com/indigo-dc/jenkins-pipeline-library), a library for implementing Software Quality Assurance (SQA) checks in Jenkins environments. It is meant to make it easier to configure the SQAaaS pipelines without knowing the Jenkins syntax. To this end, it offers a simpler way to adopt DevOps practices by using YAML to describe the criteria from the QA baselines that are to be assessed. A guided graphical process exists to create an initial configuration for a pipeline.
Quality Badges#
The Quality Assessment and Awarding module analyses the compliance of a code repository with the quality baselines, and issues digital badges to certify that a minimum level of quality has been achieved. Services and software have different quality baselines, both of which are defined in [D3.4 section 5]. Badges are issued as “gold”, “silver”, or “bronze”.
Services for online training#
EOSC Synergy WP6 provides several tools dedicated to training.
Learn@Synergy#
Learn@Synergy: a classic WordPress website with basic instructions and links to services, such as the catalogue of courses and training materials: https://learn.eosc-synergy.eu/
Online Training Platform#
Online training platform: Based on the Moodle platform. It provides interactive courses with user forums to support community interactions among students and tutors as well as immediate feedback: https://moodle.learn.eosc-synergy.eu/
Video Conferencing Tool#
Video Conference service to connect, talk or share the screen with other people. It only takes a minute to set up a new room and send invitations to the meeting: https://vc.learn.eosc-synergy.eu
Cloud Storage#
Shared drive is a cloud space for users to securely store and synchronise files. The service is based on the open source Nextcloud software and is integrated with the AAI: https://drive.learn.eosc-synergy.eu
Training Infrastructure Management#
The Training Infrastructure Management service allows self-deployment of cloud training infrastructure for a given training course. It allows managing the virtual machines and accounts for training participants. The service is based on the Infrastructure Manager (IM) software, which deploys complex and customised virtual infrastructures on multiple back-ends.
Jupyter Notebooks#
Jupyter Notebooks service for interactive computing, which allows service developers to use interactive notebooks in their training.
Hackathon as a Service#
Hackathon as a Service (HaaS) is a platform created within this project to facilitate the organisation of hackathons, taking advantage of the EOSC infrastructure and accessible through the EOSC Portal. A hackathon is a sprint-like event in which computer programmers and others involved in software development (UI and graphic designers or project managers) collaborate intensively on software projects with the goal of creating a functioning product by the end of the event.
More details on the online training services can be found in the corresponding deliverable of Work Package 6.
Monitoring and Accounting Services#
Service Management#
Service management is traditionally done using the “Grid Operations Configuration Management Database (GOCDB)”. It provides a repository, portal and REST-style API for managing Grid and Cloud topology objects such as sites, services, or downtimes. GOCDB is a central tool for IT service management, where all relevant information about participating computer centres is kept.
Monitoring#
Site and service availability is monitored by the ARGO monitoring service. It deploys and runs checks against the infrastructure and collects information from low-level items (hosts, services) up to higher abstractions (groups, organisations). The monitoring data pass through an analytics engine to generate rich reports. The EOSC Synergy project created its own service level agreement with EGI, and is therefore present in ARGO as one group: https://argo.egi.eu/egi/dashboard/SLA/EGI_EOSCSYNERGY_SLA
Accounting#
Accounting collects usage information from many different services inside the infrastructure. To conform with privacy regulations, data is collected at the level of Virtual Organisations. It helps to assess how much storage, how many CPU hours, or how many Virtual Machines are used by any given Virtual Organisation. The use of EOSC Synergy resources throughout the project lifetime can be found here: https://accounting.egi.eu/