Resources Integration (for managers)#
This chapter addresses the aspects that need to be considered when a computer centre wants to integrate part of its cloud resources into EOSC. After explaining why a centre should contribute resources to the federated cloud, we describe the general access policies and the trust model, and provide pointers for the technical staff who implement the actual integration.
The focus here is less on hardware than on the realisation that raw computing capacity is of little value without adequate means to access it and to make it easily available to scientists. The added value lies in the know-how of using and adapting these facilities to multiple tasks and scientific domains. Power is nothing without control.
In EOSC, the strong move towards harmonising infrastructures (similar to Amazon AWS or Google Cloud Platform) leads to unified interfaces and access patterns, which in EOSC are tailored to the needs and the culture of scientific work.
Consequently, in EOSC-Synergy we adopted this trend of bringing users and providers together. The task of WP2 was to adapt the computing facilities to the requests of the Thematic Services, while WP4 adapted the Thematic Services to the computing infrastructures.
Why should computer centres join the EOSC ecosystem?#
In the spirit of building an adequate infrastructure for science in Europe, we briefly present reasons for computer centres to join forces under the EOSC umbrella, starting from a different perspective: what are the alternatives to joining a federated cloud?
One option is to buy the necessary capacity from commercial cloud vendors, which is often considerably more expensive, especially for long-term storage and data transfers. Both matter when FAIR data and Open Science are taken seriously. Furthermore, the commercial support model does not cover the specialised requests that are sometimes required for significant scientific advances. It seems unrealistic for an endeavour like the Worldwide LHC Computing Grid (WLCG) to rely entirely on AWS, Google Cloud or a Telekom cloud.
Another option would be to continue the traditional model of providing custom solutions on site. This model ties users to custom solutions that differ from one computer centre to the next. While this may be necessary for the optimal allocation of some HPC machines, the general drawback is that the cost of learning and supporting each individual solution is too high compared to the EOSC offer.
By contrast, there are many good reasons for computer centres to support EOSC.
As already mentioned, an evolving ecosystem of tools that can readily be used with EOSC is available and growing. The open-source nature of both the tools and the infrastructure is a key element in ensuring the long-term availability and extensibility of the ecosystem. The same access pattern can be used to allocate compute resources (CPU, Block/Object Storage, Network, Archive, ...) at any centre that is part of EOSC. The synergies of this ecosystem (unified user support, software reusable across many places, identical policies and procedures) strongly outweigh the initial investment of time needed to integrate infrastructure resources with EOSC.
Use cases that benefit from EOSC span the whole range, from classic number-crunching, including specialised hardware such as the GPUs often used for training artificial intelligence models, at one end of the spectrum, to web servers that publish scientific results or merely serve information pages at the other.
An obvious benefit of the federated nature of EOSC is, for example, that critical services may be operated at different cloud sites for increased fault tolerance. Whether these services reside in different centres, countries or even continents is merely a deployment detail.
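For a client of such a replicated service, fault tolerance can be as simple as trying the replicas in order of preference. The following is a minimal Python sketch, not an EOSC API: the endpoint URLs are hypothetical, and the health check is supplied by the caller (in practice it could, for example, issue an HTTP request with a short timeout).

```python
from typing import Callable, Sequence


def first_healthy_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> str:
    """Return the first replica whose health check succeeds.

    `endpoints` lists replicas of the same service running at different
    cloud sites, in order of preference.
    """
    for url in endpoints:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # A network error is treated like an unhealthy replica.
            continue
    raise RuntimeError("no healthy replica available at any site")


# Hypothetical replicas of one service at two different sites.
replicas = [
    "https://service.site-a.example.org",
    "https://service.site-b.example.org",
]

# Simulate an outage at site A: only the replica at site B responds.
unavailable = {"https://service.site-a.example.org"}
print(first_healthy_endpoint(replicas, lambda url: url not in unavailable))
```

Keeping the health check outside the function means the same failover logic works whether the replicas are probed over HTTP, via a monitoring system, or simulated in a test.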
Authorisation is definitely among the most important topics in the federated EOSC world. Computer centres that join EOSC retain full control over who may or may not use their resources. The authorisation model is based on the concept of Virtual Organisations (VOs): access to resources is granted based on membership in a VO, and the decision about which VOs are supported rests with the provider of the resources. This corresponds to the concept of computing proposals in HPC, where a successful proposal is allocated an amount of CPU time. In most cases, these proposals are assigned a group, which may consist of multiple members and is administered by the Principal Investigator (PI) of the computing proposal. VOs work in exactly the same way, but they may be supported at multiple EOSC sites at the same time. Details about the EOSC-AAI can be found in the report “EOSC Authentication and Authorization Infrastructure (AAI)”, published by the European Commission.
Monitoring and accounting are further important tools in scientific computing. EOSC interfaces with several monitoring and accounting systems and collects their data centrally. The collected data are published on the central Accounting Portal, aggregated per VO to respect the privacy of individual users.
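The VO model can be illustrated with a minimal Python sketch. It assumes that each site simply keeps a set of supported VO names and that a user's VO memberships are already known (in EOSC they are asserted by the AAI). The class, method, site and VO names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A resource provider that decides on its own which VOs it supports."""
    name: str
    supported_vos: set[str] = field(default_factory=set)

    def admits(self, user_vos: set[str]) -> bool:
        # Access is granted via VO membership, never per individual user:
        # any overlap between the user's VOs and the supported VOs suffices.
        return not self.supported_vos.isdisjoint(user_vos)


# Hypothetical sites; the same VO may be supported at several of them.
sites = [
    Site("SITE-A", {"vo.example.org", "bio.example.org"}),
    Site("SITE-B", {"vo.example.org"}),
    Site("SITE-C", {"astro.example.org"}),
]

alice_vos = {"vo.example.org"}  # memberships managed by the VO manager (the "PI")
print([s.name for s in sites if s.admits(alice_vos)])  # ['SITE-A', 'SITE-B']
```

Just as with an HPC proposal, the site only needs to know about the group; adding or removing members is the VO manager's business, and the site can withdraw support for the whole VO at any time.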
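How per-VO aggregation protects individual users can be sketched in a few lines of Python. The record format below is hypothetical and much simpler than the usage records actually exchanged by the EOSC/EGI accounting services; the point is only that the user field is dropped before anything is published.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """A raw usage record as a site might collect it (hypothetical format)."""
    site: str
    vo: str
    user: str  # personal identifier: used locally, never published
    cpu_hours: float


def aggregate_per_vo(records: list[UsageRecord]) -> dict[tuple[str, str], float]:
    """Sum CPU hours per (site, VO) pair; user identities are discarded."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec.site, rec.vo)] += rec.cpu_hours
    return dict(totals)


records = [
    UsageRecord("SITE-A", "vo.example.org", "alice", 10.0),
    UsageRecord("SITE-A", "vo.example.org", "bob", 5.0),
    UsageRecord("SITE-B", "vo.example.org", "alice", 2.5),
]
print(aggregate_per_vo(records))
# {('SITE-A', 'vo.example.org'): 15.0, ('SITE-B', 'vo.example.org'): 2.5}
```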
The unified set of policies within EOSC organises and regulates the responsibilities of users, VO managers, site owners, etc. An international team of experts has collected policies from partners around the world and structured them clearly and flexibly in the so-called Policy Development Kit (PDK). These policies reflect the best practice used in production by many large infrastructures for decades, and they are designed to provide GDPR-compliant templates. In addition, their flexibility allows individual computer centres to add specific clauses that users must accept when using services from that site. Details about the technical policies will be given in subsection 4.2.2.
The following list summarises the benefits of joining EOSC:
-
Extended scalability (beyond the size of one local cloud), using the exact same EOSC compatible interfaces to access remote resources provided by other participants in EOSC.
-
Extended availability for critical services that may be operated at different cloud centres in different countries.
-
Access via a modern federated AAI (also called Identity and Access Management, IAM, in industry), offering stringent security at a low cost of operation.
-
A large ecosystem of tools that is tailored to work on the federated EOSC infrastructure. For example, a dynamically scaling Kubernetes cluster can be created with only a few mouse clicks. (The full set of tools is described in section 5.)
-
Built-in monitoring and accounting
-
Efficient communication and user-support by clear separation between hardware providers, technology development and end-users.
-
Professional service management procedures
Policies and concept for access to resources#
National case studies#
Case studies of the Czech Republic, Poland, Slovakia, Spain, the Netherlands and the UK were conducted as part of WP5 in EOSC-Synergy [D5.1, D5.2, D5.3]. The first document analyses, for each country, how funding flows to research infrastructures, of which cloud computing sites are typically a part. The second document provides recommendations to improve the uptake of EOSC and to extend the available infrastructure resources, while the third analyses the impact of these recommendations.
The main recommendations of [D5.2] regarding the extension of EOSC capacity are:
-
Raising awareness of EOSC in general is necessary, including its Rules of Participation, related areas of Open Science such as PIDs, FAIR principles, etc.
-
Creation and adoption of national policies for FAIR data should be supported
-
Define a roadmap/strategy and structural funding to guarantee the stability and continuity of vital EOSC-related services
-
Motivate researchers, for example by adopting a system of giving credit that honours not only traditional publications but also other scientific outputs, including data and software
-
Introduce a stable funding model independent from the projects
-
Create a uniform governance model at the national level; public governmental data services could be integrated more closely into the EOSC ecosystem
-
Increase awareness of EOSC within the community
-
Encourage national research agencies to contribute to the development of EOSC activities
-
Disseminate successful results of EOSC applications, for example those of virtual organisations dedicated to specific scientific disciplines, which can accelerate the adoption of EOSC infrastructure by national initiatives
-
Increase awareness of EOSC in the scientific community
-
Facilitate the registration of data and services, the allocation of resources to support the services, the verification of the quality and the adherence to the FAIR principles by providing tools, examples, tutorials and support teams.
-
Leverage the highly distributed nature of the research infrastructures
-
Make sure that the funding system for research sets apart enough structural funds for the continuity of Open Science support services
-
Adapt the system of giving credit for research so that it honours not only traditional publications but also other research outputs, including data and software
Access Policies#
Who may access the infrastructure, and with what share, is entirely up to the respective owners of the hardware. Participating sites own the resources, hence they are in control. The technology used to enforce the authorisation decision is the Virtual Organisation (VO) model. VO managers are responsible for getting their VO authorised to use a given quota at each individual site. As part of supporting a given VO, a site may request specific agreements between the VO and the site. These may, for example, include a specific Acceptable Use Policy (AUP) that each user must accept before being admitted as a VO member.
EGI defines three types of access policies that reflect this:
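The per-site nature of these agreements can be sketched as follows. This is a hypothetical model, not how any specific cloud stack implements it (OpenStack, for instance, enforces quotas per project): each site records which VOs it supports, with what vCPU quota, and whether it requires its own AUP. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """A site's decision to support a VO (hypothetical structure)."""
    site: str
    vo: str
    vcpu_quota: int
    requires_site_aup: bool = False


def may_allocate(
    agreements: list[Agreement],
    site: str,
    vo: str,
    vcpus_in_use: int,
    vcpus_requested: int,
    accepted_site_aup: bool,
) -> bool:
    """Check a request against the agreement between `site` and `vo`."""
    for a in agreements:
        if a.site == site and a.vo == vo:
            if a.requires_site_aup and not accepted_site_aup:
                return False  # site-specific AUP not yet accepted by the user
            return vcpus_in_use + vcpus_requested <= a.vcpu_quota
    return False  # no agreement: the site does not support this VO


agreements = [
    Agreement("SITE-A", "vo.example.org", vcpu_quota=100),
    Agreement("SITE-B", "vo.example.org", vcpu_quota=20, requires_site_aup=True),
]

print(may_allocate(agreements, "SITE-A", "vo.example.org", 90, 8, False))  # True
print(may_allocate(agreements, "SITE-B", "vo.example.org", 0, 8, False))   # False
print(may_allocate(agreements, "SITE-C", "vo.example.org", 0, 1, True))    # False
```

The same VO can therefore hold quite different quotas and obligations at different sites, which is why VO managers negotiate with each site separately.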
-
Policy-based access: Users are granted access based on policies defined by the EGI resource providers or by the EGI Foundation; such policies usually apply to resources being offered with “sponsored use” to meet some national or EU level objective; for instance, a country may offer resources with “sponsored use” to support national researchers involved in international collaborations.
-
Wide access: Users can freely access scientific data and digital services provided by EGI resource providers.
-
Market-driven: Users can negotiate a fee to access services either directly with EGI resource providers or indirectly with the EGI Foundation.
Within these definitions, services providing access to rival resources (e.g. computing capacity or storage space) are usually offered under a policy-based or market-driven access policy. Services providing access to non-rival resources (e.g. software packages or scientific data) are usually offered under a wide access policy. Not all access policies are available for every resource, service or scientific data set.
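The rule of thumb above can be encoded directly. The Python sketch below is illustrative only: the enum values mirror EGI's three policy names, but the mapping captures what is usually the case, and a provider may offer a given resource under other policies or restrict them. The function name and the example resources are hypothetical.

```python
from enum import Enum


class AccessPolicy(Enum):
    POLICY_BASED = "policy-based"
    WIDE = "wide"
    MARKET_DRIVEN = "market-driven"


def usual_policies(rival: bool) -> frozenset[AccessPolicy]:
    """Return the access policies under which a resource is usually offered.

    A rival resource (CPU time, storage space) is consumed by one user at
    the expense of others; a non-rival one (software, data) is not.
    """
    if rival:
        return frozenset({AccessPolicy.POLICY_BASED, AccessPolicy.MARKET_DRIVEN})
    return frozenset({AccessPolicy.WIDE})


resources = {"CPU hours": True, "object storage": True, "software image": False}
for name, rival in resources.items():
    print(name, sorted(p.value for p in usual_policies(rival)))
```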
Technical Infrastructure policies#
Over the past 20 years, the legal relationships between the different stakeholders in the Infrastructure have evolved into a stable and generally accepted set of policies. These are referred to as “technical” policies in this document, because they were driven by IT security and other technical personnel. Nevertheless, legal experts were consulted to ensure the practical applicability of all policies and templates.
The policies are collected in the AARC Policy Development Kit. Note that some policies are defined as “Policy Frameworks”. These frameworks merely define a list of criteria that a given policy needs to address to conform to the framework, which tolerates differences between the policies of different countries or infrastructures. Table 3 provides an overview of the different policies, by whom they are defined, and to whom they apply.
Policy Frameworks#
The following frameworks are considered best practice for Research Communities enabling federated access. They enable trust and promote attribute release from the wider identity federation.
-
Sirtfi Trust Framework: Sirtfi (Security Incident Response Trust Framework for Federated Identity) demonstrates that an organisation complies with baseline expectations for operational security and incident response in the context of identity federations. To mitigate risk, an Infrastructure may choose to restrict its interactions to those federated organisations that comply with the framework. In addition to the Infrastructure itself supporting Sirtfi, it is highly recommended that each connected service supports Sirtfi.
-
Research and Scholarship Entity Category: Research and Scholarship (R&S) identifies federated services that are operated to support research and scholarship activity. Identity Providers demonstrate their support for R&S by releasing a defined set of user attributes, including name, email address and additional low-risk information that may be useful for these activities [R&S]. Entities are advised to adopt this category, since many Identity Providers do not release user attributes to services that do not publish the R&S Entity Category. REFEDS provides additional entity categories, such as “Personalized”, “Anonymous” and “Pseudonymous”, to cater for further use cases and their attribute requirements.
-
GÉANT Data Protection Code of Conduct: The Data Protection Code of Conduct (DPCoCo) describes an approach to meeting the requirements of the EU Data Protection Directive and, in version 2, of the General Data Protection Regulation (GDPR) in federated identity management. It defines behavioural rules for Service Providers that want to receive user attributes from the Identity Providers managed by Home Organisations.
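A service operator applying these frameworks might filter Identity Providers by the entity attributes published in federation metadata. The Python sketch below works on metadata that has already been parsed into a plain dictionary (real SAML metadata is XML and would need a proper parser). The IdP entity IDs are hypothetical, and the attribute URIs are the ones used by REFEDS for Sirtfi and R&S as far as we know; check them against the current specifications before relying on them.

```python
# Entity attribute names and values (verify against the REFEDS specifications).
ASSURANCE_CERTIFICATION = "urn:oasis:names:tc:SAML:attribute:assurance-certification"
ENTITY_CATEGORY_SUPPORT = "http://macedir.org/entity-category-support"
SIRTFI = "https://refeds.org/sirtfi"
RESEARCH_AND_SCHOLARSHIP = "http://refeds.org/category/research-and-scholarship"

# Hypothetical, pre-parsed metadata: IdP entity ID -> attribute name -> values.
idp_metadata: dict[str, dict[str, set[str]]] = {
    "https://idp.uni-a.example.org": {
        ASSURANCE_CERTIFICATION: {SIRTFI},
        ENTITY_CATEGORY_SUPPORT: {RESEARCH_AND_SCHOLARSHIP},
    },
    "https://idp.uni-b.example.org": {
        ENTITY_CATEGORY_SUPPORT: {RESEARCH_AND_SCHOLARSHIP},
    },
    "https://idp.uni-c.example.org": {
        ASSURANCE_CERTIFICATION: {SIRTFI},
    },
}


def has_value(attrs: dict[str, set[str]], name: str, value: str) -> bool:
    return value in attrs.get(name, set())


def trusted_idps(metadata: dict[str, dict[str, set[str]]], require_rs: bool) -> list[str]:
    """IdPs that assert Sirtfi and, if requested, support R&S attribute release."""
    return sorted(
        entity_id
        for entity_id, attrs in metadata.items()
        if has_value(attrs, ASSURANCE_CERTIFICATION, SIRTFI)
        and (not require_rs
             or has_value(attrs, ENTITY_CATEGORY_SUPPORT, RESEARCH_AND_SCHOLARSHIP))
    )


print(trusted_idps(idp_metadata, require_rs=False))
# ['https://idp.uni-a.example.org', 'https://idp.uni-c.example.org']
print(trusted_idps(idp_metadata, require_rs=True))
# ['https://idp.uni-a.example.org']
```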
AARC Policy Development Kit#
Here we give an overview of the policies contained in the policy kit, their meaning, purpose, and possible application.
The Policy Kit builds on the Snctfi framework [SNCTFI].
Area | Policy | Management | Infrastructure Security Contact | User Community Management | Service Management | User
---|---|---|---|---|---|---
Top Level | Infrastructure Policy | Defines & Abides by | Abides by | Abides by | Abides by | |
Data Protection | Privacy Statement | Defines | Defines | Views | ||
Policy on the Processing of Personal Data | Defines | Abides by | Abides by | Abides by | ||
Membership Management | Community Membership Management Policy | Defines | Abides by | |||
Acceptable Use Policy | Defines | Defines | Abides by | |||
Acceptable Authentication Assurance | Defines | Abides by | Abides by | |||
Operational Security | Incident Response Procedure | Defines | Abides by | Abides by | ||
Service Operations Security Policy | Defines | Abides by |
Table 3: Overview of the different policies, by whom they are defined, and to whom they apply.
-
The top-level Infrastructure Policy binds the entire policy set together and stipulates requirements for each of the participants: Management, Infrastructure Security Contact, User Community Management, Service Management (including the Proxy Operator) and the User. It identifies additional policy documents; in this case, the five that are mandatory for Snctfi compliance. The Infrastructure may wish to define further policies, such as Service Eligibility, Disaster Recovery or Data Management; these should be linked into the Infrastructure Policy to keep the policy set coherent. The top-level policy regulates the behaviour and activities of participants in the Infrastructure, explains the relevant terms, and mandates certain actions. The Infrastructure must have a Security Officer, and every service must have a designated Security Contact. Communities must designate a Security Contact and must ensure that all Community users accept and abide by the relevant policies (i.e. all of them). This can be achieved, for example, by presenting an Acceptable Use Policy (AUP) that links to all Infrastructure policies; technically, this can be implemented by the Infrastructure's services.
-
Membership Management Policy is a set of rules for the Community on how user membership should be managed. The Community must define an AUP, for which a template is provided. The Community must properly manage the life cycle of its users' memberships and must record all actions taken on them. All outlined rules must be followed (i.e. those for Registration, Assignment of Attributes, Renewal, Suspension and Termination). The Community must take measures to ensure proper data protection and auditability.
-
Acceptable Authentication Assurance Policy outlines the acceptable authentication assurance for the Community, but also for the Infrastructure. The standard way of conveying this information is the REFEDS Assurance Framework (RAF). The Community must define its own assurance procedures, especially regarding identity vetting. These may depend on the assurance levels demanded by services; e.g. a service may request the RAF Cappuccino assurance profile, and the Community Manager must ensure that it is met.
-
Acceptable Use Policy defines the conditions of use of Infrastructure resources, but may additionally define rules for the Community itself. At the very least, the Community must state its name and purpose. The Community may reuse the Infrastructure policy if that is sufficient.
-
Policy on the Processing of Personal Data states that proper measures must be taken to protect users' personal data when they use Infrastructure services, and instructs the Community to do the same. The Community must accept this policy and, if it operates services integrated with the Infrastructure, must ensure that these services follow its rules.
-
Privacy Policy Template is a template for all the services to use and follow.
-
Incident Response Procedure is a set of rules to follow in case of a security incident. All Services must follow and abide by this procedure.
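The membership life cycle required by the Membership Management Policy can be pictured as a small state machine with an audit trail. This Python sketch is a hypothetical illustration, not the behaviour of any particular membership management tool: the states, the one-year validity period and all method names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


class Membership:
    """One user's membership in a Community, with every action logged."""

    def __init__(self, user: str, aup_accepted: bool, now: datetime,
                 validity: timedelta = timedelta(days=365)) -> None:
        # Registration: the Community's AUP must be accepted first.
        if not aup_accepted:
            raise PermissionError("AUP must be accepted before registration")
        self.user = user
        self.validity = validity
        self.state = State.ACTIVE
        self.expires = now + validity
        self.attributes: set[str] = set()
        self.audit: list[tuple[datetime, str]] = []
        self._log(now, "registered")

    def _log(self, now: datetime, action: str) -> None:
        self.audit.append((now, action))

    def _require(self, *allowed: State) -> None:
        if self.state not in allowed:
            raise ValueError(f"action not allowed in state {self.state.value}")

    def assign_attribute(self, now: datetime, attribute: str) -> None:
        self._require(State.ACTIVE)
        self.attributes.add(attribute)
        self._log(now, f"assigned {attribute}")

    def renew(self, now: datetime) -> None:
        self._require(State.ACTIVE)
        self.expires = now + self.validity
        self._log(now, "renewed")

    def suspend(self, now: datetime, reason: str) -> None:
        self._require(State.ACTIVE)
        self.state = State.SUSPENDED
        self._log(now, f"suspended: {reason}")

    def reinstate(self, now: datetime) -> None:
        self._require(State.SUSPENDED)
        self.state = State.ACTIVE
        self._log(now, "reinstated")

    def terminate(self, now: datetime) -> None:
        self._require(State.ACTIVE, State.SUSPENDED)
        self.state = State.TERMINATED
        self.attributes.clear()
        self._log(now, "terminated")

    def is_valid(self, now: datetime) -> bool:
        return self.state is State.ACTIVE and now < self.expires


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
m = Membership("alice", aup_accepted=True, now=t0)
m.assign_attribute(t0, "role=member")
m.suspend(t0 + timedelta(days=10), "security incident")
print(m.is_valid(t0 + timedelta(days=11)))  # False
m.terminate(t0 + timedelta(days=20))
print([action for _, action in m.audit])
```

The audit list is what makes the life cycle auditable: every registration, attribute assignment, renewal, suspension and termination leaves a timestamped entry.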
These policies and their templates can be found on the AARC Policy website; the policies of the EGI Federated Cloud are listed at [EGI-Policies]. Additionally, a Moodle course [PDK-MOODLE] introduces the PDK and explains the purpose and usage of policies. The course helps to organise and systematise the writing and implementation of policies within an Infrastructure, so that users are managed and services are provided properly. Everyone who needs to understand or create policies in a federated research context is strongly encouraged to take the course. It is also available as a YouTube playlist [PDK-Playlist].
How to join the infrastructure#
The Infrastructure is a complex setup of more and less well connected services. Less connected services are typically HPC centres and specific storage facilities at individual computer centres. Often, these are connected neither to the common AAI nor to any joint accounting system. In the spirit of this handbook, we focus on the better connected services, for which joining the infrastructure requires a considerable amount of effort. This subsection is split into one part for infrastructure providers and one for users.
Infrastructure providers: How to join as a computer centre#
EGI Federated Cloud is a complex and well connected infrastructure. It follows the principles of major IT Service Management standards (FitSM) and provides accounting, monitoring, identity and VO management, and more. A site willing to provide resources to the EGI Federated Cloud needs to be integrated with a variety of services, so that a minimum level of service quality can be guaranteed to the end user. All necessary steps and procedures are documented in detail on the EGI Cloud Compute webpage. This integration can be grouped into three categories:
-
Organisational prerequisites:
-
Join your national grid initiative (NGI) to obtain an entry in the GOCDB.
-
Ensure you can support the relevant policies.
-
Technical prerequisites: Integration of cloud stacks into EGI FedCloud follows a well-defined path, with infrastructure services such as accounting, monitoring, authentication and authorisation, etc. These configurations make your site discoverable and usable by the communities you wish to support, and allow EGI to support you in operational and technical matters.
-
Have a cluster of compute nodes available on which OpenStack will be installed.
-
Install the required additional tools and services for monitoring, authentication, accounting, networking, ...
-
Allocation policies:
-
Make decisions regarding which Virtual Organisations (VOs) you want to support. These decisions may be updated at any time.
-
Direct your existing and/or local user communities to set up a Virtual Organisation.
Infrastructure users: How to join as a user or community#
Allocation of resources (CPU/GPU hours, storage) is done at the Virtual Organisation (VO) level. Users must therefore either be a member of an existing VO or create one. Once a VO is supported at one or more sites, its users can start using resources. While this is described well in the EGI Federated Cloud documentation, we give a general overview here. In addition, the list of services and tools in section 5 provides useful support to users at all levels.
The infrastructure provides multiple interfaces designed for users with different levels of knowledge and for different types of services. The cloud infrastructure provides access - you guessed it - to cloud resources. More specifically, these are OpenStack resources, installed at multiple computer centres (distributed across Europe) called “sites”. One way to use these cloud instances is the OpenStack web interface (Horizon) at each site. Since these interfaces are not trivial to discover, the EOSC-Synergy dashboard was developed for simpler discovery. The web interface offers access to all functionality of OpenStack, including computing, block and object storage, and networking. The images available for running are provisioned via the AppDB and made available to the VO by a VO administrator.
More user-friendly access (everybody who knows the Horizon web interface knows there is room for improvement) is available via the Infrastructure Manager Dashboard. After an initial configuration (documentation is available as YouTube videos, on readthedocs and on Moodle), users can easily deploy pre-configured VM infrastructures, including dynamic SLURM, Hadoop or Kubernetes clusters.
This may be understood as a starting point for exploring the infrastructure. All of this functionality is also available for automation via the command-line tool fedcloudclient and via REST interfaces, described in Section 5.
Persistent storage is available via the EGI DataHub, which supports multiple useful access patterns, including mounted file systems and object storage.
This versatile cloud infrastructure may be used for a variety of use cases. Suitable patterns range from medium-scale HPC (including GPU usage) at one end of the spectrum, via portals that serve the results of queries to large databases and run HPC analyses on request, to traditional server hosting at the other end.
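As an illustration of scripted access, the following Python sketch builds and runs a fedcloudclient command that lists the virtual machines of one VO at one site. It assumes that fedcloudclient is installed and that an EGI Check-In access token is configured as described in its documentation; the `fedcloud openstack ... --site ... --vo ...` form follows the fedcloudclient documentation as we know it, and the site and VO names are placeholders.

```python
import shutil
import subprocess


def fedcloud_openstack_cmd(site: str, vo: str, *openstack_args: str) -> list[str]:
    """Build a `fedcloud openstack` command for one site and VO.

    The arguments after `openstack` are passed on to the OpenStack client.
    """
    return ["fedcloud", "openstack", *openstack_args, "--site", site, "--vo", vo]


def run(cmd: list[str]) -> str:
    """Run the command and return its standard output."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"'{cmd[0]}' not found; install fedcloudclient first")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Placeholder site and VO names.
cmd = fedcloud_openstack_cmd("SITE-A", "vo.example.org", "server", "list")
print(" ".join(cmd))  # fedcloud openstack server list --site SITE-A --vo vo.example.org
# print(run(cmd))  # requires fedcloudclient and a valid access token
```

Building the argument list separately from running it keeps the command easy to inspect and test, and avoids shell quoting issues because no shell is involved.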
For every service in this environment, users authenticate via EGI Check-In, to which they are redirected automatically. EGI Check-In offers authentication either via your home organisation (e.g. the university you work at) or via a “Community-AAI” such as eduTEAMS, ORCID, GitHub, B2Access, Umbrella or Facebook. It is important to choose the correct one, because your VO membership information may come from the chosen community. What may be a bit confusing is that EGI is also a Community-AAI. Since different VO memberships may be tied to different identities, you may need to log in with different identities to find all of them (e.g. Google at first, your university later). To avoid confusion, it is important to remember which choices you made.
Further features include:
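Services typically learn a user's VO memberships from the entitlement values released by EGI Check-In, which, as far as we know, follow the AARC group-entitlement syntax `urn:mace:egi.eu:group:<vo>[:<subgroup>]:role=<role>#<authority>`. The Python sketch below extracts VO memberships from such values and checks a REFEDS assurance profile. It assumes the claims have already been obtained and validated (e.g. from the OIDC userinfo endpoint); the claim names `eduperson_entitlement` and `eduperson_assurance`, the example values, and the ranking of the two profiles are assumptions of this sketch.

```python
import re

# AARC-style group entitlement, e.g.
# urn:mace:egi.eu:group:vo.example.org:role=member#aai.egi.eu
ENTITLEMENT = re.compile(
    r"^urn:mace:egi\.eu:group:(?P<group>[^#]+?):role=(?P<role>[^#]+)#(?P<authority>.+)$"
)

# REFEDS Assurance Framework profiles. Ranking Espresso above Cappuccino, so
# that Espresso also satisfies a Cappuccino requirement, is an assumption here.
CAPPUCCINO = "https://refeds.org/assurance/profile/cappuccino"
ESPRESSO = "https://refeds.org/assurance/profile/espresso"
PROFILE_RANK = {CAPPUCCINO: 1, ESPRESSO: 2}


def vo_roles(entitlements: list[str]) -> dict[str, set[str]]:
    """Map each VO to the roles the user holds in it (or in its subgroups)."""
    roles: dict[str, set[str]] = {}
    for value in entitlements:
        match = ENTITLEMENT.match(value)
        if match is None:
            continue  # not an EGI group entitlement
        vo = match["group"].split(":")[0]  # first path element is the VO
        roles.setdefault(vo, set()).add(match["role"])
    return roles


def meets_profile(assurance: list[str], required: str) -> bool:
    """True if the asserted assurance values satisfy the required profile."""
    best = max((PROFILE_RANK.get(v, 0) for v in assurance), default=0)
    return best >= PROFILE_RANK[required]


claims = {  # hypothetical, already validated claims
    "eduperson_entitlement": [
        "urn:mace:egi.eu:group:vo.example.org:role=member#aai.egi.eu",
        "urn:mace:egi.eu:group:vo.example.org:admins:role=manager#aai.egi.eu",
        "urn:example:other-entitlement",
    ],
    "eduperson_assurance": [CAPPUCCINO],
}

print(vo_roles(claims["eduperson_entitlement"]))
print(meets_profile(claims["eduperson_assurance"], CAPPUCCINO))  # True
print(meets_profile(claims["eduperson_assurance"], ESPRESSO))    # False
```

Because the memberships arrive with each login, logging in with a different identity can yield a different set of entitlements, which is exactly why the choice of identity matters.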
-
Global accounting that aggregates and allows visualisation of usage information across the whole federation.
-
Monitoring of Availability and Reliability of the providers to ensure SLAs are met.
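The availability and reliability figures are ratios of up time to the time for which the service status is known. The Python sketch below uses the definitions applied in EGI's ARGO-based monitoring as we understand them, where reliability additionally excludes scheduled downtime; the numbers and the target value are hypothetical.

```python
def availability(up: float, total: float, unknown: float = 0.0) -> float:
    """Fraction of the known period during which the service was up."""
    return up / (total - unknown)


def reliability(up: float, total: float, scheduled_downtime: float = 0.0,
                unknown: float = 0.0) -> float:
    """Like availability, but announced downtime is not held against the site."""
    return up / (total - scheduled_downtime - unknown)


# Hypothetical 30-day month: 12 h announced maintenance, 6 h unplanned outage.
total_hours = 30 * 24                           # 720 h
scheduled, unplanned = 12.0, 6.0
up_hours = total_hours - scheduled - unplanned  # 702 h

a = availability(up_hours, total_hours)
r = reliability(up_hours, total_hours, scheduled_downtime=scheduled)
print(f"A = {a:.3f}, R = {r:.3f}")  # A = 0.975, R = 0.992

target = 0.95  # hypothetical SLA target
print("SLA met" if a >= target and r >= target else "SLA violated")
```

Announcing maintenance in advance therefore improves a site's reliability figure, but not its availability.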
Since the opening of the EGI Federated Cloud, the following usage models have emerged:
-
Service hosting: the EGI Federated Cloud can be used to host any IT service, such as web servers or databases. Cloud features such as elasticity help users provide better-performing and more reliable services.
-
Compute- and data-intensive applications: for applications that need a considerable amount of resources in terms of computation, memory and/or intensive I/O. Ad-hoc computing environments can be created at EGI cloud providers to satisfy very demanding hardware requirements.
-
Datasets repository: the EGI Cloud can be used to store and manage large datasets exploiting the large amount of disk storage available in the Federation.
-
Disposable and testing environments: environments for training or testing new developments.
All these tools may be used and combined to develop individual solutions that may be tailored perfectly for each use case.