The portfolio of EU research projects dedicated to the Fight against Crime and Terrorism (FCT) predominantly focuses on applied research. These projects bring research and industry together with Law Enforcement Agencies (LEAs) and security practitioners to address pertinent security threats and improve societal resilience. Many proposed approaches and security solutions rely on large datasets for data-intensive techniques such as Machine Learning (ML), Artificial Intelligence (AI), Big Data analytics and intelligent visualization.
A common and recurring challenge in the FCT landscape is the lack of domain-specific data of sufficient quality and quantity to train and test the methods, tools, and platforms being developed. This forms part of a vicious cycle: the lack of data degrades the quality of ML/AI models and other data-intensive tools, which in turn makes it hard for security practitioners to benchmark and validate the solutions and to judge how they would perform in an operational setting.
Researchers and practitioners have raised this disconnect and its impact on security research several times, including at multiple FCT workshops and EU events. Left unaddressed, the barriers to data sharing will ultimately hinder progress and innovation in the FCT research community.
The data issue has no single root cause; instead, it results from many interacting factors that have emerged and grown in complexity over time. These factors include:
- unique privacy concerns for handling and processing FCT data;
- legal uncertainties that stem from applying data protection principles in practice, combined with interoperability and data harmonization challenges;
- security restrictions and classification of real-world FCT data owned by LEAs and other security practitioners such as border authorities;
- lack of a data-sharing culture combined with unclear policies on formal data-sharing procedures;
- challenges and lack of incentives for LEAs as data owners to pseudonymize and curate large datasets to make them available as research data; and
- lack of trust between organizations when sharing FCT-related data, owing to the potentially high risks and impact on society.
Therefore, the FCT research community needs a trusted and safe infrastructure to share and co-produce large, high-quality datasets that are sufficiently realistic and domain-specific to drive FCT research and innovation forward.
LAGO will address the data issue in the FCT research landscape by building an evidence-based and validated multi-actor reference architecture for a trusted EU FCT Research Data Ecosystem (RDE). The RDE will be an open, transparent and secure data infrastructure where FCT-related data can be co-created, made available and shared in a trusted environment.
The RDE will comply with European values and principles on data protection, privacy and ethics. It will support multiple data spaces where interested actors deposit, share and co-produce data and tools for FCT research purposes in a trusted environment, based on common rules, protocols, standards and tools and on an accompanying governance framework. The reference architecture and governance framework will follow design principles such as decentralization, data sovereignty, data quality, openness, transparency, and trust. The results of LAGO will be instrumental in identifying and removing barriers to data sharing in the FCT domain. They will also provide the structural, governance and technical foundations and the roadmap for the future implementation of the RDE.
The LAGO Principles
LAGO is built on ten core principles.
- Decentralization: FCT research data and datasets are not centralized, but created, provided, and made available by data providers to users in a federated environment.
- Data sovereignty: ownership and control of data are retained by providers and/or owners.
- Security and trust: the RDE maintains confidence in the identity and capabilities of participants and provides the measures and tools to protect the integrity and security of data and of operations on them.
- Data quality: FCT research requires high-quality datasets to train and test data-driven and AI/ML prototypes and solutions.
- Openness: rules, specifications, and protocols for data sharing in the context of the RDE are open.
- Transparency: data sharing and exchange are transparent, tracked, traceable, and accountable.
- Proportionality and risk: the RDE provides capabilities for assessing risk and for weighing the lawfulness of accessing and providing data and datasets against possible interference with fundamental rights.
- Interoperability: multiple systems and services, on both the provider and the user side, can exchange and correctly use research data that is harmonized in format, structure, and semantics within the RDE.
- Portability: data is described in a standardized protocol that enables transfer and processing to increase its usefulness as a strategic resource.
- Ethics, Legal and Privacy: FCT research data normally includes personal and sensitive data; the RDE will comply with legal and ethical rules of operation (including privacy and data protection), with fundamental rights, and with applicable legislation in EU Member States.
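
To make three of the principles above more concrete (decentralization, data sovereignty, and transparency), the following is a minimal illustrative sketch in Python. It is not the LAGO reference architecture: every class, method, and field name here (Provider, Catalogue, allowed_purposes, and so on) is a hypothetical assumption introduced only for illustration. The sketch assumes a catalogue that holds metadata only, providers that keep their own data and decide on each request, and an audit log that records every request, granted or not.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata a provider publishes; the data itself stays with the provider."""
    dataset_id: str
    provider_name: str
    description: str
    allowed_purposes: frozenset


class Provider:
    """Hypothetical data provider that stores its own data (decentralization)
    and decides on every access request (data sovereignty)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}     # dataset_id -> data, never copied to the catalogue
        self._records = {}  # dataset_id -> DatasetRecord

    def publish(self, dataset_id, description, rows, allowed_purposes):
        self._rows[dataset_id] = rows
        record = DatasetRecord(dataset_id, self.name, description,
                               frozenset(allowed_purposes))
        self._records[dataset_id] = record
        return record

    def handle_request(self, dataset_id, purpose):
        """Return the data only if the stated purpose is permitted."""
        record = self._records.get(dataset_id)
        if record is None or purpose not in record.allowed_purposes:
            return None
        return self._rows[dataset_id]


class Catalogue:
    """Hypothetical federated catalogue: metadata plus an audit log."""

    def __init__(self):
        self.records = {}
        self._providers = {}
        self.audit_log = []

    def register(self, provider, record):
        self.records[record.dataset_id] = record
        self._providers[record.dataset_id] = provider

    def request(self, user, dataset_id, purpose):
        """Forward the request to the owning provider and log the outcome."""
        rows = self._providers[dataset_id].handle_request(dataset_id, purpose)
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset_id": dataset_id,
            "purpose": purpose,
            "granted": rows is not None,
        })
        return rows


# Usage with made-up names and data.
lea = Provider("example-lea")
catalogue = Catalogue()
record = lea.publish("ds-001", "Synthetic incident reports",
                     [{"id": 1}], {"model-training"})
catalogue.register(lea, record)

granted = catalogue.request("researcher-a", "ds-001", "model-training")
denied = catalogue.request("researcher-b", "ds-001", "marketing")
```

In this sketch the catalogue never stores dataset contents, so removing a provider removes access to its data, and both the granted and the denied request leave an entry in the audit log. A real data space would add authentication, policy languages, and legal checks that are out of scope here.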