Projects in LegalTech

Odyssey - Opening the National Archive's legal data to AI for A2J (May 2024 - Feb 2026)

This project has received an Innovate UK grant under the "Professional & Financial Services Data Access Demonstrators: ESG" funding competition. Odyssey will enrich The National Archives datasets, which encode UK primary legislation and jurisprudence in LegalDocML format. The enriched datasets will then be used to fine-tune Large Language Models for LegalTech applications aimed at litigants in person. The project consortium includes Tabled Technologies Limited, Legal-Pythia LLP, The National Archives, Swansea University, Keele University, and Northumbria University.

Automating regulatory compliance in the upstream oil and gas sector of Ghana (Sep 2022 - Aug 2025)

This is a PhD project carried out by a former master's student of the LegalTech LLM at Swansea University. The PhD is supported by the Ghana Scholarship Secretariat. I helped the student prepare his scholarship application and I am now supervising his PhD. The PhD aims at developing a computational ontology in OWL, as well as shapes and rules in SHACL, for automating compliance checking of regulations in the upstream oil and gas sector of Ghana, in particular the Local Content and Local Participation Regulations (L.I. 2204).

COST action DKG (Sep 2020 - Sep 2024)

I am a member of the management committee of the COST action CA19134 - Distributed Knowledge Graphs (DKG). The COST action aims at creating a research community for deployable Distributed Knowledge Graph technologies that are standards-based and open, embrace the FAIR principles, allow for access control and privacy protection, and enable the decentralised publishing of high-quality data. The aims of the action are aligned with my latest research results in compliance checking on RDF knowledge graphs, e.g., [Robaldo et al., 2023a], [Robaldo et al., 2023b], [Esposito et al., 2023].

LegalTech for Cybersecurity (Aug 2022 - Aug 2023)

This project is funded by a Morgan Advanced Studies Institute fellowship and a UKRI Challenge Fund grant that I obtained to support networking activities with Old Dominion University (Virginia, US), with the aim of researching novel methods for applying LegalTech to cybersecurity, in particular in the maritime trade domain. The research activity aims at preparing further interdisciplinary research project proposals in the field. I am the principal investigator of the project, which also involves other professors in IT, Law, and Economics from the two universities.

LAST-JD RIoE (Nov 2019 - Oct 2023)

LAST-JD RIoE (LAST-JD - Rights of the Internet of Everything) is supported by a Marie Skłodowska-Curie Innovative Training Networks grant. It was retained for funding with an overall score of 94.8%. My involvement in LAST-JD started in 2014 at the University of Turin. From 2015 until 2020, I was responsible for LAST-JD at the University of Luxembourg. I followed the administrative activities of the program, e.g., participating in the doctoral boards, giving classes to introduce the students to NLP for the legal domain, and following and advising them in their research activity.

DAPRECO (Feb 2017 - Jun 2019)

DAPRECO (DAta Protection REgulation COmpliance) was a CORE project that was retained for funding in Nov 2016. DAPRECO was the only senior CORE project proposed by SnT in 2016 that was retained for funding, out of 18 submitted. The project was designed as a use case of my past project ProLeMAS and it led to the DAPRECO knowledge base [Robaldo et al., 2020], which represents norms from the General Data Protection Regulation in reified I/O logic [Robaldo and Sun, 2017] by using the concepts from the Privacy Ontology (PrOnto) [Palmirani et al., 2018a].

MIREL (Jan 2016 - Dec 2019)

MIREL (MIning and REasoning on Legal texts) was supported by a Marie Skłodowska-Curie Research and Innovation Staff Exchange grant. It was retained for funding with an overall score of 97.2%. I coordinated the writing of the MIREL proposal and I managed its activities throughout the duration of the project, as certified in this statement. MIREL promoted mobility and staff exchange among the 16 international partners, to create an interdisciplinary consortium in Law and Artificial Intelligence areas including NLP, Computational Ontologies, Argumentation, Logic & Reasoning, and Business Process Management.

ProLeMAS (Jun 2015 - May 2017)

ProLeMAS (PROcessing LEgal language in normative Multi-Agent Systems) was supported by a Marie Skłodowska-Curie Individual Fellowship. It was retained for funding with an overall score of 96.4%. ProLeMAS aimed at (1) using reification to fill the gap between the current formalizations in deontic logics and the richness of natural language semantics and (2) implementing tools for (semi-)automatically building machine-readable representations from legal texts via NLP. In the context of ProLeMAS, I visited the private company APIS Hristovich EOOD for six months.

EUCases (Oct 2013 - Oct 2015)

EUCases (Linking Legal Open Data in Europe) was a collaborative research project supported by the Seventh Framework Programme (FP7) and involving five partners, among which Nomotika S.R.L. and APIS Hristovich EOOD. EUCases was my first significant experience in LegalTech. I was responsible for the NLP tasks for the Italian language, as certified in this statement. The project ProLeMAS was built on the results of EUCases. Nomotika S.R.L. and APIS Hristovich EOOD were subsequently involved in the projects ProLeMAS, MIREL, and LAST-JD RIoE.

ICT4LAW (Mar 2009 - Feb 2012)

ICT4LAW was a large interdisciplinary research project involving twelve partners, six academic and six industrial. The goal was to create novel services for citizens, enterprises, public administration and policy makers. My role in the project was minimal, but it was useful for gaining general expertise in LegalTech. I developed rule-based systems for recognizing modificatory provisions and I carried out dependency parsing to feed statistical classifiers. The ICT4LAW project led to the creation of the spin-off Nomotika S.R.L., founded by the University of Turin and Augeos S.P.A.

Past research

NL Quantifiers

In the context of my PhD thesis, I worked on the formal representation of NL quantifiers. In particular, I devised a new logical framework to properly represent scopeless readings, such as cumulative and collective readings. I authored six journal publications and several conference and workshop papers on the topic, and I am the sole author of four of these journal publications. My last publication on the topic was (Robaldo, Szymanik, Meijering, 2014), coauthored with Jakub Szymanik and Ben Meijering. After that publication, I stopped working on NL quantifiers. Jakub, on the other hand, was later awarded an ERC Starting Grant on related topics (developing cognitive semantics of generalized quantifiers). Congrats Jakub!

Penn Discourse Treebank (PDTB)

The PDTB is a corpus developed at the University of Pennsylvania (UPenn). The PDTB is, to date, the largest annotation effort at the discourse level, providing annotations of the argument structure, attribution and semantics of discourse connectives. After my PhD thesis, I visited the University of Pennsylvania for five months (and again in 2009 for two months), where I started working with the PDTB research group. I contributed to the writing of the PDTB 2.0 annotation manual and to the sense annotation in release 2.0 of the corpus. During that period, I also started working with reification-based semantics, specifically the approach of Jerry R. Hobbs, and I used it to model concessive relations found in the PDTB.

Sentiment Analysis

In 2013, I defined, together with Luigi Di Caro, an XML formalism called OpinionMining-ML for tagging users' opinions on products and services, and I built a corpus of 1,000 comments about restaurants taken from www.2spaghi.it, one of the biggest Web 2.0 sites about Italian restaurants and pizzerias. Afterwards, I won the Working Capital Accelerator 2014, a Telecom Italia grant to support new startups and innovative research projects, with the project SentiTagger, which aimed at automatically tagging comments in OpinionMining-ML. The selection was highly competitive: only 40 projects out of about 1,300 submitted were selected. Each selected project was granted 25,000 euros by Telecom Italia.

Gamification

In 2013, I worked on gamification-based approaches to corpus building, pioneered by Massimo Poesio. I was specifically involved in the Phrase Detectives game-with-a-purpose, which aims at creating anaphorically annotated resources through Web cooperation. I was an expert annotator of the game and I developed a converter, based on dependency parsing, from Italian texts to the input format of the game, in order to allow annotations in Italian. Massimo Poesio was later awarded an ERC Advanced Grant for the project "DALI - Disagreements and Language Interpretation", which proposes more advanced games, derived from Phrase Detectives, to collect massive amounts of data about anaphora from the people playing them. Congrats Massimo!