Issue #2 2024

STOCKHOLM INTELLECTUAL PROPERTY LAW REVIEW

Volume 7, Issue 2

Frantzeska Papadopoulou Skarp
Eric Luth
Lisa Gemmel

Editorial

2024 was an important year for both legal and technical developments relating to artificial intelligence (AI) and text and data mining (TDM). Landmark cases, such as the LAION ruling by the Hamburg court, provided early judicial interpretations of national TDM provisions following implementation of the Copyright in the Digital Single Market (CDSM) Directive, while further litigation has also recently unfolded in the United States and the United Kingdom.

Against this backdrop, the Institute for Intellectual Property and Market Law (IFIM) at Stockholm University hosted a conference on TDM, AI, and Libraries in collaboration with Wikimedia Sverige and the Swedish Library Association. The event was opened by the Dean of the Law Faculty, Professor Jane Reichel, and closed by the National Librarian Karin Grönvall and Stockholm University Library’s Head Librarian Wilhelm Widmark. It brought together legal scholars, researchers, and librarians, all eager to examine the evolving legal framework surrounding TDM, AI-driven research, and its impact on knowledge dissemination.

A major theme of the conference was the complex interplay between copyright and AI-related research. While copyright may serve as a foundation for intellectual creation, it also presents a number of uncertainties and potential obstacles for researchers and libraries, particularly when it comes to digitization and access to materials to be used in TDM. Libraries hosting digitized materials are restricted by national copyright legislation in the access they can provide to researchers.

The research exceptions to copyright are, in turn, hard to navigate and rarely interpreted in the national courts. The conference provided an important and rather unique platform to discuss whether and how copyright needs to be amended to allow libraries to fulfil their role in supporting research and researchers.

Building on the debates at the conference, this issue of the Stockholm IP Law Review explores the legal, ethical, and practical challenges of TDM in the age of AI. The contributions examine TDM implementations across jurisdictions, the role of open-access knowledge, and the implications of AI for copyright law. Maja Bogataj Jančič and Ema Purkart analyze Slovenia’s approach to TDM, highlighting both the progressive steps taken and the lingering legal uncertainties, while Konrad Gliściński provides insights into Poland’s implementation, where conflicting interpretations have raised concerns over its compatibility with EU law. Branka Marušić explores how different EU Member States have navigated the harmonization of TDM exceptions, questioning whether the legal framework fosters cohesion or divergence across Europe.

Beyond legislative analysis, this issue also considers how AI is reshaping legal research itself. Professor Johan Lindholm examines the growing role of computational methods in legal scholarship, highlighting how natural language processing (NLP) and large-scale data analysis can transform traditional legal research methodologies. His work challenges the perception that doctrinal and empirical approaches are incompatible, arguing instead that data-driven legal analysis can provide deeper insights into legal texts, case law, and legal decision-making patterns. In a similar vein, Holli Sargeant and Professor Felix Steffek introduce a dataset of UK court decisions, the Cambridge Law Corpus, and explore how AI models can predict outcomes in the UK Employment Tribunal, offering a glimpse into the future of computational legal analysis. These contributions reflect how AI is not only reshaping how legal professionals access and interpret the law but also redefining the nature of legal scholarship.

The role of open-access knowledge in AI training is another topic addressed. Eric Luth scrutinizes the use of Wikipedia and the other Wikimedia platforms as a source for AI training data, highlighting potential tensions between open-access licensing and proprietary and commercial AI development while arguing for the value and importance of open-access material in AI training. Relatedly, Ana Lazarova and Eric Luth examine the position of knowledge custodians (libraries, archives, and cultural heritage institutions) as enablers or gatekeepers in the AI era, exploring the legal and practical dilemmas they face in managing access to digital resources. A core focus of their discussion is the opt-out mechanism of the CDSM Directive’s TDM exception, assessing whether current legal structures empower rightsholders to control AI’s use of copyrighted material or instead introduce further legal uncertainty that could hinder research and innovation.

As AI-driven research accelerates, so does the urgency of ensuring that copyright law evolves to support and not stifle scientific inquiry. The contributions in this issue reflect the ongoing legal debates surrounding TDM, copyright, and AI, offering perspectives on how the law can better accommodate technological progress and respect the rights and interests of copyright holders, while safeguarding the ecosystem of free and open knowledge and its production.

Frantzeska Papadopoulou Skarp
Eric Luth
Lisa Gemmel

Text and Data Mining in the Slovenian Legal System

By Maja Bogataj Jančič and Ema Purkart

ABSTRACT

The Slovenian implementation of the text and data mining exceptions in Articles 57a and 57b of the Copyright Act contains both very progressive elements of the European TDM exceptions and problematic ones. The TDM exceptions allow the digitization of analogue works for the purpose of TDM as well as remote access to content and, in the case of the TDM exception for scientific research, also the sharing of the results for TDM purposes, which is a very progressive implementation worth repeating elsewhere. Rights holders also need to ensure that the beneficiaries of both exceptions can effectively perform TDM and need to act within 72 hours or face sanctions. Consequently, the Slovenian legal order represents a favorable legal basis for building models of generative artificial intelligence. The problematic aspect of the Slovenian implementation is that it does not explicitly treat access to content freely available online as lawful access, as is otherwise explicitly stated in Recital 14 of the DSM Directive. In this regard, artificial intelligence builders in Slovenia can be significantly worse off, and it is reasonable to expect that the legislators will correct this error in the future. Despite this obstacle, researchers who build open-access LLMs for Slovenian or other languages have a good legal basis for collecting texts and building datasets, sharing them with others, and building LLMs on the basis of the Slovenian exception.

Polish Implementation of TDM Exceptions – General Characteristics

By Konrad Gliściński

ABSTRACT

The aim of this article is to analyse the implementation of Directive (EU) 2019/790 on copyright and related rights in the context of Text and Data Mining exceptions within Polish law. It highlights interpretative challenges and uncertainties arising from the regulations, potentially leading to legal disputes. The article begins with an overview of the Directive and then examines the specific provisions in Polish law that implement it, focusing on the general and research exceptions. It discusses the lack of clarity in definitions, the scope of exceptions, and the implications for potential beneficiaries. Additionally, it identifies uncertainties regarding the storage of copies, access conditions, and protections against technical measures. Ultimately, the article concludes with a summary of the main challenges presented by the implementation and their potential impact on the practical use of Text and Data Mining exceptions.

TDM Exception or Limitation – Methodology of Implementation in the EU Member States: Creating Cohesion or Diversion?

By Branka Marušić

ABSTRACT

This article examines the margin of appreciation of the EU Member States in the choice and formulation of exceptions and limitations (E&Ls) when implementing them into their national law. It does so first by explaining the methods and terminology used to assess the implementation of directives. It then continues with the cartography of E&Ls in the research sector prior to and after the enactment of the DSM Directive. Finally, this article concludes with remarks on the future viability of the TDM exception.

Textual Insights: What Can Computers Teach Legal Scholars About Law?

By Johan Lindholm

ABSTRACT

Legal research has historically relied on the manual and systematic study of authoritative texts, a methodology that has remained largely unchanged despite technological advancements. However, recent developments in natural language processing and other data-driven approaches present new opportunities for legal scholars. This essay examines whether and how these computational tools can complement doctrinal approaches and explores the potential of computational methods to enhance and transform legal scholarship. In emphasizing the compatibility of computational and doctrinal approaches, it argues that by integrating these approaches, legal scholars can make scientific discoveries beyond the scope of either method alone. The essay concludes by outlining the steps necessary for legal scholarship to fully embrace and benefit from these emerging technologies.

Researching Legal AI: The Cambridge Law Corpus and Predicting Decisions of the UK Employment Tribunal

By Holli Sargeant and Felix Steffek

ABSTRACT

This contribution introduces the Cambridge Law Corpus (CLC) and a research project benchmarking the prediction of UK Employment Tribunal decisions, which is based on the CLC data. The CLC is a dataset containing more than 320,000 UK court decisions. This article explains the need for legal datasets, the creation of the CLC and the ethical considerations concerning the dataset’s construction and distribution. Subsequently, an experiment engaging with legal judgment prediction using the dataset is reported. The decisions predicted are those of the UK Employment Tribunal, which is the first instance for conflicts between employees and their employers. The experiment compares baselines of different AI models and human experts predicting whether the employee will win, partly win, lose or whether the Tribunal will render another decision.

The Use of Wikipedia, Wikimedia, and Open Access Content for Artificial Intelligence and Text and Data Mining

By Eric Luth

ABSTRACT

The role of Wikimedia platforms and the broader Digital Commons in developing artificial intelligence (AI) models remains significant yet underexplored. Wikimedia content, licensed under Creative Commons (CC) licenses, constitutes a primary source of training data for many large language models (LLMs), with implications for both the sustainability of the Digital Commons and compliance with copyright law. This article examines the compatibility of CC licenses with AI training, particularly under the European Union’s Directive on Copyright in the Digital Single Market (CDSM Directive), which introduced new exceptions for text and data mining (TDM). It identifies scenarios where CC-licensed content can be legally used for AI training and discusses unresolved questions about reproduction, derivation, adaptation, attribution, and share-alike requirements under these licenses. The analysis highlights how stakeholders within the Digital Commons (Wikimedia, GLAM institutions, educational organizations, and intergovernmental organizations (IGOs)) influence the quality and ethical use of AI models. It also examines risks posed by AI usage, such as reduced visibility of source platforms, a decline in volunteer contributions, and diminished sustainability of open knowledge ecosystems. Strategies to uphold the Digital Commons include enforcing share-alike obligations, fostering collaboration among stakeholders, and engaging with AI developers to ensure compliance with CC licenses. The findings underscore the dual potential of open access to enhance AI model quality while maintaining the integrity of digital commons ecosystems. Digital Commons stakeholders must be open in a way that promotes high-quality AI development while maintaining sustainable open knowledge dissemination.

To Mine or Not to Mine: Knowledge Custodians Managing Access to Information in the Age of AI

By Ana Lazarova and Eric Luth

ABSTRACT

The article addresses the legal challenges surrounding the computationally driven reuse of digital cultural heritage collections for the purpose of training large AI models. It examines the role of knowledge custodians in managing access to these materials: public sector actors such as cultural heritage institutions, non-governmental commons-based projects such as Wikimedia Commons and Flickr Commons, and intergovernmental organisations such as UN agencies. Focusing on the EU’s text and data mining (TDM) regime, this contribution considers the impact of copyright and related rights on AI training. It further highlights the complexities faced by knowledge custodians in navigating access rights and copyright management, particularly in exercising rightsholder reservations under Article 4 of Directive (EU) 2019/790, with respect both to content that remains under copyright and to content that has entered the public domain.

About us

The Stockholm IP Law Review is the first Open Access IP Law Review in Europe. We are furthermore the only IP journal with an active post-graduate student involvement.

About the education

Master of Laws (LL.M) in European Intellectual Property Law
