Finno-Ugric Data Sharing Space

Federating Open Knowledge About Contemporary and Historic Cultural Practices in the Wikibase System


Date
Mar 6, 2025 10:30 AM — 11:00 AM
Location
Estonian National Museum
Muuseumi tee 2, Tartu, 60532
We started experimenting with the legal, organisational, semantic and technical challenges of creating a genuinely trustworthy, AI-supported data-sharing space that can find and connect tangible and intangible elements of the Finno-Ugric cultural universes. We have also been seeking a better governance model that lets the custodians of these endangered, shrinking universes exercise oversight in their own language and with little technical knowledge, partly as an alternative to the established Wikipedia incubation method for open knowledge in small linguistic minorities.

Meet some of our team members, Daniel Antal, Britt-Kathleen Mere, Ieva Pigoze, Bogáta Tímár, and Ieva Vīvere, during the conference, and do not forget to check out our presentation at the LP05 session, Digital Insights in Cultural Research, on Mar 6, 2025, 10:30 AM to 11:00 AM. In subsequent workshops you can also meet Kata Gábor and Dániel Antal in Paris, and Anna Mester and Mihály Nagy in Budapest.

Read more or take a look at our conference poster below:

Download the poster in [pdf](/documents/poster/dreams-reprex-poster-2025.pdf)

See the presentation we gave at the DHNB 2025 conference in Tartu:

What are the competencies of the system that we are building?

We are designing a data sharing space that can confidently work with the metadata schemas and ontologies of all GLAM institutions (public libraries, archives and museums) as well as private services like Bandcamp, YouTube, Flickr, Spotify, name registration services, or radio playlisting services. Applying the European Interoperability Framework, extended to privately held data, we ensure that our users have an accurate 360° view of the digital heritage in their custody or interest, and we guarantee that research or streaming services properly use the elements of these cultural universes.

Q1 Which recordings of contemporary musical work and traditional music were released in the Liv and Samoyedic languages in 2012?

A data (sharing) space is a system that integrates data whenever needed or permitted. The Statistical Data and Metadata eXchange, the European Open Science Cloud, Europeana, and the European Collaborative Cloud for Cultural Heritage (ECCCH) will be connected with private systems like Wikidata, Wikimedia Commons, and the Spotify or YouTube APIs. Semantic interoperability means that our system understands both public and private cultural APIs.
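The idea of integrating data "whenever needed or permitted" can be illustrated with a minimal sketch. It is not the project's implementation: the `Source` class, the `integrate` function, the source names, the `rec:001` identifier and all field values are hypothetical, and the sketch assumes that records from different systems have already been matched to a shared identifier.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A hypothetical data provider inside the data space."""
    name: str
    permitted: bool   # whether this source may be integrated right now
    records: dict     # shared identifier -> metadata fields


def integrate(sources):
    """Merge records from permitted sources only.

    When two sources describe the same field, the value from the
    source listed first is kept (an arbitrary rule for this sketch).
    """
    merged = {}
    for source in sources:
        if not source.permitted:
            continue  # skip data we are not allowed to use
        for identifier, fields in source.records.items():
            entry = merged.setdefault(identifier, {})
            for key, value in fields.items():
                entry.setdefault(key, value)
    return merged


# Invented example data, for illustration only.
SOURCES = [
    Source("public_archive", True,
           {"rec:001": {"title": "Example recording", "year": 2012}}),
    Source("private_service", False,
           {"rec:001": {"duration_s": 180}}),
    Source("open_platform", True,
           {"rec:001": {"year": 2013, "licence": "CC BY-SA"}}),
]

VIEW = integrate(SOURCES)
```

In this toy setup the non-permitted source contributes nothing, and the conflicting `year` is resolved in favour of the first permitted source. A real data space would need an explicit, documented rule for such conflicts.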

Q2 How can we find new knowledge about the historical or contemporary Khanty-Mansi music tradition?

Organisational interoperability is necessary to create systems that can support applications drawing on diverse collections and organisations. We must understand that archivists, rights managers, librarians, museologists, NGOs, private collectors and festival organisers often work with the same data, but in a different job or workflow. We need to understand not only which things belong to the Khanty-Mansi universe, but also how a librarian or a streaming service would handle that part of the cultural universe.

We are making available private collections on [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Road_Sign_Livonian_Coast_in_Latvian_and_Livonian.jpg), archive.org, and other open knowledge platforms.

Q3 Which songs refer to dreams in their title or lyrics, regardless of language?

Starting with a playlist dataset of songs on Spotify, we enrich it with metadata so that we can serve first language-independent, then language-specific inferences and queries. In our conference presentation we will explain this by helping a curator who wants to find songs in our 14-language playlist about “imagining events while sleeping” or about a “dream”. (Check out our dreams playlist on Spotify.)
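A minimal sketch of the two query levels, assuming each track has already been enriched with language-independent concept tags. The `Track` class, the `concept:` identifiers, the track titles and the helper functions are all hypothetical and only stand in for whatever the enriched playlist dataset actually contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """A playlist entry after enrichment (all values are invented)."""
    track_id: str
    title: str
    language: str        # ISO 639-1 code of the lyrics
    concepts: frozenset  # language-independent concept identifiers


# Hypothetical concept with labels in a few languages.
CONCEPT_LABELS = {
    "concept:dream": {"en": "dream", "fi": "uni", "et": "unenägu", "hu": "álom"},
}

PLAYLIST = [
    Track("t1", "Example title 1", "fi", frozenset({"concept:dream"})),
    Track("t2", "Example title 2", "et", frozenset({"concept:dream", "concept:sea"})),
    Track("t3", "Example title 3", "hu", frozenset({"concept:sea"})),
]


def concept_for_label(label, language):
    """Map a word in a given language to its concept identifier, if known."""
    for concept, labels in CONCEPT_LABELS.items():
        if labels.get(language, "").casefold() == label.casefold():
            return concept
    return None


def find_by_concept(tracks, concept, language=None):
    """Language-independent query; pass `language` to make it language-specific."""
    hits = [t for t in tracks if concept in t.concepts]
    if language is not None:
        hits = [t for t in hits if t.language == language]
    return hits
```

A curator could then start from a word in their own language, for example `find_by_concept(PLAYLIST, concept_for_label("álom", "hu"))`, and receive matching songs in every language before narrowing the result to one.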

Q4 What should be the linguistically correct description of a curated list of Samoyedic musical pieces in the Liv language?

Our system generates semantic statements from trustworthy metadata. The semantic statements are then translated into natural language descriptions. We aim for a level of fluency that is suitable for users within the Finno-Ugric communities. This tool can improve the curation and governance of open knowledge projects that cannot recruit enough reviewers with a high level of language competency to provide effective oversight of a knowledge base or an AI application.
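The step from semantic statement to natural language description can be sketched with simple templates. This is an illustrative assumption, not the system's actual method: the `Statement` type, the predicate names, the templates and the example values are hypothetical. Fluent output in heavily inflected Finno-Ugric languages would need morphological handling that plain string templates do not provide.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A subject-predicate-object statement with human-readable labels."""
    subject: str
    predicate: str
    obj: str


# Hypothetical templates; only English is filled in for this sketch.
TEMPLATES = {
    "en": {
        "performer": "{subject} is performed by {obj}.",
        "language_of_work": "{subject} is sung in {obj}.",
    },
}


def verbalise(statement, language="en"):
    """Turn one statement into a sentence, or fail loudly if no template exists."""
    try:
        template = TEMPLATES[language][statement.predicate]
    except KeyError:
        raise LookupError(
            f"no template for predicate {statement.predicate!r} in {language!r}"
        ) from None
    return template.format(subject=statement.subject, obj=statement.obj)


# Invented example statements.
STATEMENTS = [
    Statement("Example song", "performer", "Example artist"),
    Statement("Example song", "language_of_work", "Livonian"),
]

DESCRIPTION = " ".join(verbalise(s) for s in STATEMENTS)
```

Failing loudly on a missing template, rather than falling back to another language, keeps gaps visible to the reviewers who oversee the knowledge base.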

Daniel Antal
Co-founder