Structured Data
Data comes in many shapes and sizes. We carefully transform the data that is important for a project into a format that tools can work with. Our goal is to use the right tool for every dataset and project, ranging from ready-made software, via internally developed generic components and frameworks, to complex, project-specific workflows. Data is valuable and should have a clear path to the archive; we provide that route and the means to create the accompanying metadata.
Datasets are becoming strongly interconnected, internally but increasingly also externally. These connections are established, for example, through shared key entities (people, places, commodities, etc.) and core vocabularies. This is particularly the case in the field of Linked (Open) Data. With Timbuctoo, our own Linked Data Store, we offer researchers a stable basis for storing their data. Lenticular Lens makes it possible to create and validate links between entities in different Timbuctoo datasets. Within the Dutch CLARIAH infrastructure, we provide the "I" (Interoperability) of FAIR datasets, using vocabularies as the semantic cornerstone.
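To illustrate what creating and validating links between entities involves, here is a minimal sketch of a linkset, assuming each entity is identified by a URI and compared only by its label using Python's `difflib.SequenceMatcher`. This is not Lenticular Lens's data model or its matching algorithms; the classes `Link` and `Linkset`, the function `align`, the threshold of 0.8 and the example URIs are all hypothetical. The sketch does show two ideas the tool is built around: the configuration used for the alignment is stored with the links, and manual validation is recorded per link so it can be reported on.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Link:
    source: str                    # entity URI in dataset A (hypothetical)
    target: str                    # entity URI in dataset B (hypothetical)
    score: float                   # similarity score that produced the link
    validation: str = "unchecked"  # "unchecked", "accepted" or "rejected"


@dataclass
class Linkset:
    # The alignment configuration is kept with the links for later reporting.
    algorithm: str
    threshold: float
    links: list[Link] = field(default_factory=list)

    def validate(self, source: str, target: str, accepted: bool) -> None:
        """Record a manual judgement on one link."""
        for link in self.links:
            if link.source == source and link.target == target:
                link.validation = "accepted" if accepted else "rejected"
                return
        raise KeyError((source, target))

    def report(self) -> dict[str, int]:
        """Count links per validation status."""
        counts = {"unchecked": 0, "accepted": 0, "rejected": 0}
        for link in self.links:
            counts[link.validation] += 1
        return counts


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(labels_a: dict[str, str], labels_b: dict[str, str],
          threshold: float = 0.8) -> Linkset:
    """Link every pair of entities whose labels are at least `threshold` similar."""
    linkset = Linkset(algorithm="difflib.SequenceMatcher.ratio", threshold=threshold)
    for uri_a, label_a in labels_a.items():
        for uri_b, label_b in labels_b.items():
            score = similarity(label_a, label_b)
            if score >= threshold:
                linkset.links.append(Link(uri_a, uri_b, round(score, 3)))
    return linkset


persons_a = {"a:1": "Rembrandt van Rijn", "a:2": "Joost van den Vondel"}
persons_b = {"b:7": "Rembrant van Rijn", "b:9": "Constantijn Huygens"}
links = align(persons_a, persons_b, threshold=0.8)
links.validate("a:1", "b:7", accepted=True)
print(links.report())  # {'unchecked': 0, 'accepted': 1, 'rejected': 0}
```

Only the variant spellings of Rembrandt's name pass the threshold. A real reconciliation tool would combine several matching methods and properties rather than a single label comparison.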
The origin (provenance) of data is an important point of attention in the (digital) humanities. Provenance information gives researchers who want to reuse existing data insight into its sources and processing steps, which forms the basis for trust in a dataset. To support this, we have created a provenance service that developers can easily integrate and that records these data traces. Our next step will be to provide tailored views of these data traces for different target audiences.
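What recording such data traces can look like is sketched below, loosely following the W3C PROV vocabulary (an activity, performed by an agent, that used some entities and generated others). The classes `ProvenanceRecord` and `ProvenanceLog` and the example file names are hypothetical; they do not reflect the API of our provenance service, which would also persist its records rather than keep them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """One processing step: an activity by an agent that used some
    entities and generated others (hypothetical structure)."""
    activity: str
    agent: str
    used: list[str]
    generated: list[str]
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProvenanceLog:
    """Hypothetical in-memory log of processing steps."""

    def __init__(self) -> None:
        self.records: list[ProvenanceRecord] = []

    def record(self, activity: str, agent: str,
               used: list[str], generated: list[str]) -> ProvenanceRecord:
        rec = ProvenanceRecord(activity, agent, list(used), list(generated))
        self.records.append(rec)
        return rec

    def trace(self, entity: str) -> list[ProvenanceRecord]:
        """Walk back from an entity to every step that (indirectly) produced it."""
        steps: list[ProvenanceRecord] = []
        frontier = [entity]
        seen: set[str] = set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            for rec in self.records:
                if current in rec.generated and rec not in steps:
                    steps.append(rec)
                    frontier.extend(rec.used)
        return steps


log = ProvenanceLog()
log.record("transcribe", "alice", used=["scan.tiff"], generated=["transcript.txt"])
log.record("extract-persons", "ner-script",
           used=["transcript.txt"], generated=["persons.ttl"])
for step in log.trace("persons.ttl"):
    print(step.activity, step.used, "->", step.generated)
# extract-persons ['transcript.txt'] -> ['persons.ttl']
# transcribe ['scan.tiff'] -> ['transcript.txt']
```

Walking the trail backwards from a derived file answers the reuse question from the paragraph above: which sources and processing steps lie behind this data?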
Contact
Menzo Windhouwer, Lead developer for Team Structured Data (ResearchGate, LinkedIn, Pure)
Related Research Projects
- Golden Agents: providing storage and accessibility for enriched linked datasets and linksets
- CLARIAH(+): setting up the metadata harvesting pipeline and its vocabulary ecosystem, and specifying and reimplementing data stories
- CLARIN: maintaining the core of the Component Metadata infrastructure and its harvesting pipeline
- REPUBLIC & OpenHuygens: reviving Huygens datasets using state-of-the-art technology and data stores
- ISEBEL: harvesting from, and a central catalog for, folktale data
Software and Data
- Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment, and can also report on manual corrections and the amount of manual validation done.
- Timbuctoo is a Linked Data Store that is able to store large graphs. It provides a GraphQL API, which makes the schema of the graph easily available for interaction. ResourceSync support enables us to keep related indexes and access tools synchronized. Timbuctoo keeps close track of provenance. It is the heart of an expanding toolset for dealing with all aspects of linked data.
- Procrustus is our forms framework, which can easily be adapted to any data source and tweaked for maximum usability.
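ResourceSync, the framework Timbuctoo uses to keep related indexes and access tools synchronized, publishes Sitemap-based documents such as change lists. The sketch below shows how a hypothetical index could read a change list to find which resources changed after its last update, using only Python's standard `xml.etree.ElementTree`. It assumes all timestamps are UTC ISO 8601 strings in the same `Z` form, so they can be compared as strings; the function `changes_since` and the `example.org` URLs are hypothetical and do not describe Timbuctoo's actual endpoints.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ResourceSync documents (Sitemap format plus rs: terms).
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def changes_since(changelist_xml: str, since: str) -> list[tuple[str, str]]:
    """Return (resource URL, change type) pairs from a ResourceSync change
    list for changes later than `since` (assumes comparable UTC 'Z' strings)."""
    root = ET.fromstring(changelist_xml)
    header = root.find(f"{RS}md")
    if header is None or header.get("capability") != "changelist":
        raise ValueError("not a ResourceSync change list")
    result = []
    for url in root.findall(f"{SM}url"):
        loc = url.findtext(f"{SM}loc")
        md = url.find(f"{RS}md")
        change = md.get("change") if md is not None else None
        # ResourceSync 1.1 puts the change time on rs:md/@datetime;
        # older change lists used <lastmod>. Accept either.
        when = md.get("datetime") if md is not None else None
        when = when or url.findtext(f"{SM}lastmod")
        if loc and change and when and when > since:
            result.append((loc, change))
    return result


CHANGELIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2024-01-01T00:00:00Z"/>
  <url>
    <loc>https://example.org/dataset/person/1</loc>
    <rs:md change="updated" datetime="2024-01-02T10:00:00Z"/>
  </url>
  <url>
    <loc>https://example.org/dataset/person/2</loc>
    <rs:md change="deleted" datetime="2024-01-03T09:30:00Z"/>
  </url>
</urlset>"""

for loc, change in changes_since(CHANGELIST, "2024-01-02T12:00:00Z"):
    print(change, loc)  # deleted https://example.org/dataset/person/2
```

An index that last synchronized at noon on 2 January only needs to process the deletion; the earlier update is already reflected.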
Publications and Presentations
- Idrissou, A., van Wissen, L., Zamborlini, V. The Lenticular Lens – Addressing Various Aspects of Entity Disambiguation in the Semantic Web. At Graphs and Networks in the Humanities 2022, online, 3-4 February 2022.
- Windhouwer, M., Kemps-Snijders, M., Trilsbeek, P., Moreira, A., van der Veen, B., Silva, G., von Rhein, D. FLAT: constructing a CLARIN compatible home for language resources. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), Portorož, Slovenia, May 23-28, 2016.
- Zeeman, R., Windhouwer, M. Tweak Your CMDI Forms to the Max. At the CLARIN Annual Conference, Pisa, Italy, October 8-10, 2018. (video)
- Ding, Q., Meder, T., Windhouwer, M. ISEBEL: an Intelligent Search Engine for Belief Legends. In Digital Humanities 2019: Conference Abstracts (DH 2019), Utrecht, The Netherlands, July 9-12, 2019.