We work on software that presents, searches, annotates and analyses text. Our goal is to create a common pipeline that enables scholars and engineers to ingest, process and publish textual data, be it XML, raw text or scans, in a shared, distributed and modular infrastructure that can be adapted to the different scholarly domains at the HuC and beyond.
Text analysis is a large part of our team's responsibilities. The focus lies on tools for linguistic, syntactic and semantic analysis, named entity recognition (NER) and other information extraction algorithms. Together, these tools should automatically generate a context of metadata and annotations around a text, and enable users to confirm, reject or correct these annotations. The team's work in this field interacts closely with more experimental development in R&D, at the DHLab and in various research groups in computational science and computational linguistics. Prototypes from these groups can be adopted by the team if they can be brought to a sufficient level of maturity.
We are also responsible for packaging products and product components into interactive environments that are optimised for the specific needs of researchers or research projects.
Team members:
- Gijsjan Brouwer
- Hennie Brugman (Team Lead) (Publications)
- Lars Buitinck (Publications)
- Hayco de Jong
- Bas Leenknegt
We work on products in the following product groups:
- Generic but flexible front end solutions: we try to exploit shared functionality between user interfaces while maintaining the flexibility to meet special project requirements.
- Tooling for several traditional and more innovative ways of annotating text, and for exploiting the resulting annotations.
- Searching large text collections in ways that list, summarise, organise and visualise large numbers of hits.
- A state-of-the-art text repository that supports the full life cycle of text documents in all their forms and versions, and in relation to all associated annotations and enrichments.
- Pipelines and tools for analysing, enriching and processing text documents.
We contribute to the following projects:
- CLARIAH Plus creates a national digital infrastructure for the humanities.
- EviDENce studies how eyewitnesses have reported on violence, and how this may have changed over time.
- HiTimeP is an annotation interface for entity linking.
- REPUBLIC will provide access to the resolutions of the States General (1576-1796): more than half a million pages of handwritten and printed political information.