Publications

Johannes Kiesel, Fabienne Hubricht, Benno Stein, and Martin Potthast. A Dataset for Content Error Detection in Web Archives. In Maria Bonn, Stephen J. Downie, Alain Martaus, and Dan Wu (Eds.), 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2019), 349–350, ACM, June 2019. [PUMA: Archives Content Dataset Detection Error Web Zno]

Johannes Kiesel, Florian Kneist, Lars Meyer, Kristof Komlossy, Benno Stein, and Martin Potthast. Web Page Segmentation Revisited: Evaluation Framework and Dataset. In Mathieu d'Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (Eds.), 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), 3047–3054, ACM, October 2020. [PUMA: Dataset Evaluation Framework Page Revisited Segmentation Web Zno]

Michael Völske, Janek Bevendorff, Johannes Kiesel, Benno Stein, Maik Fröbe, Matthias Hagen, and Martin Potthast. Web Archive Analytics: Infrastructure & Applications @ Webis (extended abstract). In Andreas Wagner, Christian Guetl, Michael Granitzer, and Stefan Voigt (Eds.), 2nd International Symposium on Open Search Technology (OSSYM 2020), International Open Search Symposium, October 2020. [PUMA: Analytics Applications Archive Infrastructure Web Webis Zno]

Shahbaz Syed, Wei-Fan Chen, Matthias Hagen, Benno Stein, Henning Wachsmuth, and Martin Potthast. Task Proposal: Abstractive Snippet Generation for Web Pages. 13th International Conference on Natural Language Generation (INLG 2020), 237–241, Association for Computational Linguistics, December 2020. [PUMA: Abstractive Generation Pages Proposal Snippet Task Web Zno]

Johannes Frey, Marvin Hofer, and Sebastian Hellmann. Studying Linked Data Accessibility Healthiness for the Long Tail of the Data Web. QuWeDa/MEPDaW@ISWC, 55–64, 2023. [PUMA: Accessibility Data Healthiness Linked Long Tail Web Xack]

Niklas Deckers and Martin Potthast. WARC-DL: Scalable Web Archive Processing for Deep Learning. 2022. [PUMA: Archive Deep Learning Processing Scalable WARC-DL Web Xack]

Sheikh Mastura Farzana, Maik Fröbe, Michael Granitzer, Gijs Hendriksen, Djoerd Hiemstra, Martin Potthast, Arjen P. de Vries, and Saber Zerhoudi. Report on the 1st International Workshop on Open Web Search (WOWS 2024) at ECIR 2024. SIGIR Forum, (58)1:1–13, Association for Computing Machinery, New York, NY, USA, Aug 7, 2024. [PUMA: 1st 2024 ECIR International Open Report Search WOWS Web Workshop imported yaff]

Dörthe Arndt, Jos De Roo, Patrick Hochstenbach, Rebekka Martens, Femke Ongenae, and Mathijs van Noort. RDF Surfaces as a First-Order Language for the Semantic Web. In Sabrina Kirrane, Mantas Simkus, Ahmet Soylu, and Dumitru Roman (Eds.), Rules and Reasoning, 200–216, Springer Nature Switzerland, Cham, 2024. [PUMA: Language RDF_Surfaces First-Order Semantic Web imported nopdf]

Patryk Burek, Nico Scherf, and Heinrich Herre. Ontology patterns for the representation of quality changes of cells in time. J. Biomed. Semantics, (10)1:16, Springer Science and Business Media LLC, October 2019. [PUMA: Cell Design Ontology; Web language ontology patterns; tracking;]

Janek Bevendorff, Matti Wiegmann, Martin Potthast, and Benno Stein. Product Spam on YouTube: A Case Study. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, 358–363, Association for Computing Machinery, New York, NY, USA, 2024. [PUMA: topic_language Content Quality, SEO, Spam, Web YouTube]