Inproceedings,

The information retrieval experiment platform (extended abstract)

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, California, International Joint Conferences on Artificial Intelligence Organization, (August 2024)

Abstract

We have built TIREx, the information retrieval experiment platform, to promote standardized, reproducible, scalable, and blinded retrieval experiments. Standardization is achieved through integration with PyTerrier's interfaces and compatibility with ir_datasets and ir_measures. Reproducibility and scalability are based on the underlying TIRA framework, which runs dockerized software in a cloud-native execution environment. Using Docker images of 50 standard retrieval approaches, we evaluated all of them on 32 tasks (i.e., 1,600 runs) in less than a week on a midsize cluster (1,620 CPU cores and 24 GPUs), demonstrating multi-task scalability. Importantly, TIRA also enables blind evaluation of AI experiments, as the test data can be hidden from public access and the tested approaches run in a sandbox that prevents data leaks. Keeping the test data hidden from public access ensures that it cannot be used by third parties for LLM training, preventing future training-test leaks.
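The run count in the abstract follows from evaluating every approach on every task. A minimal Python sketch of that evaluation grid is shown below. It assumes placeholder identifiers (`approach-00`, `task-00`, and so on are hypothetical, since the abstract does not list the actual approaches or tasks) and simply enumerates one job per (approach, task) pair:

```python
from itertools import product

# Hypothetical placeholder identifiers: the abstract reports 50 retrieval
# approaches and 32 tasks but does not name them, so these stand in for both.
approaches = [f"approach-{i:02d}" for i in range(50)]
tasks = [f"task-{j:02d}" for j in range(32)]

# Every approach is evaluated on every task, so each (approach, task) pair
# is one run in the grid.
runs = list(product(approaches, tasks))

print(len(runs))  # 50 * 32 = 1600
```

Because no run in such a grid depends on another, the jobs can in principle be distributed across cluster nodes without coordination; how TIRA actually schedules them is not described in this abstract.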
