
CopyCat: Near-Duplicates within and between the ClueWeb and the Common Crawl

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, July 2021.
