Article

How complex is your classification problem?

A. Lorena, L. Garcia, J. Lehmann, M. Souto, and T. Ho.
ACM Comput. Surv., 52 (5): 1–34 (September 2020)

Abstract

Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the known measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenges highlighted by such characteristics of the problems. This article surveys and analyzes measures that can be extracted from the training datasets to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing to prospect opportunities for future work in the area. Finally, descriptions are given on an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.

BibTeX key: Lorena2020-rm
entry type: article
year: 2020
month: sep
journal: ACM Comput. Surv.
number: 5
pages: 1--34
publisher: Association for Computing Machinery (ACM)
volume: 52
copyright: http://www.acm.org/publications/policies/copyright_policy#Background
language: en
