Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective
R. Knauer and E. Rodner. KI 2023: Advances in Artificial Intelligence: 46th German Conference on AI, Berlin, Germany, September 26--29, 2023, Proceedings, pages 114--129. Berlin, Heidelberg, Springer-Verlag (2023)
Abstract
A key challenge in machine learning is to design interpretable
models that can reduce their inputs to the best subset for
making transparent predictions, especially in the clinical
domain. In this work, we propose a certifiably optimal feature
selection procedure for logistic regression from a mixed-integer
conic optimization perspective that can take an auxiliary cost
to obtain features into account. Based on an extensive review of
the literature, we carefully create a synthetic dataset
generator for clinical prognostic model research. This allows us
to systematically evaluate different heuristic and optimal
cardinality- and budget-constrained feature selection
procedures. The analysis shows key limitations of the methods
for the low-data regime and when confronted with label noise.
Our paper not only provides empirical recommendations for
suitable methods and dataset designs, but also paves the way for
future research in the area of meta-learning.
%0 Conference Paper
%1 Knauer2023-im
%A Knauer, Ricardo
%A Rodner, Erik
%B KI 2023: Advances in Artificial Intelligence: 46th German Conference on AI, Berlin, Germany, September 26--29, 2023, Proceedings
%C Berlin, Heidelberg
%D 2023
%I Springer-Verlag
%K Zno best conic cost-sensitive interpretable learning machine meta-learning mixed-integer optimization selection subset
%P 114--129
%T Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective
%X A key challenge in machine learning is to design interpretable
models that can reduce their inputs to the best subset for
making transparent predictions, especially in the clinical
domain. In this work, we propose a certifiably optimal feature
selection procedure for logistic regression from a mixed-integer
conic optimization perspective that can take an auxiliary cost
to obtain features into account. Based on an extensive review of
the literature, we carefully create a synthetic dataset
generator for clinical prognostic model research. This allows us
to systematically evaluate different heuristic and optimal
cardinality- and budget-constrained feature selection
procedures. The analysis shows key limitations of the methods
for the low-data regime and when confronted with label noise.
Our paper not only provides empirical recommendations for
suitable methods and dataset designs, but also paves the way for
future research in the area of meta-learning.
@inproceedings{Knauer2023-im,
abstract = {A key challenge in machine learning is to design interpretable
models that can reduce their inputs to the best subset for
making transparent predictions, especially in the clinical
domain. In this work, we propose a certifiably optimal feature
selection procedure for logistic regression from a mixed-integer
conic optimization perspective that can take an auxiliary cost
to obtain features into account. Based on an extensive review of
the literature, we carefully create a synthetic dataset
generator for clinical prognostic model research. This allows us
to systematically evaluate different heuristic and optimal
cardinality- and budget-constrained feature selection
procedures. The analysis shows key limitations of the methods
for the low-data regime and when confronted with label noise.
Our paper not only provides empirical recommendations for
suitable methods and dataset designs, but also paves the way for
future research in the area of meta-learning.},
added-at = {2025-01-07T13:17:46.000+0100},
address = {Berlin, Heidelberg},
author = {Knauer, Ricardo and Rodner, Erik},
biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/24df2b2b5a46bbe20000e9dc9f92d9ad0/scadsfct},
booktitle = {{KI} 2023: Advances in Artificial Intelligence: 46th German Conference on {AI}, Berlin, Germany, September 26--29, 2023, Proceedings},
interhash = {88b4f129bad2e850d26d0763d2ef31bd},
intrahash = {4df2b2b5a46bbe20000e9dc9f92d9ad0},
keywords = {Zno best conic cost-sensitive interpretable learning machine meta-learning mixed-integer optimization selection subset},
location = {Berlin, Germany},
pages = {114--129},
publisher = {Springer-Verlag},
timestamp = {2025-02-04T10:56:14.000+0100},
title = {{Cost-Sensitive} Best Subset Selection for Logistic Regression: A {Mixed-Integer} Conic Optimization Perspective},
year = 2023
}