Scale Reliant Inference

(2022)

Abstract

This article develops new tools and new statistical theory for a statistical problem we call Scale Reliant Inference (SRI). Many scientific fields collect multivariate data that lack scale: the size, sum, or total of each measurement is arbitrary and does not represent the scale of the underlying system being measured. For example, in the analysis of high-throughput sequencing data, it is well known that the number of sequencing reads (the sequencing depth) varies substantially due to non-biological (technical) factors. This article develops a formal problem statement for SRI that unifies problems seen in multiple scientific fields. Informally, we define SRI as an estimation problem in which an estimand of interest cannot be uniquely identified because the observed data lack scale information. This problem statement reformulates the related field of Compositional Data Analysis and allows us to prove fundamental limits on SRI. For example, we prove that inferential criteria such as consistency, calibration, and unbiasedness are unattainable for common SRI tasks. Moreover, we show that methods commonly applied to SRI implicitly assume infinite knowledge of the system scale and can lead to a troubling phenomenon we term unacknowledged bias. Counter-intuitively, this problem worsens with more data and can lead to substantially elevated Type-I and Type-II error rates. Still, we show that rigorous statistical inference is possible so long as models acknowledge the fundamental uncertainty in the system scale. We introduce a class of models we call Scale Simulation Random Variables (SSRVs) as a flexible, rigorous, and computationally efficient approach to SRI.
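The core ideas in the abstract can be illustrated with a toy example. Scale-free data retain only proportions, so a log fold change (LFC) in absolute abundance cannot be identified. A fixed-scale analysis silently sets the unknown scale term to zero. Simulating that term instead yields an interval that acknowledges the uncertainty. This is a minimal sketch, not the authors' SSRV implementation. Several pieces are hypothetical choices for illustration only:

- the abundances `abs_a` and `abs_b`;
- the helper names `closure` and `log_fold_change`;
- the infinite-depth (noise-free) setting;
- the Normal(0, 0.5) distribution assumed for the log scale ratio.

```python
import math
import random
import statistics


def closure(x):
    """Divide each entry by the total; this is all that scale-free data retain."""
    total = sum(x)
    return [xi / total for xi in x]


def log_fold_change(before, after, i):
    """Log fold change of component i between two vectors."""
    return math.log(after[i] / before[i])


# Hypothetical absolute abundances for three taxa in two conditions.
# Taxa 1 and 2 double between conditions; taxon 0 is unchanged.
abs_a = [100.0, 200.0, 300.0]
abs_b = [100.0, 400.0, 600.0]
true_lfc = log_fold_change(abs_a, abs_b, 0)  # estimand: exactly 0.0

# Non-identifiability: rescaling the whole system leaves the observed
# composition unchanged, so the data carry no information about scale.
assert all(math.isclose(p, q) for p, q in
           zip(closure(abs_b), closure([10 * v for v in abs_b])))

# Observed (scale-free) data, here in the noise-free, infinite-depth limit.
comp_a, comp_b = closure(abs_a), closure(abs_b)

# Identity: absolute LFC = compositional LFC + log(scale_b / scale_a).
# Reporting the compositional LFC as the answer implicitly fixes the
# scale term at 0, i.e. it assumes the scale is known exactly.
naive_lfc = log_fold_change(comp_a, comp_b, 0)

# Only exact knowledge of the scale ratio recovers the estimand.
oracle_lfc = naive_lfc + math.log(sum(abs_b) / sum(abs_a))

# Simplified scale simulation (assumption: log scale ratio ~ Normal(0, 0.5)).
# Each draw of the unknown scale term is added to the compositional LFC.
rng = random.Random(0)
draws = [naive_lfc + rng.gauss(0.0, 0.5) for _ in range(10_000)]
cuts = statistics.quantiles(draws, n=40)  # 39 cut points in 2.5% steps
lower, upper = cuts[0], cuts[-1]          # 2.5th and 97.5th percentiles

print(f"true LFC              : {true_lfc:.3f}")
print(f"fixed-scale estimate  : {naive_lfc:.3f}")
print(f"scale-simulated 95% interval: [{lower:.3f}, {upper:.3f}]")
```

Here the fixed-scale estimate is log(6/11), about -0.61, although taxon 0 did not change. More sequencing depth would only tighten uncertainty around that wrong value. The simulated interval is wider but covers the true value of 0. In practice the scale distribution would have to come from prior knowledge or external measurements rather than the arbitrary 0.5 used here.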