Confusing statistical fiction with reality
26 Aug, 2025 at 11:24 | Posted in Statistics & Econometrics
With the above cautions in mind, we may view each statistical analysis as a thought experiment in a fictional “small world” or “toy example” sharply restricted by its simplifying assumptions. The questions that motivated the study must be translated properly into this fictional world; statistical methods then answer the questions via mathematical deductions from the assumptions. But those answers apply with logical force only within a fictional world in which all the assumptions hold. Thus, when key assumptions have not been perfectly enforced by design or circumstance, a crucial task is to judge how well those answers correspond to reality …
The answers obtained from the model [the entire set of assumptions] may still provide useful insights or hints about the reality targeted by the study, even when the model assumptions are inaccurate. Nonetheless, it is a common fallacy to treat model-based conclusions as if they apply directly to reality. Doing so is an example of confusing statistical fiction with reality, sometimes called model reification, and the usual result is severely overconfident inferences about that reality.
Most work in econometrics and regression analysis still rests on the assumption that the researcher possesses a ‘true’ theoretical model. Based on this belief, econometricians proceed as if the only problems left to solve concern measurement and observation.
But when something sounds too good to be true, it usually isn’t — and econometric fantasies are no exception. The snag is simple: there is little to support the assumption of perfect specification. Across economics and the social sciences, not a single regression or econometric model meets the standards of the supposedly ‘true’ theoretical model. Nor is there much reason to expect that will change.
The idea that we could ever construct a model that includes all relevant variables and correctly specifies the functional relationships between them is not merely unsupported — it is impossible to support. The theories that guide econometric model-building are inadequate; variables are always missing, and functional forms are always guessed.
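Omitted-variable bias, the first of these problems, is easy to exhibit in a toy simulation. The sketch below (variable names, coefficients, and sample size are my own illustrative assumptions, not drawn from any study discussed here) regresses y on x while leaving out a variable z that drives both; the estimated 'effect' of x comes out at roughly double its true value:

```python
# Sketch: omitted-variable bias in OLS. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                       # relevant variable the model omits
x = 0.8 * z + rng.normal(size=n)             # x is correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x on y is 1.0

def ols_slopes(X, y):
    """Least-squares slope coefficients of y on the columns of X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]                          # drop the intercept

print("z omitted:  ", ols_slopes(x[:, None], y))               # ~[1.98], badly biased
print("z included: ", ols_slopes(np.column_stack([x, z]), y))  # ~[1.0, 2.0]
```

No amount of additional data repairs the first estimate: the bias comes from the specification itself, not from sampling noise.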
Every regression model is, in fact, misspecified. With an endless number of possible variables to include and equally endless ways to specify relationships among them, each econometrician ends up with his own specification and his own parameter estimates. The much-vaunted Holy Grail of consistent, stable parameter values remains nothing more than a dream.
In order to draw inferences from data as described by econometric texts, it is necessary to make whimsical assumptions. The professional audience consequently and properly withholds belief until an inference is shown to be adequately insensitive to the choice of assumptions. The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable. If we are to make effective use of our scarce data resource, it is therefore important that we study fragility in a much more systematic way. If it turns out that almost all inferences from economic data are fragile, I suppose we shall have to revert to our old methods …
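One concrete version of the systematic fragility study called for in this passage is extreme-bounds analysis: re-estimate the coefficient of interest under every plausible combination of control variables and report the full range of estimates. A minimal sketch on simulated data (the data-generating process and the set of candidate controls are invented for illustration):

```python
# Sketch: extreme-bounds-style fragility check. All data are simulated assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 500

controls = rng.normal(size=(n, 4))                  # four candidate controls
x = controls @ [0.5, -0.3, 0.0, 0.2] + rng.normal(size=n)
y = 1.0 * x + controls @ [1.0, 1.0, 0.0, -1.0] + rng.normal(size=n)

def coef_on_x(included):
    """OLS coefficient on x with a given subset of controls included."""
    X = np.column_stack([np.ones(n), x, controls[:, list(included)]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimates = [coef_on_x(subset)
             for k in range(5)
             for subset in itertools.combinations(range(4), k)]

print(f"{len(estimates)} specifications, "
      f"bounds on the x coefficient: [{min(estimates):.2f}, {max(estimates):.2f}]")
```

If the bounds are wide, the inference is fragile in exactly the sense described above: the conclusion belongs to the chosen specification, not to the data.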
A rigorous application of econometric methods presupposes that real-world economic phenomena are governed by stable causal relations between variables. Parameter values estimated in one spatio-temporal context are assumed to be exportable to entirely different contexts. But for this assumption to hold, one would need to demonstrate convincingly that the underlying causal mechanisms are genuinely stable and invariant, retaining their parametric status across settings. The persistent failure of econometrics to deliver reliable predictions suggests otherwise: the quest for fixed parameters rests on little more than hope — and hope alone cannot bear the weight of science.
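A toy illustration of the exportability problem: fit a slope in one regime and carry it into another where the underlying mechanism has shifted. Everything below (the regimes, the slopes, the noise) is invented for the sketch:

```python
# Sketch: a parameter estimated in regime A fails when exported to regime B.
# Both regimes and all coefficients are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

x_a = rng.normal(size=n)
y_a = 2.0 * x_a + rng.normal(size=n)    # regime A: slope 2.0

x_b = rng.normal(size=n)
y_b = 0.5 * x_b + rng.normal(size=n)    # regime B: the mechanism has shifted

slope_a, _ = np.polyfit(x_a, y_a, 1)    # 'stable parameter' estimated in regime A

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

print(f"slope fitted in regime A:      {slope_a:.2f}")
print(f"prediction error in regime A:  {mse(y_a, slope_a * x_a):.2f}")  # ~1.0
print(f"prediction error in regime B:  {mse(y_b, slope_a * x_b):.2f}")  # ~3.25
```

Within regime A the parameter looks every bit as stable and 'structural' as the textbooks promise; exported to regime B, it is simply the wrong number.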