Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations

Chengkun Li1, Bobby Huggins2, Petrus Mikkola1, Luigi Acerbi1
1Department of Computer Science, University of Helsinki
2Department of Computer Science and Engineering, Washington University in St. Louis
7th Symposium on Advances in Approximate Bayesian Inference (AABI) - Proceedings track, 2025

TL;DR

We propose Normalizing Flow Regression (NFR), a novel offline inference method for approximating Bayesian posterior distributions using existing log-density evaluations. Unlike traditional surrogate approaches, NFR directly yields a tractable posterior approximation through regression on existing evaluations, without requiring additional sampling or inference steps. Our method performs well on both synthetic benchmarks and real-world applications from neuroscience and biology, offering a promising approach for Bayesian inference when standard methods are computationally prohibitive.

```bibtex
@article{li2025normalizing,
  title   = {Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations},
  author  = {Li, Chengkun and Huggins, Bobby and Mikkola, Petrus and Acerbi, Luigi},
  journal = {7th Symposium on Advances in Approximate Bayesian Inference (AABI) - Proceedings track},
  year    = {2025}
}
```

Introduction

Bayesian inference with computationally expensive likelihood evaluations remains a significant challenge in scientific domains. When model evaluations involve extensive numerical methods or simulations, standard Bayesian approaches like MCMC or variational inference become impractical due to their requirement for numerous density evaluations.

Practitioners often resort to simpler alternatives like maximum a posteriori (MAP) estimation, but these point estimates fail to capture parameter uncertainty. Recent surrogate modeling approaches can approximate expensive density functions but typically don't directly yield valid probability distributions, requiring additional sampling or inference steps.

Our Contribution

We propose Normalizing Flow Regression (NFR), a novel offline inference method that directly yields a tractable posterior approximation through regression on existing log-density evaluations. Unlike other surrogate methods, NFR directly produces a posterior distribution that is easy to evaluate and sample from.

NFR efficiently recycles existing log-density evaluations (e.g., from MAP optimizations) rather than requiring costly new evaluations from the target model. This makes it particularly valuable in settings where standard Bayesian methods are computationally prohibitive.

Background

Normalizing Flows

Normalizing flows construct flexible probability distributions by transforming a simple base distribution (typically a multivariate Gaussian) through an invertible transformation $T_{\boldsymbol{\phi}}: \mathbb{R}^{D} \rightarrow \mathbb{R}^{D}$ with parameters $\boldsymbol{\phi}$.

For a random variable $\mathbf{x}=T_{\boldsymbol{\phi}}(\mathbf{u})$ where $\mathbf{u}$ follows base distribution $p_{\mathbf{u}}$, the change of variables formula gives its density as:

$$q_{\boldsymbol{\phi}}(\mathbf{x})=p_{\mathbf{u}}(\mathbf{u})\left|\operatorname{det} J_{T_{\boldsymbol{\phi}}}(\mathbf{u})\right|^{-1}, \quad \mathbf{u}=T_{\boldsymbol{\phi}}^{-1}(\mathbf{x}) $$

We use the masked autoregressive flow (MAF), which constructs the transformation through an autoregressive process:

$$ \mathbf{x}^{(i)}=g_{\text{scale}}\left(\alpha^{(i)}\right) \cdot \mathbf{u}^{(i)}+g_{\text{shift}}\left(\mu^{(i)}\right) $$

where $\alpha^{(i)}$ and $\mu^{(i)}$ are outputs of an autoregressive neural network conditioned on the preceding coordinates $\mathbf{x}^{(1:i-1)}$, and $g_{\text{scale}}$ and $g_{\text{shift}}$ are fixed mappings (e.g., $g_{\text{scale}}=\exp$, which keeps the scale positive and the transform invertible).
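To make the change-of-variables computation concrete, here is a minimal sketch (not the paper's implementation) of evaluating $\log q_{\boldsymbol{\phi}}(\mathbf{x})$ for a toy affine autoregressive flow with $g_{\text{scale}} = \exp$; the hand-coded `conditioner` stands in for the masked neural network of a real MAF:

```python
import numpy as np
from scipy.stats import norm

def conditioner(x_prev):
    """Toy stand-in for the autoregressive network: returns (alpha, mu)
    for coordinate i given x^{(1:i-1)}."""
    if x_prev.size == 0:                 # first coordinate: unconditional
        return 0.5, 0.0
    return 0.1 * x_prev[-1], 0.3 * x_prev[-1]

def inverse_and_logdet(x):
    """Invert x -> u coordinate by coordinate and accumulate
    log|det J_T(u)| = sum_i alpha^{(i)} (since g_scale = exp)."""
    u = np.empty_like(x)
    log_det = 0.0
    for i in range(x.size):
        alpha, mu = conditioner(x[:i])   # depends only on x^{(1:i-1)}
        u[i] = (x[i] - mu) * np.exp(-alpha)
        log_det += alpha
    return u, log_det

def log_q(x):
    """Change of variables: log q(x) = log p_u(u) - log|det J_T(u)|."""
    u, log_det = inverse_and_logdet(x)
    return norm.logpdf(u).sum() - log_det

print(log_q(np.array([0.4, -1.2])))
```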

Bayesian Inference

Bayesian inference uses Bayes' theorem to determine posterior distribution $p(\mathbf{x} \mid \mathcal{D})$ of parameters $\mathbf{x}$ given data $\mathcal{D}$:

$$ p(\mathbf{x} \mid \mathcal{D})=\frac{p(\mathcal{D} \mid \mathbf{x}) p(\mathbf{x})}{p(\mathcal{D})} $$

Standard approximation approaches (VI and MCMC) require many likelihood evaluations, making them impractical for expensive models. While surrogate methods can approximate the target (log) density from limited evaluations, they don't directly yield proper probability distributions, requiring additional steps to obtain posterior samples. Our approach addresses this limitation by using normalizing flows as regression models that directly provide tractable posterior approximations.

Normalizing Flow Regression (NFR)

Key Innovation

NFR directly yields a tractable posterior approximation through regression on existing log-density evaluations, without requiring additional sampling or inference steps. The flow regression model provides both a normalized posterior density that's easy to evaluate and sample from, and an estimate of the normalizing constant (model evidence) for model comparison.

Method Overview

We use a normalizing flow with normalized density $q_{\boldsymbol{\phi}}(\mathbf{x})$ to fit observations of the log density of an unnormalized target posterior. The log-density prediction of our regression model is:

$$ f_{\boldsymbol{\psi}}(\mathbf{x})=f_{\boldsymbol{\phi}}(\mathbf{x})+C $$

where $f_{\boldsymbol{\phi}}(\mathbf{x})=\log q_{\boldsymbol{\phi}}(\mathbf{x})$ is the flow's log-density, and $C$ accounts for the unknown normalizing constant. We train the model via MAP estimation by maximizing:

$$ \mathcal{L}(\boldsymbol{\psi}) =\sum_{n=1}^{N} \log p\left(y_{n} \mid f_{\boldsymbol{\psi}}\left(\mathbf{x}_{n}\right), \sigma_{n}^{2}\right)+\log p(\boldsymbol{\phi})+\log p(C) $$
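As a minimal sketch under assumed interfaces (a `flow` object with a differentiable `log_prob`, e.g. from a package such as `nflows`, plus user-supplied log-prior callables; none of these names come from the paper's code), the MAP objective with a plain Gaussian observation model could look as follows. The next section replaces the Gaussian likelihood with a robust Tobit version.

```python
import torch

def nfr_loss(flow, log_C, X, y, sigma2, log_prior_phi, log_prior_C):
    """Negative MAP objective for NFR with a plain Gaussian likelihood.

    X: (N, D) evaluation points; y: (N,) observed log-densities;
    sigma2: (N,) observation noise variances; log_C: scalar tensor for
    the unknown log normalizing constant C.
    """
    f_psi = flow.log_prob(X) + log_C          # f_psi(x) = f_phi(x) + C
    log_lik = -0.5 * ((y - f_psi) ** 2 / sigma2
                      + torch.log(2 * torch.pi * sigma2)).sum()
    return -(log_lik + log_prior_phi(flow) + log_prior_C(log_C))
```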

Robust Likelihood Function

A standard Gaussian likelihood for log-density observations would let the extremely negative log-density values of near-zero density regions dominate the squared-error fit, at the expense of accuracy in the high-density regions that matter most. We address this with a Tobit likelihood that censors observations below a threshold $y_{\text{low}}$:

Illustration of the Tobit likelihood's censoring effect on a target density. The shaded region represents censored observations with log-density values below $y_{\text{low}}$.
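A hedged sketch of the censored observation model: below the threshold $y_{\text{low}}$, an observation contributes the probability mass $\Phi\left((y_{\text{low}} - f_{\boldsymbol{\psi}}(\mathbf{x}_n))/\sigma_n\right)$ rather than a point density. This is the standard Tobit construction; the paper's exact variant may differ in details.

```python
import torch
from torch.distributions import Normal

def tobit_log_likelihood(y, f, sigma, y_low):
    """Log-likelihood of one log-density observation y given model
    prediction f and noise sigma, censored below y_low (Tobit model)."""
    y, f, sigma = map(torch.as_tensor, (y, f, sigma))
    dist = Normal(f, sigma)
    if y > y_low:
        return dist.log_prob(y)                    # exact observation
    # censored: we only know the value fell below the threshold
    return torch.log(dist.cdf(torch.as_tensor(y_low)))
```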

Prior Specification and Optimization

We use a multivariate Gaussian base distribution estimated from high-density observations, and set a prior over flows that constrains the flow transformation to stay reasonably close to this base distribution.

Prior over flows: example flow realizations sampled from different priors over flow parameters $\boldsymbol{\phi}$, using a standard Gaussian as the base distribution. (a) Strong prior $\rightarrow$ too rigid; (b) intermediate prior $\rightarrow$ reasonable shapes; (c) weak prior $\rightarrow$ ill-behaved distributions.

We follow an annealed optimization approach that gradually fits the flow to a tempered version of the target across training iterations, which stabilizes optimization.

Annealed optimization progressively fits the flow to tempered observations, with inverse temperature $\beta$ increasing over training iterations.
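Since $\log p(\mathbf{x})^{\beta} = \beta \log p(\mathbf{x})$, tempering amounts to scaling the observed log-densities (and, correspondingly, the observation noise) by the inverse temperature. A schematic loop, with an assumed linear schedule (the paper's schedule may differ):

```python
import numpy as np

y = np.array([-1.2, -3.5, -8.0])    # toy observed log-densities
sigma = np.full_like(y, 0.1)        # toy observation noise (std)

for beta in np.linspace(0.1, 1.0, num=20):  # inverse temperature schedule
    y_tempered = beta * y                   # log p^beta = beta * log p
    sigma_tempered = beta * sigma           # noise scales with beta too
    # ... take a few optimization steps of the NFR objective on
    #     (y_tempered, sigma_tempered) before increasing beta
```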

Experiments

We evaluate our normalizing flow regression (NFR) method through a series of experiments on both synthetic and real-world problems, comparing its performance against established baselines.

Experimental Setup

For all experiments, we use log-density evaluations collected during maximum a posteriori (MAP) optimization runs, reflecting real-world settings where practitioners have already performed optimization to find parameter point estimates. We compare NFR against three baselines: the Laplace approximation, black-box variational inference (BBVI) with 10× the evaluation budget, and variational sparse Bayesian quadrature (VSBQ).

We evaluate the methods using three metrics: the absolute difference between true and estimated log normalizing constant (ΔLML), the mean marginal total variation distance (MMTV), and the "Gaussianized" symmetrized KL divergence (GsKL).
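For instance, the GsKL can be computed in closed form after moment-matching a Gaussian to each set of samples; a sketch follows (the exact convention, e.g. whether the two KL directions are summed or averaged, is our assumption):

```python
import numpy as np

def kl_gaussian(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) in closed form."""
    D = m0.size
    S1_inv = np.linalg.inv(S1)
    d = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - D
                  + np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1])

def gskl(samples_p, samples_q):
    """Gaussianized symmetrized KL between two (N, D) sample arrays."""
    mp, Sp = samples_p.mean(axis=0), np.cov(samples_p, rowvar=False)
    mq, Sq = samples_q.mean(axis=0), np.cov(samples_q, rowvar=False)
    return kl_gaussian(mp, Sp, mq, Sq) + kl_gaussian(mq, Sq, mp, Sp)
```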

Results

We tested NFR on five benchmark problems of increasing complexity, from synthetic test cases to challenging real-world applications. The consolidated results demonstrate that NFR consistently outperforms baseline methods across problems of varying dimensionality and complexity.

| Problem (Dim.) | Laplace | BBVI (10×) | VSBQ | NFR (ours) |
|---|---|---|---|---|
| Rosenbrock-G. (D=6) | 1.30 / 0.24 / 0.91 | 1.00 / 0.24 / 0.46 | 0.20 / 0.037 / 0.018 | 0.013 / 0.028 / 0.0042 |
| Lumpy (D=10) | 0.81 / 0.15 / 0.22 | 0.32 / 0.046 / 0.013 | 0.11 / 0.033 / 0.0070 | 0.026 / 0.022 / 0.0020 |
| Timing Model (D=5) | N/A | 0.32 / 0.11 / 0.13 | 0.21 / 0.044 / 0.0065 | 0.18 / 0.049 / 0.0086 |
| Lotka-Volterra (D=8) | 0.62 / 0.11 / 0.14 | 0.24 / 0.029 / 0.0087 | 0.95 / 0.085 / 0.060 | 0.18 / 0.016 / 0.00066 |
| Multisensory (D=12) | N/A | N/A | 4.1e+2 / 0.87 / 2.0e+2 | 0.82 / 0.13 / 0.11 |

Results across all benchmark problems; each cell reports ΔLML↓ / MMTV↓ / GsKL↓ (lower is better for all metrics). N/A indicates that the method was not applicable or computationally infeasible for that problem.

Key Results Highlights

Example contours of the marginal density for the multivariate Rosenbrock-Gaussian, showing the performance of different methods; ground-truth samples are in gray.

Synthetic Problems: On the challenging Rosenbrock-Gaussian distribution (D=6) with its curved correlation structure, NFR achieves substantially better performance than all baselines, successfully capturing the complex posterior shape. The Laplace approximation fails to capture the non-Gaussian structure, while BBVI struggles with convergence issues.

For the mildly multimodal Lumpy distribution (D=10), NFR again shows the best overall performance, though all methods except Laplace perform reasonably well in this case.

Real-World Applications

On real-world problems, NFR demonstrates particular strengths: on the timing model (D=5) and the Lotka-Volterra model (D=8) it matches or outperforms all baselines across metrics, and on the challenging multisensory perception model (D=12) it is the only method to produce a usable posterior approximation, where VSBQ fails and the other baselines are not applicable.

Discussion and Conclusions

We introduced normalizing flow regression (NFR), a novel offline inference method that directly yields a tractable posterior approximation through regression on existing log-density evaluations, unlike traditional surrogate approaches that require additional sampling or inference steps.

Normalizing flows offer key advantages for this task: they ensure proper probability distributions, enable easy sampling, scale efficiently with evaluations, and flexibly incorporate prior knowledge. Our empirical evaluation demonstrates that NFR effectively approximates both synthetic and real-world posteriors, excelling particularly in challenging scenarios where standard methods are computationally prohibitive.

In the paper's appendix, we also discuss diagnostics to detect failures of the method, from simple visualizations to Pareto-smoothed importance sampling.
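For example, the PSIS check can be run by importance-weighting draws from the flow against the target. A sketch assuming a `flow` with `sample`/`log_prob` methods and a `target_log_density` callable (all illustrative names), using `arviz.psislw`:

```python
import numpy as np
import arviz as az

def psis_diagnostic(flow, target_log_density, n_draws=2000):
    """Pareto k-hat diagnostic: values well below ~0.7 suggest the flow
    reliably approximates the target (rule of thumb from the PSIS
    literature). Note this check requires fresh target evaluations."""
    xs = flow.sample(n_draws)
    log_w = target_log_density(xs) - flow.log_prob(xs)  # log importance weights
    _, k_hat = az.psislw(np.asarray(log_w))
    return k_hat
```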

Limitations and Future Work

While NFR shows promising results, it has limitations: it requires sufficient coverage of the posterior's probability mass regions (challenging in dimensions D > 15-20), depends on the available evaluations adequately exploring the posterior landscape, and needs careful prior specification. Addressing these limitations is a natural direction for future work.

NFR represents a promising approach for Bayesian inference in computationally intensive settings, offering robust, uncertainty-aware modeling across scientific applications.

Fast Flows, Not Slow Chains. (Comic written by GPT-4.5 and graphics by gpt-4o.)

Acknowledgments: This work was supported by the Research Council of Finland (grants 358980 and 356498) and by the Flagship programme: Finnish Center for Artificial Intelligence FCAI. The authors wish to thank the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources.

References

  1. Acerbi, L. (2018). Variational Bayesian Monte Carlo. Advances in Neural Information Processing Systems, 31:8222-8232.
  2. Acerbi, L. (2020). Variational Bayesian Monte Carlo with noisy likelihoods. Advances in Neural Information Processing Systems, 33:8211-8222.
  3. Blei, D.M., Kucukelbir, A., and McAuliffe, J.D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859-877.
  4. Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density estimation using Real NVP. International Conference on Learning Representations.
  5. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2013). Bayesian Data Analysis (3rd edition). CRC Press.
  6. Li, C., Clarté, G., Jørgensen, M., and Acerbi, L. (2024). Fast post-process Bayesian inference with variational sparse Bayesian quadrature. arXiv preprint arXiv:2303.05263.
  7. Papamakarios, G., Pavlakou, T., and Murray, I. (2017). Masked autoregressive flow for density estimation. Advances in Neural Information Processing Systems, 30.
  8. Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1-64.
  9. Ranganath, R., Gerrish, S., and Blei, D. (2014). Black box variational inference. Artificial Intelligence and Statistics, 814-822.
  10. Rezende, D.J. and Mohamed, S. (2015). Variational inference with normalizing flows. Proceedings of the 32nd International Conference on Machine Learning, 1530-1538.