We propose Normalizing Flow Regression (NFR), a novel
offline inference method for approximating Bayesian posterior
distributions from existing log-density evaluations. Unlike traditional
surrogate approaches, NFR directly yields a tractable posterior
approximation through regression on these evaluations, without
requiring additional sampling or inference steps. Our method performs
well on both synthetic benchmarks and real-world applications from
neuroscience and biology, offering a promising approach for Bayesian
inference when standard methods are computationally prohibitive.
@article{li2025normalizing,
  title={Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations},
  author={Li, Chengkun and Huggins, Bobby and Mikkola, Petrus and Acerbi, Luigi},
  journal={7th Symposium on Advances in Approximate Bayesian Inference (AABI) - Proceedings track},
  year={2025}
}
Introduction
Bayesian inference with computationally expensive likelihood evaluations
remains a significant challenge in scientific domains. When model
evaluations involve extensive numerical methods or simulations, standard
Bayesian approaches like MCMC or variational inference become impractical
due to their requirement for numerous density evaluations.
Practitioners often resort to simpler alternatives like maximum a
posteriori (MAP) estimation, but these point estimates fail to capture
parameter uncertainty. Recent surrogate modeling approaches can
approximate expensive density functions but typically don't directly yield
valid probability distributions, requiring additional sampling or
inference steps.
Our Contribution
We propose Normalizing Flow Regression (NFR), a novel
offline inference method that directly yields a tractable posterior
approximation through regression on existing log-density evaluations.
Unlike other surrogate methods, NFR directly produces a posterior
distribution that is easy to evaluate and sample from.
NFR efficiently recycles existing log-density evaluations (e.g., from
MAP optimizations) rather than requiring costly new evaluations from the
target model. This makes it particularly valuable in settings where
standard Bayesian methods are computationally prohibitive.
Background
Normalizing Flows
Normalizing flows construct flexible probability distributions by
transforming a simple base distribution (typically a multivariate
Gaussian) through an invertible transformation $T_{\boldsymbol{\phi}}:
\mathbb{R}^{D} \rightarrow \mathbb{R}^{D}$ with parameters
$\boldsymbol{\phi}$.
For a random variable $\mathbf{x}=T_{\boldsymbol{\phi}}(\mathbf{u})$ where
$\mathbf{u}$ follows the base distribution $p_{\mathbf{u}}$, the change of
variables formula gives its density as:

$$q_{\boldsymbol{\phi}}(\mathbf{x}) = p_{\mathbf{u}}\!\left(T_{\boldsymbol{\phi}}^{-1}(\mathbf{x})\right) \left|\det J_{T_{\boldsymbol{\phi}}}\!\left(T_{\boldsymbol{\phi}}^{-1}(\mathbf{x})\right)\right|^{-1},$$

where $J_{T_{\boldsymbol{\phi}}}$ denotes the Jacobian of $T_{\boldsymbol{\phi}}$.
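As a concrete illustration, here is a minimal Python sketch of this formula using a toy invertible affine map $T(\mathbf{u}) = A\mathbf{u} + \mathbf{b}$ in place of a learned flow; the map and its values are ours for illustration only.

# Minimal sketch of the change-of-variables formula with a toy affine
# "flow" T(u) = A u + b (any invertible smooth map works the same way).
import numpy as np
from scipy.stats import multivariate_normal

D = 2
A = np.array([[1.5, 0.0], [0.5, 0.8]])  # invertible Jacobian of the affine map
b = np.array([0.3, -0.2])
base = multivariate_normal(mean=np.zeros(D), cov=np.eye(D))  # p_u

def flow_log_density(x):
    """log q(x) = log p_u(T^{-1}(x)) - log |det J_T(T^{-1}(x))|."""
    u = np.linalg.solve(A, x - b)            # T^{-1}(x)
    log_det = np.log(abs(np.linalg.det(A)))  # constant for an affine map
    return base.logpdf(u) - log_det

print(flow_log_density(np.array([0.5, 0.1])))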
Bayesian inference uses Bayes' theorem to determine the posterior distribution
$p(\mathbf{x} \mid \mathcal{D})$ of parameters $\mathbf{x}$ given data
$\mathcal{D}$:

$$p(\mathbf{x} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{x})\, p(\mathbf{x})}{p(\mathcal{D})},$$

where $p(\mathcal{D} \mid \mathbf{x})$ is the likelihood, $p(\mathbf{x})$ the prior, and $p(\mathcal{D})$ the normalizing constant (model evidence).
Standard approximation approaches (VI and MCMC) require many likelihood
evaluations, making them impractical for expensive models. While surrogate
methods can approximate the target (log) density from limited evaluations,
they don't directly yield proper probability distributions, requiring
additional steps to obtain posterior samples. Our approach addresses this
limitation by using normalizing flows as regression models that directly
provide tractable posterior approximations.
Normalizing Flow Regression (NFR)
Key Innovation
NFR directly yields a tractable posterior approximation through
regression on existing log-density evaluations, without requiring
additional sampling or inference steps. The flow regression model
provides both a normalized posterior density that's easy to evaluate and
sample from, and an estimate of the normalizing constant (model
evidence) for model comparison.
Method Overview
We use a normalizing flow with normalized density
$q_{\boldsymbol{\phi}}(\mathbf{x})$ to fit observations of the log density
of an unnormalized target posterior. The log-density prediction of our
regression model is:

$$f_{\boldsymbol{\phi},C}(\mathbf{x}) = f_{\boldsymbol{\phi}}(\mathbf{x}) + C,$$

where $f_{\boldsymbol{\phi}}(\mathbf{x})=\log
q_{\boldsymbol{\phi}}(\mathbf{x})$ is the flow's log-density, and $C$
accounts for the unknown normalizing constant of the target. We train the
model via MAP estimation, maximizing the log-likelihood of the observations
plus the log-priors over $\boldsymbol{\phi}$ and $C$:

$$\mathcal{L}(\boldsymbol{\phi}, C) = \sum_{n=1}^{N} \log p\!\left(y_n \mid f_{\boldsymbol{\phi}}(\mathbf{x}_n) + C\right) + \log p(\boldsymbol{\phi}) + \log p(C),$$

where $y_n$ is the observed log-density value at $\mathbf{x}_n$.
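For concreteness, a minimal sketch of this objective in Python follows; the `flow.log_prob` interface and helper names are our illustrative assumptions, not the paper's implementation, and the priors are omitted for brevity.

import torch

def nfr_loss(flow, C, X, y, sigma):
    """Negative MAP objective (priors over phi and C omitted for brevity).

    flow.log_prob(X): flow log-density at the points X (assumed API)
    C:                scalar tensor with requires_grad=True
    y, sigma:         observed log-density values and noise std (tensors)
    """
    pred = flow.log_prob(X) + C  # model prediction f_phi(x) + C
    log_lik = -0.5 * (((y - pred) / sigma) ** 2
                      + torch.log(2 * torch.pi * sigma ** 2)).sum()
    return -log_lik  # add -log p(phi) - log p(C) for the full MAP objective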
A standard Gaussian likelihood for log-density observations would
overemphasize near-zero density regions at the expense of high-density
regions. We address this with a Tobit likelihood that censors observations
below a threshold $y_{\text{low}}$:

$$p\!\left(y_n \mid f_n\right) = \begin{cases} \mathcal{N}\!\left(y_n ; f_n, \sigma_n^{2}\right) & \text{if } y_n > y_{\text{low}}, \\[2pt] \Phi\!\left(\dfrac{y_{\text{low}} - f_n}{\sigma_n}\right) & \text{if } y_n \leq y_{\text{low}}, \end{cases}$$

where $f_n = f_{\boldsymbol{\phi}}(\mathbf{x}_n) + C$, $\sigma_n$ is the observation noise, and $\Phi$ is the standard normal CDF.
Illustration of the Tobit likelihood's censoring effect.
The shaded region represents censored observations with log-density
values below $y_{\text{low}}$.
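A sketch of this censored (Tobit) log-likelihood, under the same illustrative assumptions as the objective sketch above:

import torch

def tobit_log_likelihood(y, pred, sigma, y_low):
    """Censored-Gaussian log-likelihood: observations at or below y_low
    only contribute the probability of falling below the threshold."""
    normal = torch.distributions.Normal(pred, sigma)
    exact = normal.log_prob(y)  # uncensored case
    censored = normal.cdf(torch.as_tensor(y_low)).clamp_min(1e-32).log()
    return torch.where(y > y_low, exact, censored).sum()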
Prior Specification and Optimization
We use a multivariate Gaussian base distribution estimated from
high-density observations, and constrain the flow transformation to stay
reasonably close to this base distribution, effectively placing a prior
over flows.
Prior over flows. Example flow realizations sampled
from different priors over flow parameters $\phi$, using a standard
Gaussian as the base distribution. (a) Strong prior $\rightarrow$ too
rigid; (b) Intermediate prior $\rightarrow$ reasonable shapes; (c) Weak
prior $\rightarrow$ ill-behaved distributions.
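One simple way to realize such a prior, as a hedged sketch: an independent Gaussian prior over the flow's parameters, whose scale controls how far the flow can deviate from the base distribution. The helper name and scale value are illustrative, not the paper's exact construction.

import torch

def flow_log_prior(flow, scale=0.1):
    """Independent Gaussian prior over flow parameters; smaller scales
    keep the flow closer to the base distribution (value illustrative)."""
    prior = torch.distributions.Normal(0.0, scale)
    return sum(prior.log_prob(p).sum() for p in flow.parameters())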
We follow an annealed optimization approach which gradually fits the flow
to a tempered target across training iterations, providing stability.
Annealed optimization progressively fits the flow to
tempered observations, with inverse temperature $\beta$ increasing over
training iterations.
Algorithm (NFR with annealed optimization):
Output: flow $T_{\boldsymbol{\phi}}$ approximating the target, and log normalizing constant $C$
1. Compute the base distribution from the observations.
2. For $t \leftarrow 0$ to $T_{\max}$ do:
   a. Set the inverse temperature $\beta_t$ according to the tempering schedule.
   b. Update the tempered observations.
   c. Optimize $C$ with fixed $\boldsymbol{\phi}$, then jointly optimize $(\boldsymbol{\phi}, C)$.
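A schematic Python version of this loop, reusing `tobit_log_likelihood` from the sketch above; the linear tempering schedule, step counts, and single joint optimization stage are simplifying assumptions.

import torch

def fit_nfr(flow, C, X, y, sigma, y_low, T_max=20, steps_per_temp=100):
    opt = torch.optim.Adam(list(flow.parameters()) + [C], lr=1e-3)
    for t in range(T_max + 1):
        beta = (t + 1) / (T_max + 1)  # assumed linear tempering schedule
        y_t = beta * y                # tempered observations: beta * log p
        for _ in range(steps_per_temp):
            opt.zero_grad()
            pred = flow.log_prob(X) + C
            # The full method first optimizes C alone, then (phi, C)
            # jointly, and adds the log-priors; both omitted for brevity.
            loss = -tobit_log_likelihood(y_t, pred, sigma, y_low)
            loss.backward()
            opt.step()
    return flow, C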
Experiments
We evaluate our normalizing flow regression (NFR) method through a series
of experiments on both synthetic and real-world problems, comparing its
performance against established baselines.
Experimental Setup
For all experiments, we use log-density evaluations collected during
maximum a posteriori (MAP) optimization runs, reflecting real-world
settings where practitioners have already performed optimization to find
parameter point estimates. We compare NFR against three baselines:
Laplace approximation: A Gaussian approximation built from
the MAP estimate and a numerical Hessian (see the sketch after this list)
Black-box variational inference (BBVI): Using the same
flow architecture as NFR but trained through standard variational
inference with up to ten times more likelihood evaluations ($10\times$)
Variational sparse Bayesian quadrature (VSBQ): A
state-of-the-art offline surrogate method using sparse Gaussian
processes
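For reference, a minimal sketch of the Laplace baseline, using a central-difference numerical Hessian; the step size and helper names are illustrative.

import numpy as np

def laplace_approximation(neg_log_post, x_map, eps=1e-4):
    """Gaussian N(x_map, H^{-1}), with H the central-difference Hessian
    of the negative log-posterior at the MAP estimate."""
    D = len(x_map)
    H = np.zeros((D, D))
    for i in range(D):
        for j in range(D):
            ei, ej = np.eye(D)[i] * eps, np.eye(D)[j] * eps
            H[i, j] = (neg_log_post(x_map + ei + ej)
                       - neg_log_post(x_map + ei - ej)
                       - neg_log_post(x_map - ei + ej)
                       + neg_log_post(x_map - ei - ej)) / (4 * eps ** 2)
    return x_map, np.linalg.inv(H)  # mean and covariance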
We evaluate the methods using three metrics: the absolute difference
between true and estimated log normalizing constant (ΔLML), the mean
marginal total variation distance (MMTV), and the "Gaussianized"
symmetrized KL divergence (GsKL).
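For clarity, here is a sample-based sketch of the two distribution metrics, assuming the moment-matched ("Gaussianized") definition of GsKL and histogram estimates of the marginals; both take sample arrays of shape (N, D).

import numpy as np

def mmtv(a, b, bins=50):
    """Mean marginal total variation distance between two sample sets."""
    tvs = []
    for d in range(a.shape[1]):
        lo = min(a[:, d].min(), b[:, d].min())
        hi = max(a[:, d].max(), b[:, d].max())
        pa, _ = np.histogram(a[:, d], bins=bins, range=(lo, hi))
        pb, _ = np.histogram(b[:, d], bins=bins, range=(lo, hi))
        tvs.append(0.5 * np.abs(pa / pa.sum() - pb / pb.sum()).sum())
    return float(np.mean(tvs))

def gskl(a, b):
    """Symmetrized KL between Gaussians moment-matched to each sample set."""
    def kl(mu0, S0, mu1, S1):
        S1_inv, diff = np.linalg.inv(S1), mu1 - mu0
        return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                      - len(mu0)
                      + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
    mu_a, S_a = a.mean(0), np.cov(a.T)
    mu_b, S_b = b.mean(0), np.cov(b.T)
    return 0.5 * (kl(mu_a, S_a, mu_b, S_b) + kl(mu_b, S_b, mu_a, S_a))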
Results
We tested NFR on five benchmark problems of increasing complexity, from
synthetic test cases to challenging real-world applications. The
consolidated results demonstrate that NFR consistently outperforms
baseline methods across problems of varying dimensionality and complexity.
Problem (Dim.)         |        Laplace         |       BBVI (10×)       |          VSBQ          |       NFR (ours)
                       | ΔLML↓   MMTV↓   GsKL↓  | ΔLML↓   MMTV↓   GsKL↓  | ΔLML↓   MMTV↓   GsKL↓  | ΔLML↓   MMTV↓   GsKL↓
Rosenbrock-G. (D=6)    | 1.30    0.24    0.91   | 1.00    0.24    0.46   | 0.20    0.037   0.018  | 0.013*  0.028*  0.0042*
Lumpy (D=10)           | 0.81    0.15    0.22   | 0.32    0.046   0.013  | 0.11    0.033   0.0070 | 0.026*  0.022*  0.0020*
Timing Model (D=5)     | N/A     N/A     N/A    | 0.32    0.11    0.13   | 0.21    0.044*  0.0065*| 0.18*   0.049   0.0086
Lotka-Volterra (D=8)   | 0.62    0.11    0.14   | 0.24    0.029   0.0087 | 0.95    0.085   0.060  | 0.18*   0.016*  0.00066*
Multisensory (D=12)    | N/A     N/A     N/A    | N/A     N/A     N/A    | 4.1e+2  0.87    2.0e+2 | 0.82*   0.13*   0.11*

Results across all benchmark problems. Lower values are better for all
metrics; asterisks (*) mark the best result for each problem and metric.
N/A indicates that the method was not applicable or computationally
infeasible for that problem.
Key Results Highlights
Example contours of the marginal density for the
Multivariate Rosenbrock-Gaussian showing
performance of different methods. Ground-truth samples are in gray.
Synthetic Problems: On the challenging
Rosenbrock-Gaussian distribution (D=6) with its curved correlation
structure, NFR achieves substantially better performance than all
baselines, successfully capturing the complex posterior shape. The
Laplace approximation fails to capture the non-Gaussian structure,
while BBVI struggles with convergence issues.
For the mildly multimodal Lumpy distribution (D=10), NFR again shows
the best overall performance, though all methods except Laplace
perform reasonably well in this case.
Real-World Applications
On real-world problems, NFR demonstrates particular strengths:
Bayesian Timing Model (D=5): Both NFR and VSBQ
accurately approximate this posterior, even with added log-likelihood
estimation noise that makes the problem more challenging and
realistic.
Lotka-Volterra Model (D=8): NFR significantly
outperforms all baselines on this problem with coupled differential
equations, demonstrating its effectiveness on problems with moderate
dimensionality and complex dynamics.
Multisensory Perception (D=12): In our most
challenging test, NFR performs remarkably well where the Laplace
approximation is inapplicable and BBVI is computationally prohibitive.
VSBQ completely fails to produce a usable approximation for this
high-dimensional problem.
Discussion and Conclusions
We introduced normalizing flow regression (NFR), a novel offline inference
method that directly yields a tractable posterior approximation through
regression on existing log-density evaluations, unlike traditional
surrogate approaches that require additional sampling or inference steps.
Normalizing flows offer key advantages for this task: they ensure proper
probability distributions, enable easy sampling, scale efficiently with
evaluations, and flexibly incorporate prior knowledge. Our empirical
evaluation demonstrates that NFR effectively approximates both synthetic
and real-world posteriors, excelling particularly in challenging scenarios
where standard methods are computationally prohibitive.
In the paper's appendix, we also discuss diagnostics to detect failures of
the method, from simple visualizations to Pareto-smoothed importance
sampling.
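As a rough sketch of the Pareto-smoothed importance sampling check mentioned above (the flow-interface names are illustrative, and note that the check requires fresh, potentially expensive evaluations of the true target):

import arviz as az
import numpy as np

def psis_khat(flow_sample, flow_log_prob, target_log_density, n=2000):
    """Pareto k-hat diagnostic: k-hat > 0.7 signals an unreliable fit."""
    xs = flow_sample(n)  # assumed flow interface
    log_w = np.array([target_log_density(x) - flow_log_prob(x) for x in xs])
    _, khat = az.psislw(log_w)
    return float(khat)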
Limitations and Future Work
While NFR shows promising results, it has limitations: it requires
sufficient coverage of probability mass regions (challenging in dimensions
D > 15-20), depends on evaluations adequately exploring the posterior
landscape, and needs careful prior specification. Future research
directions include:
Developing active learning strategies for sequential evaluation
acquisition
Exploring advanced flow architectures for higher-dimensional problems
NFR represents a promising approach for Bayesian inference in
computationally intensive settings, offering robust, uncertainty-aware
modeling across scientific applications.
Comic: Fast Flows, Not Slow Chains.
(Written by GPT-4.5, graphics by GPT-4o.)
Acknowledgments: This work was supported by the Research
Council of Finland (grants 358980 and 356498) and by the Flagship
programme Finnish Center for Artificial Intelligence (FCAI). The authors
wish to thank the Finnish Computing Competence Infrastructure (FCCI) for
supporting this project with computational and data storage resources.
References
Acerbi, L. (2018). Variational Bayesian Monte Carlo.
Advances in Neural Information Processing Systems,
31:8222-8232.
Acerbi, L. (2020). Variational Bayesian Monte Carlo with noisy
likelihoods. Advances in Neural Information Processing Systems,
33:8211-8222.
Blei, D.M., Kucukelbir, A., and McAuliffe, J.D. (2017). Variational
inference: A review for statisticians.
Journal of the American Statistical Association,
112(518):859-877.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2017). Density estimation
using Real NVP.
International Conference on Learning Representations.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and
Rubin, D.B. (2013). Bayesian Data Analysis (3rd edition). CRC
Press.
Li, C., Clarté, G., Jørgensen, M., and Acerbi, L. (2024). Fast
post-process Bayesian inference with variational sparse Bayesian
quadrature. arXiv preprint arXiv:2303.05263.
Papamakarios, G., Pavlakou, T., and Murray, I. (2017). Masked
autoregressive flow for density estimation.
Advances in Neural Information Processing Systems, 30.
Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., and
Lakshminarayanan, B. (2021). Normalizing flows for probabilistic
modeling and inference. Journal of Machine Learning Research,
22(57):1-64.
Ranganath, R., Gerrish, S., and Blei, D. (2014). Black box variational
inference. Artificial Intelligence and Statistics, 814-822.
Rezende, D.J. and Mohamed, S. (2015). Variational inference with
normalizing flows.
Proceedings of the 32nd International Conference on Machine
Learning, 1530-1538.