PyVBMC Example 4: Multiple runs as validation#

Here we explore running PyVBMC multiple times in order to validate the results: if the final variational posteriors obtained from several different initial points are consistent, we can be more confident that the results are correct.

This notebook is Part 4 of a series of notebooks in which we present various example usages for VBMC with the PyVBMC package. The code used in this example is available as a script here.

import numpy as np
import scipy.stats as scs
from scipy.optimize import minimize
from pyvbmc import VBMC
import matplotlib.pyplot as plt

1. Model definition#

We use the same toy target function as in Example 1, a broad Rosenbrock’s banana function in \(D = 2\), with unbounded parameters:

D = 2  # We'll use a 2-D problem for a quicker demonstration
prior_mu = np.zeros(D)
prior_var = 3 * np.ones(D)


def log_prior(theta):
    """Multivariate normal prior on theta."""
    cov = np.diag(prior_var)
    return scs.multivariate_normal(prior_mu, cov).logpdf(theta)


# log-likelihood (Rosenbrock)
def log_likelihood(theta):
    """D-dimensional Rosenbrock's banana function."""
    theta = np.atleast_2d(theta)

    x, y = theta[:, :-1], theta[:, 1:]
    return -np.sum((x**2 - y) ** 2 + (x - 1) ** 2 / 100, axis=1)


# Full model:
def log_joint(theta, data=np.ones(D)):
    """log-density of the joint distribution."""
    # (`data` is an unused placeholder argument in this toy example.)
    return log_likelihood(theta) + log_prior(theta)


LB = np.full((1, D), -np.inf)  # Lower bounds
UB = np.full((1, D), np.inf)  # Upper bounds
PLB = np.full((1, D), prior_mu - np.sqrt(prior_var))  # Plausible lower bounds
PUB = np.full((1, D), prior_mu + np.sqrt(prior_var))  # Plausible upper bounds
options = {
    "max_fun_evals": 50 * D  # Slightly reduced from 50 * (D + 2), for speed
}
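
For reference, the log-likelihood implemented above is

\[ \log p(\mathcal{D} \mid \theta) = -\sum_{i=1}^{D-1} \left[ \left( \theta_i^2 - \theta_{i+1} \right)^2 + \frac{(\theta_i - 1)^2}{100} \right], \]

and the prior is a multivariate normal with mean \(\mathbf{0}\) and covariance \(3 I\). The plausible bounds, which inform PyVBMC of the region where most of the posterior mass is expected, are set to one prior standard deviation around the prior mean.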

2. Validation#

For validation, we recommend running PyVBMC 3-4 times from different initial points:

np.random.seed(42)
n_runs = 3
vps, elbos, elbo_sds, success_flags, result_dicts = [], [], [], [], []
for i in range(n_runs):
    print(f"PyVBMC run {i} of {n_runs}")

    # Determine initial point x0:
    if i == 0:
        x0 = prior_mu  # First run, start from prior mean
    else:
        x0 = np.random.uniform(PLB, PUB)  # Other runs, randomize
    # Preliminary maximum a posteriori (MAP) estimation:
    x0 = minimize(
        lambda t: -log_joint(t),
        x0.ravel(),  # minimize expects 1-D input for initial point x0
        bounds=[(-np.inf, np.inf)] * D,  # unbounded in each dimension
    ).x

    # Run PyVBMC:
    vbmc = VBMC(
        log_joint,
        x0,
        LB,
        UB,
        PLB,
        PUB,
        options=options,
    )
    vp, results = vbmc.optimize()

    # Record the results:
    vps.append(vp)
    elbos.append(results["elbo"])
    elbo_sds.append(results["elbo_sd"])
    success_flags.append(results["success_flag"])
    result_dicts.append(results)
PyVBMC run 0 of 3
Reshaping x0 to row vector.
Beginning variational optimization assuming EXACT observations of the log-joint.
 Iteration  f-count    Mean[ELBO]    Std[ELBO]    sKL-iter[q]   K[q]  Convergence  Action
     0         10          -1.69         0.80     73438.00        2        inf     start warm-up
     1         15          -1.80         0.08         1.20        2        inf     
     2         20          -1.82         0.00         0.03        2      0.889     
     3         25          -1.81         0.00         0.00        2     0.0555     
     4         30          -1.81         0.00         0.00        2     0.0611     end warm-up
     5         35          -1.80         0.00         0.00        2       0.06     
     6         40          -1.81         0.00         0.00        2      0.127     
     7         45          -1.61         0.00         0.09        5       2.92     
     8         50          -1.58         0.00         0.02        6       0.52     rotoscale, undo rotoscale
     9         55          -1.57         0.00         0.00        9     0.0435     
    10         60          -1.56         0.00         0.00       12     0.0439     
    11         65          -1.54         0.00         0.00       15     0.0994     
    12         70          -1.53         0.00         0.01       17      0.192     stable
   inf         70          -1.53         0.00         0.01       50      0.192     finalize
Inference terminated: variational solution stable for options.tol_stable_count fcn evaluations.
Estimated ELBO: -1.527 +/-0.001.
PyVBMC run 1 of 3
Reshaping x0 to row vector.
Beginning variational optimization assuming EXACT observations of the log-joint.
 Iteration  f-count    Mean[ELBO]    Std[ELBO]    sKL-iter[q]   K[q]  Convergence  Action
     0         10          -1.39         0.60    183360.39        2        inf     start warm-up
     1         15          -1.79         0.17         5.72        2        inf     
     2         20          -1.83         0.00         0.01        2      0.356     
     3         25          -1.79         0.00         0.01        2      0.474     
     4         30          -1.83         0.00         0.00        2      0.127     end warm-up
     5         35          -1.81         0.00         0.00        2      0.118     
     6         40          -1.81         0.00         0.00        2     0.0519     
     7         45          -1.62         0.00         0.11        5       3.22     
     8         50          -1.60         0.00         0.00        6      0.176     
     9         55          -1.55         0.00         0.01        9       0.34     rotoscale, undo rotoscale
    10         60          -1.55         0.00         0.01       12      0.207     
    11         65          -1.54         0.00         0.00       15     0.0598     
    12         70          -1.55         0.00         0.00       17     0.0632     stable
   inf         70          -1.52         0.00         0.00       50     0.0632     finalize
Inference terminated: variational solution stable for options.tol_stable_count fcn evaluations.
Estimated ELBO: -1.525 +/-0.000.
PyVBMC run 2 of 3
Reshaping x0 to row vector.
Beginning variational optimization assuming EXACT observations of the log-joint.
 Iteration  f-count    Mean[ELBO]    Std[ELBO]    sKL-iter[q]   K[q]  Convergence  Action
     0         10          -2.19         0.92     19097.79        2        inf     start warm-up
     1         15          -1.93         0.10         0.03        2        inf     
     2         20          -1.84         0.00         0.03        2        1.1     
     3         25          -1.85         0.00         0.01        2      0.178     
     4         30          -1.86         0.00         0.00        2      0.047     end warm-up
     5         35          -1.83         0.00         0.00        2      0.113     
     6         40          -1.85         0.00         0.00        2     0.0818     
     7         45          -1.78         0.00         0.01        5      0.429     
     8         50          -1.62         0.00         0.09        8       2.56     rotoscale, undo rotoscale
     9         55          -1.58         0.00         0.01        9      0.432     
    10         60          -1.57         0.00         0.00       12     0.0779     
    11         65          -1.54         0.00         0.00       15      0.147     
    12         70          -1.53         0.00         0.01       17      0.249     
    13         75          -1.53         0.00         0.00       18     0.0368     stable
   inf         75          -1.53         0.00         0.00       50     0.0368     finalize
Inference terminated: variational solution stable for options.tol_stable_count fcn evaluations.
Estimated ELBO: -1.529 +/-0.000.

We now perform some diagnostic checks on the variational solutions. First, we check that the ELBOs from different runs are close to each other (i.e., that their differences are much smaller than 1):

print(elbos)
[-1.5273433504679828, -1.5248830970271374, -1.5290414540115609]
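
Here the three estimates differ by less than 0.005. As a quick quantitative summary (a minimal sketch using only the lists recorded above), we can print the largest pairwise difference:

elbo_range = np.max(elbos) - np.min(elbos)  # should be much smaller than 1
print(f"Largest ELBO difference across runs: {elbo_range:.4f}")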

Then, we check that the variational posteriors from distinct runs are similar. As a measure of similarity, we compute the Kullback-Leibler divergence between each pair:

kl_matrix = np.zeros((n_runs, n_runs))
for i in range(n_runs):
    for j in range(i, n_runs):
        # The `kl_div` method computes the divergence in both directions:
        kl_ij, kl_ji = vps[i].kl_div(vp2=vps[j])
        kl_matrix[i, j] = kl_ij
        kl_matrix[j, i] = kl_ji

Note that the KL divergence is asymmetric, so we have an asymmetric matrix:

print(kl_matrix)
[[-0.          0.0126071   0.00961694]
 [ 0.01044285 -0.          0.0136719 ]
 [ 0.01059469  0.01364032 -0.        ]]
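
Since the two directions of each pair are similar here, a symmetrized summary is convenient. As a minimal sketch, we can average the two directions and report the largest entry:

sym_kl = 0.5 * (kl_matrix + kl_matrix.T)  # symmetrize the KL matrix
# The diagonal is zero, so the maximum is an off-diagonal divergence:
print(f"Largest symmetrized KL divergence: {np.max(sym_kl):.4f}")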

Ideally, we want all KL divergence matrix entries to be much smaller than 1. For a qualitative validation, we recommend a visual inspection of the posteriors:

for i, vp in enumerate(vps):
    samples, __ = vp.sample(1000)
    plt.scatter(
        samples[:, 0],
        samples[:, 1],
        alpha=1 / len(vps),
        marker=".",
        label=f"VP {i}",
    )
plt.title("""Variational Posterior Samples""")
plt.xlabel("x0")
plt.ylabel("x1")
plt.legend();
[Figure: "Variational Posterior Samples", overlaid scatter plots of 1000 samples drawn from each of the three variational posteriors.]

We can also check that convergence was achieved in all runs, according to PyVBMC (we want success_flag = True for each run):

print(success_flags)
[True, True, True]

Finally, we can pick the variational solution with the highest ELCBO (the lower confidence bound on the ELBO), which penalizes solutions whose ELBO estimate is uncertain:

beta_lcb = 3.0  # Standard confidence parameter (in standard deviations)
# beta_lcb = 5.0  # This is more conservative
elcbos = np.array(elbos) - beta_lcb * np.array(elbo_sds)
idx_best = np.argmax(elcbos)
print(idx_best)
1
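
For downstream use, we can summarize the selected posterior. Here is a minimal sketch that estimates its mean and covariance empirically, using the same `sample` method employed for plotting above:

samples, _ = vps[idx_best].sample(int(1e5))
print("Posterior mean:", np.mean(samples, axis=0))
print("Posterior covariance:\n", np.cov(samples.T))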

The VariationalPosterior class implements save and load methods, similar to VBMC (as we saw in Example 3). We’ll save this best posterior to a file, so we can compare it against our results in Example 6:

vps[idx_best].save("noise_free_vp.pkl", overwrite=True)
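
Later (or in a different session), the saved posterior can be restored. A minimal sketch, assuming `VariationalPosterior` is importable from the top-level `pyvbmc` package and that its `load` method follows the same calling convention as `VBMC.load` from Example 3:

from pyvbmc import VariationalPosterior

vp_loaded = VariationalPosterior.load("noise_free_vp.pkl")
samples, _ = vp_loaded.sample(1000)  # use it as before, e.g., to draw samples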

3. Conclusions#

In this notebook, we have shown some techniques for validating the results of PyVBMC by comparing statistics across multiple runs.

In the next notebook, we will demonstrate using PyVBMC with models whose log-likelihood can only be evaluated noisily, i.e., up to some random noise.

Acknowledgments#

Work on the PyVBMC package was funded by the Finnish Center for Artificial Intelligence FCAI.