`kde_1d`

pyvbmc.stats.kde_1d(samples: ndarray, n: int = 16384, lower_bound: float = None, upper_bound: float = None)

Reliable and extremely fast kernel density estimator for 1D data.

One-dimensional kernel density estimator based on the fast Fourier transform. A Gaussian kernel is assumed, and the bandwidth is chosen automatically using the technique developed by Botev et al. (2010) [1].

Parameters:

samples (np.ndarray): The samples from which the density estimate is computed.
n (int, optional): The number of mesh points used in the uniform discretization of the interval [lower_bound, upper_bound]. n should be a power of two; if it is not, it is rounded up to the next power of two, i.e., n = 2^ceil(log2(n)). By default 2**14.
lower_bound (float, optional): The lower bound of the interval over which the density is computed. If not given, the value used is lower_bound = min(samples) - range/10, where range = max(samples) - min(samples). By default None.
upper_bound (float, optional): The upper bound of the interval over which the density is computed. If not given, the value used is upper_bound = max(samples) + range/10, where range = max(samples) - min(samples). By default None.
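
The defaults described above can be sketched in plain Python. This is only an illustration of the stated formulas, not the library's internal code; the helper names `next_power_of_two` and `default_bounds` are hypothetical.

```python
import math


def next_power_of_two(n: int) -> int:
    """Round n up to a power of two via n = 2**ceil(log2(n))."""
    return 2 ** math.ceil(math.log2(n))


def default_bounds(samples):
    """Pad the sample range by one tenth of its width on each side."""
    lo, hi = min(samples), max(samples)
    width = hi - lo  # called `range` in the parameter descriptions
    return lo - width / 10, hi + width / 10


print(next_power_of_two(1000))      # a value that is not a power of two
print(next_power_of_two(2**14))     # already a power of two, left unchanged
print(default_bounds([0.0, 10.0]))  # bounds padded by range/10
```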

Returns:

density (np.ndarray): 1D vector of length n with the values of the kernel density estimate at the grid points.
xmesh (np.ndarray): 1D vector of the grid points over which the density estimate is computed.
bandwidth (np.ndarray): The optimal bandwidth (Gaussian kernel assumed).
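
One possible sanity check on the returned values is to integrate density over xmesh with the trapezoid rule. The sketch below assumes the estimate is normalized on the grid, which this page does not state explicitly; in that case the area should be close to one. The helper `trapezoid_integral` is hypothetical, and the inputs are a synthetic stand-in rather than actual `kde_1d` output.

```python
def trapezoid_integral(density, xmesh):
    """Approximate the area under density over the grid xmesh."""
    total = 0.0
    for i in range(1, len(xmesh)):
        step = xmesh[i] - xmesh[i - 1]
        total += 0.5 * (density[i] + density[i - 1]) * step
    return total


# Synthetic stand-in for (density, xmesh): a uniform density on [0, 2].
xmesh = [i * 2 / 8 for i in range(9)]
density = [0.5] * len(xmesh)

area = trapezoid_integral(density, xmesh)
print(area)
```

With real output, the same helper can be called on the first two returned arrays after flattening them to 1D.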

Notes

This implementation is based on the MATLAB implementation by Zdravko Botev, and was further inspired by the Python implementations by Daniel B. Smith and the bandwidth selection code in KDEpy [2]. We thank Zdravko Botev for useful clarifications on the implementation of the fixed_point function.

Unlike other implementations, this one is immune to problems caused by multimodal densities with widely separated modes (see example). The bandwidth estimation does not deteriorate for multimodal densities because a parametric model is never assumed for the data.
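
To illustrate why a parametric assumption can hurt on such data, here is a minimal sketch using the normal reference rule of thumb, h = 1.06 * sigma * n^(-1/5), which assumes the data are Gaussian. This rule is not part of `kde_1d`; it is shown only for contrast, and the function name `normal_reference_bandwidth` is hypothetical. For widely separated modes, the pooled standard deviation is dominated by the distance between modes, so the rule yields a bandwidth far wider than any single mode.

```python
import random
import statistics


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth that assumes Gaussian data (for contrast only)."""
    sigma = statistics.stdev(samples)
    return 1.06 * sigma * len(samples) ** (-1 / 5)


rng = random.Random(0)
# Same layout as the example on this page: modes near 0, 35, and 55.
mode_a = [rng.gauss(0, 1) for _ in range(100)]
mode_b = [rng.gauss(35, 2) for _ in range(100)]
mode_c = [rng.gauss(55, 1) for _ in range(100)]

h_single = normal_reference_bandwidth(mode_a)
h_pooled = normal_reference_bandwidth(mode_a + mode_b + mode_c)
print(f"single mode: {h_single:.2f}, pooled: {h_pooled:.2f}")
```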

References

[1] Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010.

[2] tommyod/KDEpy

Examples

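import numpy as np
from numpy.random import randn

from pyvbmc.stats import kde_1d

# Three widely separated modes centered near 0, 35, and 55.
samples = np.concatenate(
    (randn(100, 1), randn(100, 1) * 2 + 35, randn(100, 1) + 55)
)
density, xmesh, bandwidth = kde_1d(
    samples, 2**14, samples.min() - 5, samples.max() + 5
)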
