lampe.inference#

Inference components such as estimators, training losses and MCMC samplers.

Classes#

NRE – Creates a neural ratio estimation (NRE) network.

NRELoss – Creates a module that calculates the cross-entropy loss for an NRE network.

BNRELoss – Creates a module that calculates the balanced cross-entropy loss for a balanced NRE (BNRE) network.

CNRELoss – Creates a module that calculates the cross-entropy loss for a contrastive NRE (CNRE) network.

BCNRELoss – Creates a module that calculates the balanced cross-entropy loss for a balanced CNRE (BCNRE) network.

AMNRE – Creates an arbitrary marginal neural ratio estimation (AMNRE) network.

AMNRELoss – Creates a module that calculates the cross-entropy loss for an AMNRE network.

NPE – Creates a neural posterior estimation (NPE) normalizing flow.

NPELoss – Creates a module that calculates the negative log-likelihood loss for an NPE normalizing flow.

FMPE – Creates a flow matching posterior estimation (FMPE) network.

FMPELoss – Creates a module that calculates the flow matching loss for an FMPE regressor.

MetropolisHastings – Creates a batched Metropolis-Hastings sampler.

Descriptions#

class lampe.inference.NRE(theta_dim, x_dim, build=<class 'zuko.nn.MLP'>, **kwargs)#

Creates a neural ratio estimation (NRE) network.

The principle of neural ratio estimation is to train a classifier network \(d_\phi(\theta, x)\) to discriminate between pairs \((\theta, x)\) equally sampled from the joint distribution \(p(\theta, x)\) and the product of the marginals \(p(\theta)p(x)\). Formally, the optimization problem is

\[\arg\min_\phi \frac{1}{2} \mathbb{E}_{p(\theta, x)} \big[ \ell(d_\phi(\theta, x)) \big] + \frac{1}{2} \mathbb{E}_{p(\theta)p(x)} \big[ \ell(1 - d_\phi(\theta, x)) \big]\]

where \(\ell(p) = -\log p\) is the negative log-likelihood. For this task, the decision function modeling the Bayes optimal classifier is

\[d(\theta, x) = \frac{p(\theta, x)}{p(\theta, x) + p(\theta) p(x)}\]

thereby defining the likelihood-to-evidence (LTE) ratio

\[r(\theta, x) = \frac{d(\theta, x)}{1 - d(\theta, x)} = \frac{p(\theta, x)}{p(\theta) p(x)} = \frac{p(x | \theta)}{p(x)} = \frac{p(\theta | x)}{p(\theta)} .\]

To avoid numerical instability when \(d_\phi(\theta, x) \to 0\), the neural network returns the logit of the class prediction, \(\text{logit}(d_\phi(\theta, x)) = \log r_\phi(\theta, x)\), rather than the prediction itself.

References

Approximating Likelihood Ratios with Calibrated Discriminative Classifiers (Cranmer et al., 2015)
Likelihood-free MCMC with Amortized Approximate Ratio Estimators (Hermans et al., 2019)
Parameters
  • theta_dim (int) – The dimensionality \(D\) of the parameter space.

  • x_dim (int) – The dimensionality \(L\) of the observation space.

  • build (Callable[[int, int], Module]) – The network constructor (e.g. lampe.nn.ResMLP). It takes the number of input and output features as positional arguments.

  • kwargs – Keyword arguments passed to the constructor.

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((*, D)\).

  • x (Tensor) – The observation \(x\), with shape \((*, L)\).

Returns

The log-ratio \(\log r_\phi(\theta, x)\), with shape \((*,)\).

Return type

Tensor
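
Example

A minimal usage sketch; the parameter dimensionality (3), observation dimensionality (5) and batch size (256) are arbitrary choices:

>>> estimator = NRE(3, 5)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> log_r = estimator(theta, x)
>>> log_r.shape
torch.Size([256])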

class lampe.inference.NRELoss(estimator)#

Creates a module that calculates the cross-entropy loss for an NRE network.

Given a batch of \(N \geq 2\) pairs \((\theta_i, x_i)\), the module returns

\[l = \frac{1}{2N} \sum_{i = 1}^N \big[ \ell(d_\phi(\theta_i, x_i)) + \ell(1 - d_\phi(\theta_{i+1}, x_i)) \big]\]

where \(\ell(p) = -\log p\) is the negative log-likelihood and the index \(i + 1\) is taken modulo \(N\), such that \(\theta_{N+1} = \theta_1\).

Parameters

estimator (Module) – A log-ratio network \(\log r_\phi(\theta, x)\).

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
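
Example

A sketch of a single training step, reusing the estimator from the NRE example above; the optimizer and learning rate are arbitrary choices:

>>> loss = NRELoss(estimator)
>>> optimizer = torch.optim.Adam(estimator.parameters(), lr=1e-3)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> l = loss(theta, x)
>>> optimizer.zero_grad()
>>> l.backward()
>>> optimizer.step()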

class lampe.inference.BNRELoss(estimator, lmbda=100.0)#

Creates a module that calculates the balanced cross-entropy loss for a balanced NRE (BNRE) network.

Given a batch of \(N \geq 2\) pairs \((\theta_i, x_i)\), the module returns

\[\begin{split}l & = \frac{1}{2N} \sum_{i = 1}^N \big[ \ell(d_\phi(\theta_i, x_i)) + \ell(1 - d_\phi(\theta_{i+1}, x_i)) \big] \\ & + \lambda \left(1 - \frac{1}{N} \sum_{i = 1}^N \big[ d_\phi(\theta_i, x_i) + d_\phi(\theta_{i+1}, x_i) \big] \right)^2\end{split}\]

where \(\ell(p) = -\log p\) is the negative log-likelihood.

References

Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation (Delaunoy et al., 2022)
Parameters
  • estimator (Module) – A log-ratio network \(\log r_\phi(\theta, x)\).

  • lmbda (float) – The weight \(\lambda\) controlling the strength of the balancing condition.

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
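
Example

Usage mirrors NRELoss; a brief sketch reusing the estimator, theta and x from the previous examples:

>>> loss = BNRELoss(estimator, lmbda=100.0)
>>> l = loss(theta, x)
>>> l.backward()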

class lampe.inference.CNRELoss(estimator, cardinality=2, gamma=1.0)#

Creates a module that calculates the cross-entropy loss for a contrastive NRE (CNRE) network.

The principle of contrastive neural ratio estimation (CNRE) is to predict whether or not a set \(\Theta = \{\theta^1, \dots, \theta^K\}\) contains the parameters that generated an observation \(x\). The elements of \(\Theta\) are drawn independently from the prior \(p(\theta)\) and the element \(\theta^k\) that generates the observation \(x \sim p(x | \theta^k)\) is chosen uniformly within \(\Theta\), such that

\[\begin{split}p(\Theta, x) & = p(\Theta) \, p(x | \Theta) \\ & = p(\Theta) \frac{1}{K} \sum_{k = 1}^K p(x | \theta^k) \\ & = p(\Theta) \, p(x) \frac{1}{K} \sum_{k = 1}^K r(\theta^k, x)\end{split}\]

where \(r(\theta, x)\) is the likelihood-to-evidence (LTE) ratio. The task is to discriminate between pairs \((\Theta, x)\) for which \(\Theta\) either does or does not contain the nominal parameters of \(x\), similar to the original NRE optimization problem. For this task, the decision function modeling the Bayes optimal classifier is

\[d(\Theta, x) = \frac{p(\Theta, x)}{p(\Theta, x) + \frac{1}{\gamma} p(\Theta) p(x)} = \frac{\sum_{k = 1}^K r(\theta^k, x)}{\frac{K}{\gamma} + \sum_{k = 1}^K r(\theta^k, x)} \, ,\]

where \(\gamma \in \mathbb{R}^+\) is the prior odds of \(\Theta\) containing versus not containing the nominal parameters. Consequently, a classifier \(d_\phi(\Theta, x)\) can equivalently be expressed and trained as a composition of ratios \(r_\phi(\theta^k, x)\). Finally, given a batch of \(N \geq 2K\) pairs \((\theta_i, x_i)\), the module returns

\[l = \frac{1}{N} \sum_{i = 1}^N \big[ \frac{\gamma}{\gamma + 1} \ell(d_\phi(\Theta_i, x_i)) + \frac{1}{\gamma + 1} \ell(1 - d_\phi(\Theta_{i+K}, x_i)) \big]\]

where \(\ell(p) = -\log p\) is the negative log-likelihood and \(\Theta_i = \{\theta_i, \dots, \theta_{i+K-1}\}\).

Note

The quantity \(d_\phi(\Theta, x)\) corresponds to \(q_\phi(y \neq 0 | \Theta, x)\) or \(1 - q_\phi(y = 0 | \Theta, x)\) in the notations of Miller et al. (2022).

References

Contrastive Neural Ratio Estimation (Miller et al., 2022)
Parameters
  • estimator (Module) – A log-ratio network \(\log r_\phi(\theta, x)\).

  • cardinality (int) – The cardinality \(K\) of \(\Theta\).

  • gamma (float) – The odds ratio \(\gamma\).

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
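
Example

A brief sketch reusing the estimator, theta and x from the previous examples; the batch size must satisfy \(N \geq 2K\):

>>> loss = CNRELoss(estimator, cardinality=4, gamma=1.0)
>>> l = loss(theta, x)
>>> l.backward()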

class lampe.inference.BCNRELoss(estimator, cardinality=2, gamma=1.0, lmbda=100.0)#

Creates a module that calculates the balanced cross-entropy loss for a balanced CNRE (BCNRE) network.

Given a batch of \(N \geq 2K\) pairs \((\theta_i, x_i)\), the module returns

\[\begin{split}l & = \frac{1}{N} \sum_{i = 1}^N \big[ \frac{\gamma}{\gamma + 1} \ell(d_\phi(\Theta_i, x_i)) + \frac{1}{\gamma + 1} \ell(1 - d_\phi(\Theta_{i+K}, x_i)) \big] \\ & + \lambda \left(1 - \frac{1}{N} \sum_{i = 1}^N \big[ d_\phi(\Theta_i, x_i) + d_\phi(\Theta_{i+K}, x_i) \big] \right)^2\end{split}\]

where \(\ell(p) = -\log p\) is the negative log-likelihood and \(\Theta_i = \{\theta_i, \dots, \theta_{i+K-1}\}\).

References

Balancing Simulation-based Inference for Conservative Posteriors (Delaunoy et al., 2023)
Parameters
  • estimator (Module) – A log-ratio network \(\log r_\phi(\theta, x)\).

  • cardinality (int) – The cardinality \(K\) of \(\Theta\).

  • gamma (float) – The odds ratio \(\gamma\).

  • lmbda (float) – The weight \(\lambda\) controlling the strength of the balancing condition.

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
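
Example

A brief sketch combining the contrastive and balancing settings, reusing the estimator, theta and x from the previous examples:

>>> loss = BCNRELoss(estimator, cardinality=4, gamma=1.0, lmbda=100.0)
>>> l = loss(theta, x)
>>> l.backward()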

class lampe.inference.AMNRE(theta_dim, x_dim, *args, **kwargs)#

Creates an arbitrary marginal neural ratio estimation (AMNRE) network.

The principle of AMNRE is to introduce, as input to the classifier, a binary mask \(b \in \{0, 1\}^D\) indicating a subset of parameters \(\theta_b = (\theta_i: b_i = 1)\) of interest. Intuitively, this allows the classifier to distinguish subspaces and to learn a different ratio for each of them. Formally, the classifier network takes the form \(d_\phi(\theta_b, x, b)\) and the optimization problem becomes

\[\arg\min_\phi \frac{1}{2} \mathbb{E}_{p(\theta, x) P(b)} \big[ \ell(d_\phi(\theta_b, x, b)) \big] + \frac{1}{2} \mathbb{E}_{p(\theta)p(x) P(b)} \big[ \ell(1 - d_\phi(\theta_b, x, b)) \big],\]

where \(P(b)\) is a binary mask distribution. In this context, the Bayes optimal classifier is

\[d(\theta_b, x, b) = \frac{p(\theta_b, x)}{p(\theta_b, x) + p(\theta_b) p(x)} = \frac{r(\theta_b, x)}{1 + r(\theta_b, x)} .\]

Therefore, a classifier network trained for AMNRE gives access to an estimator \(\log r_\phi(\theta_b, x, b)\) of all marginal LTE log-ratios \(\log r(\theta_b, x)\).

References

Arbitrary Marginal Neural Ratio Estimation for Simulation-based Inference (Rozet et al., 2021)
Parameters
  • theta_dim (int) – The dimensionality \(D\) of the parameter space.

  • x_dim (int) – The dimensionality \(L\) of the observation space.

  • args – Positional arguments passed to NRE.

  • kwargs – Keyword arguments passed to NRE.

forward(theta, x, b)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((*, D)\), or a subset \(\theta_b\).

  • x (Tensor) – The observation \(x\), with shape \((*, L)\).

  • b (BoolTensor) – A binary mask \(b\), with shape \((*, D)\).

Returns

The log-ratio \(\log r_\phi(\theta_b, x, b)\), with shape \((*,)\).

Return type

Tensor
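
Example

A minimal usage sketch; the dimensions, batch size and random mask below are arbitrary choices:

>>> estimator = AMNRE(3, 5)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> b = torch.rand(256, 3) < 0.5
>>> log_r = estimator(theta, x, b)
>>> log_r.shape
torch.Size([256])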

class lampe.inference.AMNRELoss(estimator, mask_dist)#

Creates a module that calculates the cross-entropy loss for an AMNRE network.

Given a batch of \(N \geq 2\) pairs \((\theta_i, x_i)\), the module returns

\[l = \frac{1}{2N} \sum_{i = 1}^N \big[ \ell(d_\phi(\theta_i \odot b_i, x_i, b_i)) + \ell(1 - d_\phi(\theta_{i+1} \odot b_i, x_i, b_i)) \big]\]

where the binary masks \(b_i\) are sampled from a distribution \(P(b)\).

Parameters
  • estimator (Module) – A log-ratio network \(\log r_\phi(\theta, x, b)\).

  • mask_dist (Distribution) – A binary mask distribution \(P(b)\).

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor

class lampe.inference.NPE(theta_dim, x_dim, build=<class 'zuko.flows.autoregressive.MAF'>, **kwargs)#

Creates a neural posterior estimation (NPE) normalizing flow.

The principle of neural posterior estimation is to train a parametric conditional distribution \(p_\phi(\theta | x)\) to approximate the posterior distribution \(p(\theta | x)\). The optimization problem is to minimize the expected Kullback-Leibler (KL) divergence between the two distributions for all observations \(x \sim p(x)\), that is,

\[\begin{split}\arg\min_\phi & ~ \mathbb{E}_{p(x)} \Big[ \text{KL} \big( p(\theta|x) \parallel p_\phi(\theta | x) \big) \Big] \\ = \arg\min_\phi & ~ \mathbb{E}_{p(x)} \, \mathbb{E}_{p(\theta | x)} \left[ \log \frac{p(\theta | x)}{p_\phi(\theta | x)} \right] \\ = \arg\min_\phi & ~ \mathbb{E}_{p(\theta, x)} \big[ -\log p_\phi(\theta | x) \big] .\end{split}\]

Normalizing flows are typically used for \(p_\phi(\theta | x)\) as they are differentiable parametric distributions enabling gradient-based optimization techniques.

Wikipedia

https://wikipedia.org/wiki/Kullback-Leibler_divergence

Parameters
  • theta_dim (int) – The dimensionality \(D\) of the parameter space.

  • x_dim (int) – The dimensionality \(L\) of the observation space.

  • build (Callable[[int, int], Flow]) – The flow constructor (e.g. zuko.flows.spline.NSF). It takes the number of sample and context features as positional arguments.

  • kwargs – Keyword arguments passed to the constructor.

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((*, D)\).

  • x (Tensor) – The observation \(x\), with shape \((*, L)\).

Returns

The log-density \(\log p_\phi(\theta | x)\), with shape \((*,)\).

Return type

Tensor
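
Example

A minimal usage sketch; the dimensions and batch size are arbitrary choices:

>>> estimator = NPE(3, 5)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> log_p = estimator(theta, x)
>>> log_p.shape
torch.Size([256])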

class lampe.inference.NPELoss(estimator)#

Creates a module that calculates the negative log-likelihood loss for an NPE normalizing flow.

Given a batch of \(N\) pairs \((\theta_i, x_i)\), the module returns

\[l = \frac{1}{N} \sum_{i = 1}^N -\log p_\phi(\theta_i | x_i) .\]
Parameters

estimator (Module) – A normalizing flow \(p_\phi(\theta | x)\).

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
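
Example

A brief sketch reusing the estimator from the NPE example above:

>>> loss = NPELoss(estimator)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> l = loss(theta, x)
>>> l.backward()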

class lampe.inference.FMPE(theta_dim, x_dim, freqs=3, build=<class 'zuko.nn.MLP'>, **kwargs)#

Creates a flow matching posterior estimation (FMPE) network.

The principle of FMPE is to train a regression network \(v_\phi(\theta, x, t)\) to approximate a vector field inducing a time-continuous normalizing flow between the posterior distribution \(p(\theta | x)\) and a standard Gaussian distribution \(\mathcal{N}(0, I)\).

After training, the normalizing flow \(p_\phi(\theta | x)\) induced by \(v_\phi(\theta, x, t)\) is used to evaluate the posterior density or generate samples.

References

Flow Matching for Generative Modeling (Lipman et al., 2023)
Flow Matching for Scalable Simulation-Based Inference (Dax et al., 2023)
Parameters
  • theta_dim (int) – The dimensionality \(D\) of the parameter space.

  • x_dim (int) – The dimensionality \(L\) of the observation space.

  • freqs (int) – The number of time embedding frequencies.

  • build (Callable[[int, int], Module]) – The network constructor (e.g. lampe.nn.ResMLP). It takes the number of input and output features as positional arguments.

  • kwargs – Keyword arguments passed to the constructor.

forward(theta, x, t)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((*, D)\).

  • x (Tensor) – The observation \(x\), with shape \((*, L)\).

  • t (Tensor) – The time \(t\), with shape \((*,)\).

Returns

The vector field \(v_\phi(\theta, x, t)\), with shape \((*, D)\).

Return type

Tensor

flow(x)#
Parameters

x (Tensor) – The observation \(x\), with shape \((*, L)\).

Returns

The normalizing flow \(p_\phi(\theta | x)\).

Return type

Distribution
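
Example

A minimal sketch of evaluating the vector field and of sampling from the induced flow; the dimensions are arbitrary, and sampling integrates an ODE, which can be comparatively slow:

>>> estimator = FMPE(3, 5)
>>> theta = torch.randn(256, 3)
>>> x = torch.randn(256, 5)
>>> t = torch.rand(256)
>>> v = estimator(theta, x, t)
>>> v.shape
torch.Size([256, 3])
>>> x_star = torch.randn(5)
>>> posterior = estimator.flow(x_star)
>>> samples = posterior.sample((1024,))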

class lampe.inference.FMPELoss(estimator, eta=0.001)#

Creates a module that calculates the flow matching loss for an FMPE regressor.

Given a batch of \(N\) pairs \((\theta_i, x_i)\), the module returns

\[l = \frac{1}{N} \sum_{i = 1}^N\| v_\phi((1 - t_i) \theta_i + (t_i + \eta) \epsilon_i, x_i, t_i) - (\epsilon_i - \theta_i) \|_2^2\]

where \(t_i \sim \mathcal{U}(0, 1)\) and \(\epsilon_i \sim \mathcal{N}(0, I)\).

Parameters
  • estimator (Module) – A regression network \(v_\phi(\theta, x, t)\).

  • eta (float) – A small constant \(\eta\) added to the noise scale for numerical stability.

forward(theta, x)#
Parameters
  • theta (Tensor) – The parameters \(\theta\), with shape \((N, D)\).

  • x (Tensor) – The observation \(x\), with shape \((N, L)\).

Returns

The scalar loss \(l\).

Return type

Tensor
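
Example

A brief sketch reusing the estimator, theta and x from the FMPE example above:

>>> loss = FMPELoss(estimator)
>>> l = loss(theta, x)
>>> l.backward()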

class lampe.inference.MetropolisHastings(x_0, f=None, log_f=None, sigma=1.0)#

Creates a batched Metropolis-Hastings sampler.

Metropolis-Hastings is a Markov chain Monte Carlo (MCMC) sampling algorithm used to sample from intractable distributions \(p(x)\) whose density is proportional to a tractable function \(f(x)\), with \(x \in \mathcal{X}\). The algorithm consists of repeating the following routine for \(t = 1\) to \(T\), where \(x_0\) is the initial sample and \(q(x' | x)\) is a pre-defined transition distribution.

\[\begin{split}1. ~ & x' \sim q(x' | x_{t-1}) \\ 2. ~ & \alpha \gets \frac{f(x')}{f(x_{t-1})} \frac{q(x_{t-1} | x')}{q(x' | x_{t-1})} \\ 3. ~ & u \sim \mathcal{U}(0, 1) \\ 4. ~ & x_t \gets \begin{cases} x' & \text{if } u \leq \alpha \\ x_{t-1} & \text{otherwise} \end{cases}\end{split}\]

Asymptotically, i.e. when \(T \to \infty\), the distribution of samples \(x_t\) is guaranteed to converge towards \(p(x)\). In this implementation, a Gaussian transition \(q(x' | x) = \mathcal{N}(x'; x, \Sigma)\) is used, which can be modified by sub-classing MetropolisHastings.

Wikipedia

https://wikipedia.org/wiki/Metropolis-Hastings_algorithm

Parameters
  • x_0 (Tensor) – A batch of initial points \(x_0\), with shape \((*, L)\).

  • f (Callable[[Tensor], Tensor]) – A function \(f(x)\) proportional to a density function \(p(x)\).

  • log_f (Callable[[Tensor], Tensor]) – The logarithm \(\log f(x)\) of a function proportional to \(p(x)\).

  • sigma (Union[float, Tensor]) – The standard deviation of the Gaussian transition. Either a scalar or a vector.

Example

>>> x_0 = torch.randn(128, 7)
>>> log_f = lambda x: -(x**2).sum(dim=-1) / 2
>>> sampler = MetropolisHastings(x_0, log_f=log_f, sigma=0.5)
>>> samples = [x for x in sampler(256, burn=128, step=4)]
>>> samples = torch.stack(samples)
>>> samples.shape
torch.Size([32, 128, 7])