Queries

In this section, we go over the most common probabilistic reasoning tasks and provide code snippets to compute those queries.

Setup

First, we load a pretrained PC and the corresponding dataset.

pc = zoo_psdd("plants.psdd")
data, _, _ = twenty_datasets("plants");
println("circuit with $(num_nodes(pc)) nodes and $(num_parameters(pc)) parameters.")
println("dataset with $(num_features(data)) features and $(num_examples(data)) examples.")
circuit with 153021 nodes and 91380 parameters.
dataset with 69 features and 17412 examples.

Full Evidence (EVI)

EVI refers to computing the probability when full evidence is given, i.e., when $x$ is fully observed, the output is $p(x)$. We can use the EVI method to compute $\log{p(x)}$:

probs = EVI(pc, data);
probs[1:3]
3-element Vector{Float64}:
  -7.533388371782998
 -16.534175321651873
 -27.80940132993066

Computing the EVI of a mixture of circuits works the same way. You may either pass weights to get the weighted mixture probability, or pass a component index to evaluate each component individually.

mix, mix_weights, _ = learn_strudel(data; num_mix = 10, init_maxiter = 10, em_maxiter = 100);
# This computes the weighted probability
probs = EVI(mix, data, mix_weights);
# Alternatively, we may want to compute the probability of a single component
c_prob = EVI(mix, data; component_idx = 1);
c_prob[1:3]
3-element Vector{Float64}:
 -16.41705785046347
 -68.71509574089467
 -38.71104227796572

Partial Evidence (MAR)

In this case we have some missing values. Let $x^o$ denote the observed features, and $x^m$ the missing features. We would like to compute $p(x^o)$ which is defined as $p(x^o) = \sum_{x^m} p(x^o, x^m)$. Of course, computing this directly by summing over all possible ways to fill the missing values is not tractable.

The good news is that, given a smooth and decomposable PC, the marginal can be computed exactly, in time linear in the size of the PC.
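
Both properties can be checked directly; issmooth and isdecomposable come from the LogicCircuits library:

issmooth(pc) && isdecomposable(pc)  # true: PSDDs are smooth and decomposable by construction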

First, we randomly make some feature values missing (you can also use make_missing_mcar from the LogicCircuits library):

using DataFrames
using Tables
function make_missing(d::DataFrame; keep_prob=0.8)
    # Start from an all-missing matrix of the same shape as the data
    m = missings(Bool, num_examples(d), num_features(d))
    # Keep each entry with probability `keep_prob`
    flag = rand(num_examples(d), num_features(d)) .<= keep_prob
    m[flag] .= Tables.matrix(d)[flag]
    DataFrame(m, :auto)
end;
data_miss = make_missing(data[1:1000,:]);

Now, we can use MAR to compute the marginal queries:

probs = MAR(pc, data_miss);
probs[1:3]
3-element Vector{Float32}:
  -7.53333
 -13.310469
 -20.256458
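
To see concretely what MAR computes, here is a brute-force check on a single example with only three features hidden. Enumerating completions is exponential in the number of missing features, which is exactly the blow-up the circuit avoids (the indices below are illustrative):

obs = Vector{Union{Missing,Bool}}(Tables.matrix(data)[1, :])
obs[1:3] .= missing   # hide the first three features

mar = MAR(pc, DataFrame(permutedims(obs), :auto))[1]

# Sum p(x) over all 2^3 completions of the hidden features
lls = map(Iterators.product(false:true, false:true, false:true)) do bits
    x = Bool.(coalesce.(obs, false))
    x[1:3] .= collect(bits)
    EVI(pc, DataFrame(permutedims(x), :auto))[1]
end
isapprox(log(sum(exp.(lls))), mar; atol = 1e-3)   # holds up to floating-point error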

Note that MAR can also be used to compute probabilities even if all data is observed; in fact, it should give the same results as EVI. However, if we know all features are observed, we suggest using EVI as it is generally faster.

probs_mar = MAR(pc, data);
probs_evi = EVI(pc, data);

probs_mar ≈ probs_evi
true

Just like EVI, MAR also works for mixtures.

# Full weighted marginal probability
probs_mar = MAR(mix, data, mix_weights);
# Individual component's marginal probability
c_probs_mar = MAR(mix, data; component_idx = 1);
c_probs_mar[1:3]
3-element Vector{Float32}:
 -16.417055
 -68.715096
 -38.711044
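
The weighted mixture probability decomposes over the components: $\log p(x) = \log \sum_c w_c \, p_c(x)$. A sketch of that identity, assuming mix_weights is a vector of normalized component weights and that the StatsFuns package is available for logsumexp:

using StatsFuns: logsumexp
# One column of marginals per mixture component (10 components trained above)
comp_lls = hcat((MAR(mix, data; component_idx = c) for c in 1:10)...);
mix_lls = [logsumexp(comp_lls[i, :] .+ log.(mix_weights)) for i in 1:size(comp_lls, 1)];
mix_lls ≈ MAR(mix, data, mix_weights)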

Conditionals (CON)

In this case, given observed features $x^o$, we would like to compute $p(q \mid x^o)$, where $q$ is an assignment to a subset of features $Q$ disjoint from $x^o$. We can use Bayes' rule to compute conditionals as two separate MAR queries as follows:

\[p(q \mid x^o) = \cfrac{p(q, x^o)}{p(x^o)}\]

Currently, this has to be done manually by the user. We plan to add a simple API for this case in the future.
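
In the meantime, the two MAR queries can be issued directly. A minimal sketch, with illustrative (arbitrary) variable indices:

# Observed evidence x^o: features 1 and 2
x_o = Vector{Union{Missing,Bool}}(missing, num_features(data))
x_o[1] = true; x_o[2] = false

# Query q: feature 5 set to true
joint = copy(x_o); joint[5] = true

log_cond = MAR(pc, DataFrame(permutedims(joint), :auto))[1] -
           MAR(pc, DataFrame(permutedims(x_o), :auto))[1]
exp(log_cond)   # p(x_5 = true ∣ x^o)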

Maximum a posteriori (MAP, MPE)

In this case, given the observed features $x^o$, the goal is to fill in the missing features $x^m$ so that $p(x^m, x^o)$ is maximized.

We can use the MAP method for this query; it returns the states that maximize the probability, together with the log-likelihood of each state.

data_miss = make_missing(data; keep_prob=0.5);
states, probs = MAP(pc, data_miss);
probs[1:3]
3-element Vector{Float32}:
  -6.598743
 -15.989449
 -21.956953
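
Since the maximizing completion is a single term of the sum that defines the marginal, the MAP log-probability can never exceed the MAR log-probability of the same partial example. A quick sanity check, with a small tolerance for floating-point rounding:

all(probs .≤ MAR(pc, data_miss) .+ 1f-4)   # expected to hold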

Sampling

We can also sample from the distribution $p(x)$ defined by a probabilistic circuit. You can use sample for this task.

samples, _ = sample(pc, 100);
size(samples)
(100, 69)
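
As a quick sanity check, the per-column sample mean approximates each variable's marginal probability under $p(x)$. A sketch using the standard-library Statistics module:

using Statistics
# Empirical frequency of `true` for each feature across the 100 samples
est_marginals = vec(mean(samples; dims = 1));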

Additionally, we can draw conditional samples $x \sim p(x \mid x^o)$, where $x^o$ denotes the observed features ($x^o \subseteq x$) and can be any subset of features.

# 2 rows of random evidence (each entry is true, false, or missing)
evidence = DataFrame(rand((missing, true, false), (2, num_variables(pc))), :auto)

samples, _ = sample(pc, 3, evidence);
size(samples)
(3, 2, 69)
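
Wherever evidence is observed, conditional samples must agree with it. A sketch of that consistency check:

ev = Tables.matrix(evidence)
for s in 1:size(samples, 1), i in 1:size(samples, 2)
    observed = findall(!ismissing, ev[i, :])
    @assert all(samples[s, i, observed] .== ev[i, observed])
end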

Expected Prediction (EXP)

Expected Prediction (EXP) is the task of taking the expectation of a discriminative model w.r.t. a generative model, conditioned on evidence (a subset of the features being observed):

\[\mathbb{E}_{x^m \sim p(x^m \mid x^o)} \left[ f(x^o, x^m) \right]\]

In the case where $f$ and $p$ are both circuits, and the pair satisfies certain structural constraints, we can compute this expectation and higher-order moments tractably. You can use Expectation and Moment to compute them.

using DiscriminativeCircuits
using DataFrames

pc = zoo_psdd("insurance.psdd")
rc = zoo_dc("insurance.circuit")

# Using samples from the circuit for this example; replace with real data
data, _ = sample(pc, 10);
data = make_missing(DataFrame(data, :auto));

exps, exp_cache = Expectation(pc, rc, data);

exps[1:3]
3-element Vector{Float32}:
 12886.046
 13234.583
 10953.938

We can also compute higher-order moments; for example, the second moment lets us recover standard deviations:

second_moments, moment_cache = Moment(pc, rc, data, 2);
stds = sqrt.(second_moments .- exps .^ 2);
stds[1:3]
3-element Vector{Float32}:
  278.6539
 1155.2455
 3938.1763