Learning
In this section we provide a few learning scenarios for circuits. In general, learning tasks for PCs can be separated into two categories: parameter learning and structure learning.
Learn a Circuit
You can use learn_circuit to learn a probabilistic circuit from the data (both parameter and structure learning).
using LogicCircuits
using ProbabilisticCircuits
train_x, valid_x, test_x = twenty_datasets("nltcs")
pc = learn_circuit(train_x; maxiter=100);
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
"PC: 786 nodes, 545 parameters. Train log-likelihood is -6.20044908456545"
Learning a circuit from missing data
You can use learn_circuit_miss to learn a probabilistic circuit from missing data, i.e., data where some features may be missing for each data point.
train_x_miss = make_missing_mcar(train_x; keep_prob=0.9)
pc = learn_circuit_miss(train_x_miss; maxiter=100);
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
"Train marginal-log-likelihood is $(marginal_log_likelihood_avg(pc, train_x))"
"PC: 315 nodes, 238 parameters. Train marginal-log-likelihood is -6.512186"
Learn a mixture of circuits
We also support learning a mixture of circuits using the Strudel algorithm (learn_strudel).
using LogicCircuits
using ProbabilisticCircuits
using Statistics
train_x, valid_x, test_x = twenty_datasets("nltcs")
spc, component_weights, lls = learn_strudel(train_x; num_mix = 10, init_maxiter = 20, em_maxiter = 100);
"SPC: $(num_nodes(spc)) nodes, $(num_parameters(spc)) parameters. " *
"Train log-likelihood is $(mean(lls))"
"SPC: 305 nodes, 1960 parameters. Train log-likelihood is -5.9778812998785735"
Learn a circuit from logical constraints and data
There are several ways to learn a probabilistic circuit consistent with logical constraints. Juice currently supports the following:
- Compilation from an SDD;
- Compilation from a BDD;
- Relaxation through sample_psdd.
Compilation from an SDD
A circuit (more specifically a PSDD) can be easily constructed from an SDD by simply calling the compile function.
Let's assume we have the following CNF as a .cnf file:
/tmp/example.cnf
---
c Encodes the following CNF: ϕ = (1 ∨ ¬2) ∧ (3 ∨ ¬4) ∧ (1 ∨ ¬4)
c
p cnf 4 3
1 -2 0
3 -4 0
1 -4 0
First we construct an SDD from the CNF. Here we sample a random vtree as an example (you might want to learn it from data instead with learn_vtree).
using LogicCircuits
using ProbabilisticCircuits
using DataFrames
n = 4 # number of variables
V = Vtree(n, :random)
sdd = compile(SddMgr(V), read("/tmp/example.cnf", LogicCircuit))
pc = compile(StructProbCircuit, sdd)
StructSumNode(7699004617873649383)
Let's check its support.
# Matrix with all possible worlds.
M = BitMatrix(undef, 2^n, n)
for i in 1:size(M, 1)
    # Fill row i with the binary representation of i-1 (least significant bit in column 1).
    M[i,:] .= [c == '1' for c in reverse(bitstring(i-1)[end-n+1:end])]
end
display(M)
# Evaluate SDD.
display(sdd.(eachrow(M)))
And now the probabilities:
# Evaluate the PSDD support.
EVI(pc, DataFrame(M))
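As a sanity check (a small addition to the walkthrough), the probabilities of all $2^n$ worlds should sum to approximately one, with worlds violating the CNF receiving probability zero:
# EVI returns log-probabilities; exponentiate and sum over all possible worlds.
sum(exp.(EVI(pc, DataFrame(M))))  # ≈ 1.0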
Compilation from a BDD
Compiing from a BDD is straightforward. Let's first create a BDD from the same previous constraints using LogicCircuits
.
using LogicCircuits
ϕ = (1 ∨ ¬2) ∧ (3 ∨ ¬4) ∧ (1 ∨ ¬4)
@ (index=1, id=6)
| - (index=2, id=5)
| | - (index=4, id=3)
| | | - (value=true, id=2)
| | | + (value=false, id=1)
| | + (value=false, id=1)
| + (index=3, id=4)
| | - (index=4, id=3)
| | | - (value=true, id=2)
| | | + (value=false, id=1)
| | + (value=true, id=2)
Now we either compile the PSDD directly from the BDD and give it random weights:
using ProbabilisticCircuits, DataFrames
# Get all possible instances with BDD.all_valuations.
M = all_valuations(collect(1:n))
M_D = DataFrame(M)
pc = generate_from_bdd(ϕ, 4)
EVI(pc, M_D)
Or compile from a BDD and learn weights from data:
# Retrieve only possible worlds.
W = M[findall(ϕ.(eachrow(M))),:]
# Assign random probabilities for each world in W.
R = rand(1:20, size(W, 1))
# Construct a dataset that maps the distribution of R (world W[i] repeats R[i] times).
D = DataFrame(vcat([repeat(W[i,:], 1, R[i])' for i ∈ 1:size(W, 1)]...))
pc = learn_bdd(ϕ, D; pseudocount = 0.0)
EVI(pc, M_D)
Since BDDs are just right-linear vtree PSDDs, this "compilation" is merely a conversion from BDD syntax to PC syntax, attributing some weight to edges.
Sampling a circuit from a relaxation of the constraints
The two previous approaches are effective, but not always adequate. For instance, suppose our data consists of 6 variables: $a$, $b$, $c$, $d$, $e$ and $f$, where only $a$, $b$, $c$ and $d$ are constrained (by $\phi$), and the rest are free. Had we compiled $\phi$ from either an SDD or BDD, we'd end up with trivial structures for the free variables. For instance, calling learn_bdd (or generate_from_bdd) with more variables than the size of the BDD's scope would result in a fully factorized distribution over the free variables.
To address these issues, we might want to generate a circuit over both free and constrained variables with sample_psdd. Unfortunately, to keep the circuit tractable, sample_psdd only provides a relaxation of the constraints.
Let's first encode our constraints as a BDD just like our previous example and make up some random data.
using LogicCircuits, DataFrames
n = 6
ϕ = (1 ∨ ¬2) ∧ (3 ∨ ¬4) ∧ (1 ∨ ¬4)
M = all_valuations(collect(1:n))
M_D = DataFrame(M)
W = M[findall(ϕ.(eachrow(M))),:]
R = rand(1:20, size(W, 1))
D = DataFrame(vcat([repeat(W[i,:], 1, R[i])' for i ∈ 1:size(W, 1)]...))
Now we can sample circuits from $\phi$ and data $D$.
using ProbabilisticCircuits
using LogicCircuits: Vtree
pc = sample_psdd(ϕ, Vtree(n, :random), 16, D)
EVI(pc, M_D)
The third argument passed to sample_psdd gives an upper bound on the number of children of sum nodes. The higher this bound, the more consistent the sampled circuit will be with $\phi$.
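For instance, sampling with a smaller and a larger bound (the values below are only illustrative) trades off circuit size against how faithfully $\phi$ is captured:
# Smaller bound: smaller circuit, looser approximation of ϕ.
pc_small = sample_psdd(ϕ, Vtree(n, :random), 4, D)
# Larger bound: larger circuit, closer to satisfying ϕ exactly.
pc_large = sample_psdd(ϕ, Vtree(n, :random), 64, D)
"Nodes: $(num_nodes(pc_small)) vs $(num_nodes(pc_large))"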
In situations where background knowledge is not available, we may pass $\top$ to sample_psdd to learn from data only.
pc = sample_psdd(⊤, Vtree(n, :random), 16, D)
EVI(pc, M_D)
Learning an ensemble of circuits
learn_strudel lets us learn an ensemble of circuits that share the same structure. For learning ensembles whose components have different structures, we have to use Ensemble.
Ensemble of sample_psdds
We can learn an ensemble of random circuits through ensemble_sample_psdd.
E = ensemble_sample_psdd(10, ϕ, 16, D; strategy = :em)
EVI(E, M_D)
Here we used EM to learn the weights of the ensemble. Alternatives are likelihood weighting (:likelihood), uniform weights (:uniform), or stacking (:stacking).
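As a quick illustration (reusing ϕ, D, and M_D from above), passing a different strategy keyword swaps in one of these weighting schemes:
# Uniform component weights instead of EM (illustrative only).
E_uniform = ensemble_sample_psdd(10, ϕ, 16, D; strategy = :uniform)
EVI(E_uniform, M_D)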
Misc Options
In this section, we describe options that give more control over learning circuits, for example when we only want to do parameter learning.
Parameter Learning
Given a fixed structure for the PC, the goal of parameter learning is to estimate the parameters so that likelihood is maximized.
First, initialize the PC structure with a balanced vtree representing a fully factorized distribution:
v = Vtree(num_features(train_x), :balanced)
pc = fully_factorized_circuit(StructProbCircuit, v);
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
"PC: 79 nodes, 48 parameters.Train log-likelihood is -11.090354888959121"
No parameter learning has been done yet; now let's do maximum likelihood estimation (MLE) using estimate_parameters!:
estimate_parameters!(pc, train_x; pseudocount=1.0);
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
"PC: 79 nodes, 48 parameters.Train log-likelihood is -9.270330518351882"
As we can see, the likelihood improved; however, we are still using a fully factorized distribution, so there is room for improvement. For example, we can choose the initial structure based on Chow-Liu trees.
pc, vtree = learn_chow_liu_tree_circuit(train_x)
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
"PC: 121 nodes, 74 parameters.Train log-likelihood is -6.7600561877106955"
Structure Learning
There are several different approaches to structure learning. Currently we support the following approach:
- Choose an initial structure and learn its parameters.
- Perform a greedy search for a bigger and better structure by applying operations such as split and clone.
- Repeat step 2 until satisfied or until a time limit is reached.
We start with the Chow-Liu structure learned in the last section and run a few structure learning iterations (20):
pc, vtree = learn_chow_liu_tree_circuit(train_x)
loss(circuit) = ProbabilisticCircuits.heuristic_loss(circuit, train_x)
pc = struct_learn(pc;
primitives=[split_step],
kwargs=Dict(split_step=>(loss=loss,)),
maxiter=20)
estimate_parameters!(pc, train_x; pseudocount=1.0)
"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
"Training set log-likelihood is $(log_likelihood_avg(pc, train_x))"
"PC: 305 nodes, 196 parameters. Training set log-likelihood is -6.383045029183077"