Estimating priors
by: Kevin Broløs and Emil Larsen
(Feyn version 3.0 or newer)
Let's show you how to estimate and update the priors before sampling models.
When using Auto Run, the default behaviour is to estimate the priors by using the function feyn.tools.estimate_priors.
This provides an efficient means of initial feature selection, and will increase predictive performance and reduce time to convergence of the QLattice in a majority of cases. It works particularly well for wide data sets with many inputs.
Example
Here's an example of how to use estimate_priors before manually sampling models from a QLattice to get a similar effect:
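import feyn
from feyn.datasets import make_classification

ql = feyn.QLattice()

# Generate a train/test split of a synthetic classification data set
train, test = make_classification(random_state=42)
output_name = "y"

# Estimate the priors on the train set only, then pass them to the QLattice
priors = feyn.tools.estimate_priors(train, output_name, floor=0.1)
ql.update_priors(priors)

# Sample models using the updated priors
new_sample = ql.sample_models(
    train,
    output_name,
    'classification'
)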
Note that while this example stops at sampling, you only need to estimate the priors once if you're running a sample-fit-update loop as described in Using the primitives.
If we print out the value of the priors variable, we get the following map of input names with their relative weights in the range [0, 1]:
{
'x0': 0.99,
'x1': 0.97,
'x2': 0.98,
'x3': 0.88,
'x4': 0.83,
'x5': 0.94,
'x6': 0.96,
'x7': 0.81,
'x8': 0.84,
'x9': 0.87,
'x10': 0.89,
'x11': 1.0,
'x12': 0.86,
'x13': 0.95,
'x14': 0.85,
'x15': 0.9299999999999999,
'x16': 0.92,
'x17': 0.8200000000000001,
'x18': 0.91,
'x19': 0.9
}
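To see at a glance which inputs were rated highest, the map can be sorted by weight. The following is a minimal sketch using plain Python only; the four entries are copied from the output above, and the variable name ranked is an illustrative choice, not part of Feyn.

```python
# A few entries copied from the printed priors map above.
priors = {"x0": 0.99, "x7": 0.81, "x11": 1.0, "x17": 0.8200000000000001}

# Sort (input name, weight) pairs by weight, highest first.
ranked = sorted(priors.items(), key=lambda item: item[1], reverse=True)

for name, weight in ranked:
    # Two decimals hide floating-point noise such as 0.8200000000000001.
    print(f"{name}: {weight:.2f}")
```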
Higher values increase the initial likelihood of sampling the corresponding input from the QLattice. During training of the QLattice, the probability of sampling the different input variables will change as usual.
You don't have to use this particular prior estimation function. As shown in Updating priors, we can supply any priors we like - or none at all.
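As a rough illustration of supplying hand-made priors, the sketch below builds a map in the same shape as the one printed above: input names mapped to weights in [0, 1]. The helper build_priors and the weights 1.0 and 0.2 are assumptions made for this example and are not part of Feyn.

```python
def build_priors(input_names, favoured, high=1.0, low=0.2):
    """Hypothetical helper: give favoured inputs `high`, all others `low`."""
    favoured = set(favoured)
    return {name: (high if name in favoured else low) for name in input_names}


# Input names matching the x0..x19 columns of the example data set.
input_names = [f"x{i}" for i in range(20)]
priors = build_priors(input_names, favoured=["x0", "x11"])

# With a QLattice instance `ql` as in the example above, the map would be
# applied with:
#     ql.update_priors(priors)
```

The resulting dictionary can be passed to ql.update_priors in the same way as the output of estimate_priors.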
Parameters of feyn.tools.estimate_priors
data
The data the prior probabilities will be computed on.
Note: Make sure to only compute the priors based on the train set and not the entire data set. Otherwise information will leak from the test set and will bias the test error.
output_name
The name of the output (target) variable in the data set.
floor
Default: 0.1.
The minimum value permitted for any computed prior. If the computed prior of an input is below the floor value, it is clamped to the floor value.
Note: If you set this floor to 0, inputs with a prior of 0 will not appear in models sampled by the QLattice during training after Updating priors.
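The clamping described for floor can be pictured with a few lines of plain Python. This is only a sketch of the documented behaviour, not Feyn's actual implementation; the function apply_floor and the raw values are made up for illustration.

```python
def apply_floor(raw_priors, floor=0.1):
    """Illustrative only: raise every prior below `floor` up to `floor`."""
    return {name: max(value, floor) for name, value in raw_priors.items()}


# Made-up raw weights, two of which fall below the default floor of 0.1.
raw = {"x0": 0.9, "x1": 0.05, "x2": 0.0}

print(apply_floor(raw))             # {'x0': 0.9, 'x1': 0.1, 'x2': 0.1}
print(apply_floor(raw, floor=0.0))  # {'x0': 0.9, 'x1': 0.05, 'x2': 0.0}
```

With the default floor, every input keeps a non-zero weight. With a floor of 0, x2 stays at 0, which is the case the note above warns about.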