# Estimating priors

by: Emil Larsen and Chris Cave

(Feyn version 2.1.1 or newer)

Before running the `QLattice`, we can compute an initial map of prior probabilities (priors for short) of the input variables. In the context of the `QLattice`, the prior probability of an input `x` denotes our prior belief in how important `x` is for predicting the output, before we run the training loop.

By default, the prior probabilities are the same for all inputs. You can estimate the priors using the function `feyn.tools.estimate_priors`. This provides an efficient means of initial feature selection, which in the vast majority of cases both improves predictive performance and reduces the time it takes the `QLattice` to converge. It works particularly well for wide data sets with many inputs.

An example is shown below:

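```python
import feyn
from feyn.datasets import make_classification

train, test = make_classification(random_state=42)
# Name of the target column; "y" is an assumption here, check train.columns
output_name = "y"
# Estimate priors on the training set only (see the note under `data`)
priors = feyn.tools.estimate_priors(train, output_name, floor=0.1)
```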

The value of the priors variable is then the following map of inputs to values in the range [0, 1]:

```
{
'x0': 0.99,
'x1': 0.97,
'x2': 0.98,
'x3': 0.88,
'x4': 0.83,
'x5': 0.94,
'x6': 0.96,
'x7': 0.81,
'x8': 0.84,
'x9': 0.87,
'x10': 0.89,
'x11': 1.0,
'x12': 0.86,
'x13': 0.95,
'x14': 0.85,
'x15': 0.9299999999999999,
'x16': 0.92,
'x17': 0.8200000000000001,
'x18': 0.91,
'x19': 0.9
}
```
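Assuming `priors` is a plain Python `dict`, as the output above suggests, it can be inspected with ordinary Python. The short sketch below ranks the inputs by prior value; the dict literal is copied from the output above, with values rounded to two decimals.

```python
# Priors copied from the output above (rounded to two decimals)
priors = {
    "x0": 0.99, "x1": 0.97, "x2": 0.98, "x3": 0.88, "x4": 0.83,
    "x5": 0.94, "x6": 0.96, "x7": 0.81, "x8": 0.84, "x9": 0.87,
    "x10": 0.89, "x11": 1.0, "x12": 0.86, "x13": 0.95, "x14": 0.85,
    "x15": 0.93, "x16": 0.92, "x17": 0.82, "x18": 0.91, "x19": 0.9,
}

# Rank inputs from highest to lowest prior
ranked = sorted(priors.items(), key=lambda item: item[1], reverse=True)

for name, value in ranked[:5]:
    print(f"{name}: {value:.2f}")
# x11: 1.00
# x0: 0.99
# x2: 0.98
# x1: 0.97
# x6: 0.96
```

In this example `x11` receives the highest prior and `x7` the lowest.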

Higher values mean that the corresponding input is more likely to be sampled at the beginning of `QLattice` training. During training, the probability distribution for sampling the different input variables will change as usual.
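To build intuition for "higher values mean more likely to be sampled", here is a toy model that draws input names with probability proportional to their prior, using Python's `random.choices`. This is a conceptual sketch and an assumption on our part: it does not describe how the `QLattice` actually samples inputs internally. The helper `sample_inputs` and the input `x_zero` are hypothetical.

```python
import random
from collections import Counter


def sample_inputs(priors, k, seed=0):
    """Toy model (hypothetical): draw k input names, each with probability
    proportional to its prior. Not the QLattice's actual sampling logic."""
    rng = random.Random(seed)
    names = list(priors)
    weights = [priors[name] for name in names]
    return rng.choices(names, weights=weights, k=k)


# Two values from the output above plus a hypothetical input with prior 0
example_priors = {"x11": 1.0, "x7": 0.81, "x_zero": 0.0}

counts = Counter(sample_inputs(example_priors, k=10_000))
print(counts.most_common())
```

In this toy model, `x11` is drawn more often than `x7`, and `x_zero` is never drawn because its weight is 0.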

Below we summarise the parameters of `feyn.tools.estimate_priors`.

## data

The data the prior probabilities will be computed on.

Note: Make sure to only compute the priors based on the train set and not the entire data set. Otherwise information will leak from the test set and will bias the test error.

## output_name

The name of the output (target) variable in the data set.

## floor

Lower bound on the prior probability values, which determines whether an input can be excluded from the set of input features. If the prior probability of an input is below `floor`, it is clamped to the floor value. The default is 0.1, meaning the minimum probability returned for an input is 0.1. If you set the floor to 0, inputs with a prior probability of 0 will not appear in models sampled by the `QLattice` during training (assuming the `QLattice` is updated with the priors before sampling; see Updating priors).
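The clamping rule described for `floor` can be illustrated in plain Python. This is a hypothetical sketch of the behaviour described above, not Feyn's implementation; the helper `apply_floor` and the raw values are made up for illustration.

```python
def apply_floor(raw_priors, floor=0.1):
    """Hypothetical helper: raise every prior below `floor` to `floor`,
    leaving values at or above `floor` unchanged."""
    return {name: max(value, floor) for name, value in raw_priors.items()}


# Made-up raw estimates, for illustration only
raw = {"x0": 0.99, "x1": 0.04, "x2": 0.0}

print(apply_floor(raw))             # {'x0': 0.99, 'x1': 0.1, 'x2': 0.1}
print(apply_floor(raw, floor=0.0))  # {'x0': 0.99, 'x1': 0.04, 'x2': 0.0}
```

With `floor=0.0`, `x2` keeps a prior of 0, which corresponds to the case above where such an input would not appear in sampled models.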