# Estimating priors

by: Kevin Broløs and Emil Larsen

(Feyn version 3.0 or newer)

Let's show you how to estimate and update the priors before sampling models.

When using Auto Run, the default behaviour is to estimate the priors by using the function `feyn.tools.estimate_priors`.

This provides an efficient means of initial feature selection. In the majority of cases, it increases predictive performance and reduces the time the `QLattice` needs to converge. It works particularly well for wide data sets with many inputs.

## Example

Here's an example of how to use `estimate_priors` before manually sampling models from a `QLattice` to get a similar effect:

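```python
import feyn
from feyn.datasets import make_classification

ql = feyn.QLattice()

# Generate a synthetic classification data set, already split into train and test
train, test = make_classification(random_state=42)
output_name = "y"

# Estimate the priors on the training data only
priors = feyn.tools.estimate_priors(train, output_name, floor=0.1)

# Use the estimated priors as the initial sampling weights of the QLattice
ql.update_priors(priors)

# Sample classification models; inputs with higher priors are more likely to appear
new_sample = ql.sample_models(
    train,
    output_name,
    'classification'
)
```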

*Note that this example stops at sampling. If you run a sample-fit-update loop as described in Using the primitives, you only need to estimate the priors once, before the loop starts.*

If we print out the value of the `priors` variable, we get the following map of **input names** to their **relative weights** in the range [0, 1]:

```
{
'x0': 0.99,
'x1': 0.97,
'x2': 0.98,
'x3': 0.88,
'x4': 0.83,
'x5': 0.94,
'x6': 0.96,
'x7': 0.81,
'x8': 0.84,
'x9': 0.87,
'x10': 0.89,
'x11': 1.0,
'x12': 0.86,
'x13': 0.95,
'x14': 0.85,
'x15': 0.9299999999999999,
'x16': 0.92,
'x17': 0.8200000000000001,
'x18': 0.91,
'x19': 0.9
}
```
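Assuming `priors` behaves like a standard Python dictionary, as the printed map suggests, ordinary Python is enough to see which inputs received the highest weights. The sketch below copies a few entries from the output above into a literal so it runs without Feyn installed; in a real session you would use the `priors` variable directly.

```python
# A subset of the priors printed above, copied as a literal so the
# example runs without Feyn. In practice, use the `priors` variable.
priors = {
    "x0": 0.99,
    "x1": 0.97,
    "x2": 0.98,
    "x4": 0.83,
    "x7": 0.81,
    "x11": 1.0,
}

# Sort input names by weight, highest first.
ranked = sorted(priors.items(), key=lambda item: item[1], reverse=True)

for name, weight in ranked:
    print(f"{name}: {weight:.2f}")

# The three inputs with the highest initial weights.
top_three = [name for name, _ in ranked[:3]]
print(top_three)  # ['x11', 'x0', 'x2']
```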

Higher values increase the initial likelihood of sampling the corresponding input from the `QLattice`. During training of the `QLattice`, the probability of sampling the different input variables will change as usual.

You don't have to use this particular prior estimation function. As shown in Updating priors, we can supply any priors we like, or none at all.

## Parameters of `feyn.tools.estimate_priors`

### data

The data the prior probabilities will be computed on.

Note: Make sure to compute the priors on the training set only, not on the entire data set. Otherwise, information leaks from the test set into the selection of inputs and biases the test error.

### output_name

The name of the output (target) variable in the data set.

### floor

Default: 0.1.

The minimum value permitted for any computed prior. If the computed prior of an input is below the `floor` value, it is clamped to the floor value.
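The clamping rule for `floor` amounts to taking the maximum of each computed prior and the floor. A minimal sketch, assuming the raw (pre-clamp) priors are available as a dictionary; the function `apply_floor` and the raw values are hypothetical and only illustrate the documented rule, not Feyn's internals.

```python
def apply_floor(raw_priors, floor=0.1):
    """Hypothetical helper: clamp every prior so none falls below `floor`."""
    return {name: max(value, floor) for name, value in raw_priors.items()}


# Made-up raw priors, including one that is exactly 0.
raw = {"x0": 0.99, "x3": 0.04, "x5": 0.0}

print(apply_floor(raw))             # {'x0': 0.99, 'x3': 0.1, 'x5': 0.1}
print(apply_floor(raw, floor=0.0))  # {'x0': 0.99, 'x3': 0.04, 'x5': 0.0}
```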

Note: If you allow this floor to be 0, inputs with a prior of 0 will not appear in models sampled by the `QLattice` during training, once you have updated the priors as described in Updating priors.
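Because an input with a prior of exactly 0 is never sampled, it can be worth checking which inputs would be excluded before passing priors computed with `floor=0` to `ql.update_priors`. The values below are made up for illustration; with real data you would filter the result of `feyn.tools.estimate_priors` the same way, assuming it is a plain mapping as the printed output above suggests.

```python
# Hypothetical priors computed with floor=0, so some are exactly 0.
priors = {"x0": 0.99, "x3": 0.0, "x5": 0.42, "x9": 0.0}

# Inputs with a prior of 0 will not appear in sampled models.
excluded = sorted(name for name, weight in priors.items() if weight == 0)
kept = sorted(name for name, weight in priors.items() if weight > 0)

print("excluded:", excluded)  # excluded: ['x3', 'x9']
print("kept:", kept)          # kept: ['x0', 'x5']
```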