Validate data · Feyn Documentation

by: Kevin Broløs
(Feyn version 2.0.7 or newer)

validate_data is a function that helps you detect a few common data errors that can cause unwanted behaviour in Feyn. We advise running it once after loading your data, to make sure the data is in good enough condition.

To validate your data properly, you need to specify the kind of problem you intend to solve (kind), the name of the output column (output_name), and the stypes you'll use for sample_models (stypes), which must declare any categorical columns.
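If your data has text columns, a small helper can draft the stypes dictionary for you. The sketch below assumes that Feyn marks categorical columns with the stype 'c' and that every non-numeric input column should be treated as categorical. The helper name infer_categorical_stypes is hypothetical and not part of Feyn. Review the result before using it, since numeric category codes (such as integer IDs) will not be picked up.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_categorical_stypes(data: pd.DataFrame, output_name: str) -> dict:
    """Hypothetical helper: mark every non-numeric input column as categorical.

    Assumes Feyn's convention of using 'c' as the categorical stype.
    The output column is skipped.
    """
    return {
        col: "c"
        for col in data.columns
        if col != output_name and not is_numeric_dtype(data[col])
    }


# Small illustrative frame with made-up values
df = pd.DataFrame({
    "age": [23, 45, 31],
    "smoker": ["yes", "no", "no"],
    "region": ["north", "south", "east"],
    "y": [0, 1, 0],
})

stypes = infer_categorical_stypes(df, output_name="y")
print(stypes)  # {'smoker': 'c', 'region': 'c'}
```

The resulting dictionary is what you would then pass as stypes to validate_data and sample_models.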

Example

from feyn.datasets import make_classification
from feyn import validate_data

train, test = make_classification()

validate_data(data=train, kind='classification', output_name='y', stypes={})

Here's an example that fails validation because it uses a continuous numerical output for a classification problem:

from feyn.datasets import make_regression
from feyn import validate_data

train, test = make_regression()

try:
    validate_data(data=train, kind='classification', output_name='y', stypes={})
except ValueError as e:
    print(e)

y must be an iterable of booleans or 0s and 1s

These examples validate only the training data, but we recommend running validate_data on the full dataset.

validate_data will raise a ValueError in the following cases:

For a regression, if the output column contains any non-numerical values.
For a classification, if the output column does not consist only of boolean-like values (booleans, or 0s and 1s).
If any column has an object type but has not been declared as categorical in stypes.
If any column contains NaN values and has not been declared as categorical in stypes.
- Note: categoricals support NaN values by assigning them their own weights, so this is allowed. You should still consider whether that's the behaviour you want, and handle the NaN values yourself if it isn't.
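To make the rules above concrete, here is a rough pandas sketch of equivalent checks. It is not Feyn's implementation: the function name check_data_sketch is hypothetical, it assumes categorical columns are declared with the stype 'c', and it treats any non-numeric pandas dtype (not only object) as needing a categorical declaration. Unlike validate_data, it collects every problem before raising a single ValueError.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def check_data_sketch(data: pd.DataFrame, kind: str, output_name: str, stypes: dict) -> None:
    """Hypothetical re-creation of the documented rules; not Feyn's own code.

    Assumes categorical columns are declared with the stype 'c'.
    Raises a single ValueError listing every rule the data breaks.
    """
    problems = []
    output = data[output_name]

    # Rule 1: regression needs a numerical output column
    if kind == "regression" and not is_numeric_dtype(output):
        problems.append(f"{output_name} must be numerical for regression")

    # Rule 2: classification needs a boolean-like output (booleans, or only 0s and 1s)
    if kind == "classification":
        boolean_like = is_bool_dtype(output) or (
            is_numeric_dtype(output) and output.isin([0, 1]).all()
        )
        if not boolean_like:
            problems.append(f"{output_name} must contain only booleans or 0s and 1s")

    for col in data.columns:
        is_categorical = stypes.get(col) == "c"
        # Rule 3: non-numeric (e.g. object/text) columns must be declared categorical
        if not is_numeric_dtype(data[col]) and not is_categorical:
            problems.append(f"{col} is non-numeric but not declared categorical")
        # Rule 4: NaN values are only allowed in categorical columns
        if data[col].isna().any() and not is_categorical:
            problems.append(f"{col} contains NaN but is not declared categorical")

    if problems:
        raise ValueError("; ".join(problems))


# Made-up example: x has a missing value and colour is text
df = pd.DataFrame({
    "x": [0.5, 1.5, float("nan")],
    "colour": ["red", "blue", "red"],
    "y": [0, 1, 1],
})

try:
    check_data_sketch(df, kind="classification", output_name="y", stypes={})
except ValueError as e:
    print(e)

# Declaring colour as categorical and dropping the row with the missing x passes
fixed = df.dropna(subset=["x"])
check_data_sketch(fixed, kind="classification", output_name="y", stypes={"colour": "c"})
```

Dropping rows is only one way to handle NaN in numerical columns; imputing a value you choose with fillna is another, and the same applies to categorical columns if you don't want NaN to get its own weight.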

