Feyn Documentation


Semantic types

by: Kevin Broløs
(Feyn version 3.4.0 or newer)


There are three types of input data that Models can interpret:

  • numerical, which includes:
    • floating point numbers
    • integers
  • categorical, which includes:
    • strings
  • boolean, represented by
    • a discrete number 0 or 1
    • True or False

The Model handles the transformation of its inputs, and it uses the stype declarations to decide how. Numerical inputs are linearly rescaled with learned parameters, and each category of a categorical input is assigned its own weight. Boolean inputs can be declared as either numerical or categorical; the only difference is whether string representations of booleans (like yes and no) can be handled. Generally speaking, numerical inputs are a bit more efficient and produce arguably simpler equations, so we recommend declaring booleans as numerical.
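To make the two kinds of transformation concrete, here is a minimal sketch. It is an illustration under our own assumptions, not Feyn's internal code: the functions numerical_input and categorical_input, and all weights, are hypothetical.

```python
# Hypothetical illustration only: these are NOT Feyn internals.
# Function names and numbers are made up to show the two transformations.


def numerical_input(x: float, weight: float, bias: float) -> float:
    """Numerical stype: a learned linear rescaling of the raw value."""
    return weight * x + bias


def categorical_input(value: str, category_weights: dict) -> float:
    """Categorical stype: each distinct category has its own learned weight."""
    return category_weights[value]


# A boolean declared numerical is treated as 0/1 and rescaled linearly.
print(numerical_input(float(True), weight=0.5, bias=1.0))  # 1.5

# A boolean declared categorical can also use string labels such as yes/no.
yes_no_weights = {"yes": 0.3, "no": -0.3}
print(categorical_input("no", yes_no_weights))  # -0.3
```

In the real Model these weights are learned during training rather than chosen by hand.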

The categorical stype keeps the model simple and interpretable by avoiding the dimensional expansion you would get from one-hot or dummy encoding. You can read more about how we treat categorical features in the Categorical features guide.
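For comparison, the sketch below uses pandas.get_dummies to show the dimensional expansion that one-hot encoding causes: the single categorical_input column from the example further down becomes one column per category. With the categorical stype, the column stays a single input.

```python
import pandas as pd

# The same four fruit values used in the stypes example on this page.
data = pd.DataFrame({"categorical_input": ["apple", "pear", "banana", "orange"]})

# One-hot encoding expands one column into one column per distinct category.
one_hot = pd.get_dummies(data["categorical_input"])

print(data.shape[1])          # 1 column when using the categorical stype
print(one_hot.shape[1])       # 4 columns after one-hot encoding
print(list(one_hot.columns))  # ['apple', 'banana', 'orange', 'pear']
```

With many distinct categories this expansion grows quickly, which is what the categorical stype avoids.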

Stypes in auto_run

When you use auto_run, the stypes will be automatically inferred from your data unless you specify them manually. It will also produce warnings if some columns appear to be unsuitable for training, or have issues like high cardinality (many unique values compared to the size of the dataset).

Inference is designed to be quick, but every dataset is different, so you can supply your own stypes to skip this step entirely. Alternatively, call feyn.tools.infer_stypes yourself before running auto_run, use its result as a starting point, and change only the types you disagree with.
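A minimal sketch of that workflow, assuming infer_stypes returns a plain dictionary mapping column names to stype codes (like the stypes dictionary elsewhere on this page). The helper apply_overrides is hypothetical, and the inferred dictionary is hard-coded with made-up values so the example runs without Feyn.

```python
# Hypothetical helper, not part of Feyn: start from inferred stypes
# and change only the entries you disagree with.


def apply_overrides(inferred: dict, overrides: dict) -> dict:
    """Return a new stypes dict in which manual overrides take precedence."""
    merged = dict(inferred)  # copy, so the inferred dict is left untouched
    merged.update(overrides)
    return merged


# Stand-in for the result of feyn.tools.infer_stypes(data, 'output_name');
# the values here are made up for illustration.
inferred = {"numerical_input": "f", "categorical_input": "c", "boolean_input": "c"}

# Declare the boolean as numerical, as recommended above.
stypes = apply_overrides(inferred, {"boolean_input": "f"})
print(stypes)
```

With Feyn installed, `inferred` would come from `infer_stypes(data, 'output_name')`, and the merged dictionary would be passed on as `ql.auto_run(data=data, output_name='output_name', stypes=stypes)`.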

Example

Because the Model transforms its inputs itself, much of the preprocessing that would otherwise fall on the data scientist goes away. In particular, you should not standardise inputs, nor should you one-hot encode categoricals.

Instead, you assign the relevant stypes as shown below.

```python
import feyn
import numpy as np
from pandas import DataFrame

# A toy dataset with one input of each kind and a numerical output.
data = DataFrame(
    {
        'numerical_input': np.random.rand(4),
        'categorical_input': np.array(['apple', 'pear', 'banana', 'orange']),
        'boolean_input': np.array([True, False, False, True]),
        'output_name': np.random.rand(4),
    }
)

# 'f' = numerical, 'c' = categorical. The boolean is declared numerical.
stypes = {
    'numerical_input': 'f',
    'categorical_input': 'c',
    'boolean_input': 'f',
}

ql = feyn.QLattice()
# n_epochs=1 only keeps this toy example fast.
models = ql.auto_run(
    data=data,
    output_name='output_name',
    stypes=stypes,
    n_epochs=1
)
```

If no stypes are provided for an input, it is assumed to be numerical.
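This default can be pictured as a dictionary lookup that falls back to 'f'. The sketch below uses a hypothetical function, effective_stypes (not part of Feyn), to list the stype each input column would be treated as, given a partial stypes dictionary.

```python
# Hypothetical helper, not part of Feyn: resolve the stype of every input.


def effective_stypes(columns, stypes, output_name):
    """Map every input column to its stype, defaulting to numerical ('f')."""
    return {col: stypes.get(col, "f") for col in columns if col != output_name}


columns = ["numerical_input", "categorical_input", "boolean_input", "output_name"]
print(effective_stypes(columns, {"categorical_input": "c"}, "output_name"))
# {'numerical_input': 'f', 'categorical_input': 'c', 'boolean_input': 'f'}
```

A consequence worth keeping in mind: if you pass your own stypes dictionary and leave out a string column, this rule means it will not be treated as categorical.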

Infer stypes from the dataset

A quick way to build an stypes dictionary from your data is to call feyn.tools.infer_stypes on your pandas DataFrame, passing the name of your output column as well:

```python
from feyn.tools import infer_stypes

# Pass the DataFrame and the name of the output column.
stypes = infer_stypes(data, 'output_name')
```

If a column contains non-numerical values, such as apple or pear in categorical_input above, it is assigned the categorical stype ('c').
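For intuition only, here is a rough approximation of this rule using pandas' is_numeric_dtype. The function rough_stype is a simplification we made up, not Feyn's actual inference logic, which also applies the additional checks listed below.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def rough_stype(series: pd.Series) -> str:
    """Simplified rule: non-numerical columns become categorical ('c')."""
    # pandas treats bool columns as numeric, so booleans map to 'f' here.
    return "f" if is_numeric_dtype(series) else "c"


data = pd.DataFrame({
    "numerical_input": [0.1, 0.5, 0.9, 0.3],
    "categorical_input": ["apple", "pear", "banana", "orange"],
    "boolean_input": [True, False, False, True],
})
print({col: rough_stype(data[col]) for col in data.columns})
# {'numerical_input': 'f', 'categorical_input': 'c', 'boolean_input': 'f'}
```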

infer_stypes also detects numerical columns that are likely to be ordinal or nominal, and it skips columns that are unsuitable for training. Below is a short summary of some of the cases it looks for:

  • binary data: stype='f'
  • continuous data: stype='f'
  • numerical data:
    • Ordinal/Nominal: stype='c' (if the number of distinct values is below a threshold that depends on the dataset size)
    • others: stype='f'
  • strings/objects/category: stype='c'
  • ID: skip (if the dataset has more than 10 rows)
    • Also produces an info message
  • ISO Date: skip
    • Also produces an info message
  • Constant values: skip (if the dataset has more than 10 rows)
    • Also produces an info message
  • Mixed types: skip
    • Also produces an info message
  • Any categorical column with high cardinality (more than 50 distinct values, or a ratio of distinct values to rows above 50%)
    • Produces a warning message and keeps the categorical stype.
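Two of these rules are stated precisely enough to sketch. The function check_column below is hypothetical and covers only the constant-value and high-cardinality checks, using the thresholds from the list above (more than 10 rows; more than 50 distinct values or a distinct ratio above 50%). It approximates "categorical" as "non-numeric dtype", and Feyn's own implementation may differ in details such as how missing values are counted.

```python
import pandas as pd


def check_column(series: pd.Series) -> str:
    """Return 'skip', 'warn' or 'ok' for a single column (simplified sketch)."""
    n_rows = len(series)
    n_distinct = series.nunique()  # missing values are not counted as a value

    # Constant values: skipped when the dataset has more than 10 rows.
    if n_distinct == 1 and n_rows > 10:
        return "skip"

    # High cardinality (non-numeric columns only): warn but keep the column.
    if not pd.api.types.is_numeric_dtype(series):
        if n_distinct > 50 or (n_rows > 0 and n_distinct / n_rows > 0.5):
            return "warn"
    return "ok"


print(check_column(pd.Series(["a"] * 20)))                       # skip
print(check_column(pd.Series(["a", "b"] * 10)))                  # ok
print(check_column(pd.Series([f"user_{i}" for i in range(100)])))  # warn
```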

Data validation

Note on missing values: most of the type inference ignores missing values and relies on downstream validation to detect issues like these.

This means that even if a column's stype is assigned correctly, the column may still have common data issues.

auto_run uses feyn.validate_data as its validator, and you can also call it on its own to screen your data for common issues before training.
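feyn.validate_data is the tool to use for this. As a quick complement, and because type inference mostly ignores missing values, a plain pandas check like the one below (our own sketch, not part of Feyn) shows how many values each column is missing before you train.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "numerical_input": [0.1, np.nan, 0.9, 0.3],
    "categorical_input": ["apple", None, "banana", "orange"],
    "output_name": [1.2, 0.4, 0.8, 0.5],
})

# Count missing values (NaN/None) per column and keep only affected columns.
missing = data.isna().sum()
print(missing[missing > 0].to_dict())
```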

We recommend studying your data carefully rather than blindly trusting the inferred types, but inference should give you a good starting point.


Copyright © 2024 Abzu.ai - Feyn license: CC BY-NC-ND 4.0
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®