Feyn

Defining input features

by: Kevin Broløs
(Feyn version 1.4 or newer)

In Feyn, we treat input features a little differently. The first thing you'll notice is that not all inputs necessarily go into a graph. This is by design: the QLattice explores potential relationships between features and tries to come up with reduced graphs that carry a strong signal for your question, rather than trying to squeeze every last drop of signal out of your features and overfitting.

This means that some things work differently than what you're used to, and you should familiarize yourself with our workflow of asking questions and interpreting the graphs using the scientific method.

Semantic types

There are two semantic types (or s-types for short) of inputs: numerical and categorical. We distinguish these two so the model knows how to understand the inputs.

  • Numerical variables are continuous (height, weight, age etc.). Inputs are automatically scaled using a linear transformation.
  • Categorical variables are discrete (nationality, hair colour etc.). Inputs are automatically encoded with weights.

When working with your QLattice you need to assign your variables to either semantic type. The default s-type for all inputs is numerical.

Assume you have the following dataset (here we generate one):

from sklearn.datasets import make_classification
import pandas as pd

from feyn.tools import split

# Generate a dataset and put it into a dataframe
X, y = make_classification()
data = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
data['target'] = y

A convenient way to assign types to inputs is to rely on the types in your pandas DataFrame or numpy array (first check that the column types are what you expect by inspecting data.dtypes).

semantic_types = {}

for col in data.columns:
    # Columns with dtype 'object' (e.g. strings) are marked categorical
    if data[col].dtype == 'object':
        semantic_types[col] = 'c'

We do this all the time and we love it.
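
If your data is not in a DataFrame yet, the same idea can be sketched in plain Python. This is only an illustration: the helper name infer_stypes and the rule "a column is categorical if it holds any string" are assumptions made for this example and are not part of Feyn.

```python
def infer_stypes(columns):
    """Map each column name to 'c' when the column holds any string values.

    `columns` is a dict of column name -> list of values. Columns missing
    from the result are left to the default (numerical) s-type.
    """
    stypes = {}
    for name, values in columns.items():
        if any(isinstance(v, str) for v in values):
            stypes[name] = "c"
    return stypes


# Hypothetical example data: two numerical columns and one categorical column
columns = {
    "age": [34, 51, 27],
    "height": [1.82, 1.65, 1.74],
    "nationality": ["DK", "ES", "DK"],
}
semantic_types = infer_stypes(columns)
print(semantic_types)  # {'nationality': 'c'}
```

Only the categorical columns end up in the dictionary, which matches the loop above: anything you leave out falls back to the numerical default.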

The numerical type

Normalization or standardization is a required preprocessing step for many machine learning algorithms. In Feyn, the numerical input type takes care of this automatically. It sets a scale based on the minimum and maximum values of the input, and learns a weight and a bias that transform the input values into a usable range.
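
To make the idea concrete, here is a minimal sketch of min-max scaling followed by a weight and a bias. It is not Feyn's internal implementation: the target range of [-1, 1], the function names, and the fixed weight and bias values are all assumptions chosen for illustration.

```python
def minmax_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column has no spread; map it to the middle of the range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def linear(values, weight, bias):
    """Apply a weight and a bias; in a trained model these would be learned."""
    return [weight * v + bias for v in values]


heights_cm = [150.0, 175.0, 200.0]
scaled = minmax_scale(heights_cm)
print(scaled)  # [-1.0, 0.0, 1.0]

# Assumed weight and bias, standing in for learned parameters
transformed = linear(scaled, weight=0.5, bias=0.1)
print(transformed)
```

The point of the sketch is that the raw values never need to be rescaled by hand: the scale comes from the data, and the weight and bias are left to training.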

The categorical type

One-hot encoding is similarly handled by the inputs, through the categorical semantic type. By setting an input feature to be categorical, you tell Feyn to treat the categories as distinct inputs that each learn their own encoding and linear transformation.

This makes it much easier to fit and learn from datasets that have categorical features with high cardinality. Normally, you'd either translate these to a sparse (label) encoding format or one-hot encode the values as separate features.
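
One way to see why a separate one-hot step is unnecessary: giving each category its own weight is the same computation as one-hot encoding followed by a weighted sum, just without materialising the extra columns. The sketch below demonstrates that equivalence in plain Python. The categories, weights, and function names are hypothetical, and this is not a description of Feyn's internals.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


categories = ["brown", "blonde", "red"]
# Hypothetical per-category weights; a trained model would learn these.
weights = {"brown": 0.2, "blonde": -0.7, "red": 1.3}


def encode_by_lookup(value):
    """Look up the weight of a category directly."""
    return weights[value]


def encode_by_one_hot(value):
    """One-hot encode the value, then take the weighted sum of the columns."""
    vector = one_hot(value, categories)
    return sum(x * weights[c] for x, c in zip(vector, categories))


# Both routes give the same number for every category
for value in categories:
    assert encode_by_lookup(value) == encode_by_one_hot(value)
```

With three categories the difference is negligible, but with thousands of distinct values the lookup avoids creating thousands of mostly-zero columns.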

When extracting your QGraph, you point to your categorical features using the stypes parameter. If you don't pass the stypes parameter, all your features are assumed to be numerical.

We assume you already know how to access your QLattice, and this code assumes you use a configuration file.

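from feyn import QLattice

# Connects using the settings in your configuration file
qlattice = QLattice()

# Features not listed in semantic_types are treated as numerical
qgraph = qlattice.get_regressor(data.columns, output='target', stypes=semantic_types)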

Feyn automatically identifies and labels distinct values in sequence and assigns each of them an individual weight in the resulting graphs. Combined with other features such as our gaussian cells, this means we can handle large numbers of unique values, weight and sort them according to relevance, and even single out specific values or clusters of values and report that information to you.
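
As a rough illustration of labelling distinct values in sequence and then ranking them, consider the sketch below. The helper name, the example weights, and the choice to rank by absolute weight are assumptions made for this example; how Feyn actually measures relevance is not specified here.

```python
def label_in_sequence(values):
    """Assign each distinct value an integer label in order of first appearance."""
    labels = {}
    for v in values:
        if v not in labels:
            labels[v] = len(labels)
    return labels


nationalities = ["DK", "ES", "DK", "US", "ES"]
labels = label_in_sequence(nationalities)
print(labels)  # {'DK': 0, 'ES': 1, 'US': 2}

# Hypothetical learned weights, indexed by label
weights = [0.1, -0.9, 0.4]

# Rank categories by the magnitude of their weight (an assumed relevance measure)
ranked = sorted(labels, key=lambda v: abs(weights[labels[v]]), reverse=True)
print(ranked)  # ['ES', 'US', 'DK']
```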

This also saves you time on data preprocessing, as you can fit the data as-is. Not only do you not need to one-hot encode or otherwise transform your categorical features, we strongly recommend against one-hot encoding: it will work, treating each encoded column as a binary input, but it will reduce performance.

Cool, huh?

Copyright © 2021 Abzu.ai
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®