Categorical features
by: Chris Cave and Emil Larsen
(Feyn version 3.0 or newer)
A feature is categorical if there is no clear ordering in the values the feature can take. The values a categorical feature can take are called categories. Below is an example dataset containing categorical features and their categories:
Country | Favourite colour | Gender | Smoker/non-smoker |
---|---|---|---|
Denmark | Red | Male | 0 |
Spain | Yellow | Female | 1 |
UK | Blue | Male | 1 |
Brazil | Green | Male | 0 |
USA | Yellow | Female | 1 |
Italy | Red | Female | 0 |
In the example above, none of the features has an obvious ordering among its values.
How the QLattice treats categorical features
Before we pass the dataset above through the QLattice, we need to specify the semantic types of the categorical features.
import feyn
import numpy as np
import pandas as pd
data = pd.DataFrame({
"Country": ["Denmark", "Spain", "UK", "Brazil", "USA", "Italy"],
"Favourite colour": ["Red", "Yellow", "Blue", "Green", "Yellow", "Red"],
"Gender": ["Male", "Female", "Male", "Male", "Female", "Female"],
"Smoker": [0,1,1,0,1,0]
})
stypes = {
'Country': 'c',
'Favourite colour': 'c',
'Gender': 'c',
}
We can then pass the dataset, together with the stypes dictionary, into the QLattice.
ql = feyn.QLattice(random_seed=42)
models = ql.auto_run(
data=data,
output_name="Smoker",
stypes=stypes,
n_epochs=1,
max_complexity=2
)
model = models[0]
Here is one model from the output of auto_run.
How has the QLattice interpreted the categories of this feature as a number so that it can be an input to a mathematical function?
Each category is associated with a numerical value, which we call its weight. The weights are learnt while the model is being fitted to the data.
We can see the weight associated with each category by inspecting the input node's params attribute:
input_node = model[2]
print(f"category weights: {input_node.params['categories']}")
print(f"bias: {input_node.params['bias']}")
category weights: [
('Yellow', 0.26123558974851063),
('Blue', 0.26123554327444204),
('Red', -0.25637270012528196),
('Green', -0.25637278717475576)
]
bias: 0.12080737359066837
When we predict on a sample containing one of the categories above, the category is first converted to the value category_weight + bias. This value is then passed into the functions inside the model and behaves like an ordinary numerical value.
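The conversion can be sketched in plain Python. This is not Feyn's internal code, just an illustration of the lookup, using the weights and bias reported above:

```python
# Category weights and bias as reported by the fitted input node above.
category_weights = {
    "Yellow": 0.26123558974851063,
    "Blue": 0.26123554327444204,
    "Red": -0.25637270012528196,
    "Green": -0.25637278717475576,
}
bias = 0.12080737359066837

def encode(category):
    """Convert a category to the numerical value fed into the model."""
    return category_weights[category] + bias

print(encode("Yellow"))  # weight for "Yellow" plus the bias
```

Note that "Yellow" and "Blue" end up with nearly identical encoded values, while "Red" and "Green" do too: the model has effectively learnt to group these categories.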
If a categorical feature contains NaN values for some observations, the NaN values will be interpreted as a category in their own right.
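A minimal sketch of that behaviour (not Feyn's internal code, and the weights are made-up illustrative numbers): missing values get their own entry in the weight table, just like any other category.

```python
import math

# Hypothetical weight table where NaN has been learnt its own weight,
# alongside the ordinary categories.
category_weights = {
    "Male": 0.31,
    "Female": -0.12,
    "nan": 0.05,  # the NaN "category"
}

def lookup(value):
    """Map a value (possibly NaN) to its category key."""
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    return value

observations = ["Male", float("nan"), "Female"]
weights = [category_weights[lookup(v)] for v in observations]
print(weights)  # the NaN observation gets the NaN category's weight
```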