Feyn Documentation

Deploy a model for inference

by: Miquel Triana

Feyn version: 2.1+

Last updated: 28/10/2021

On many occasions, you may want to use a newly trained model to make predictions on incoming data. This is called deploying a model for inference.

Deployment can be complicated with black-box algorithms such as random forests or neural networks. This is not the case for feyn: its models are simple mathematical expressions that can be exported and evaluated without passing around large files or importing feyn.

In this tutorial we will show you how to output a model so it can be used to make predictions somewhere else. We will use the models obtained in the Titanic survival tutorial.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sympy.printing.printer import Printer

import feyn

Parse data and train model

We start by loading and parsing the data. Have a look at the Titanic survival tutorial if you are interested in the details.

df = pd.read_csv('../data/deploy_inference.csv')
# Impute missing data, drop irrelevant columns
age_dist = df[(df.pclass == 3) & (df.embarked == 'C') & (df.sex == 'male') & 
              (df.sibsp == 0) & (df.parch == 0) & (df.survived == 0)].age.dropna()
mean_age = np.mean(age_dist)
std_age = np.std(age_dist)
np.random.seed(42)
age_guess = np.random.normal(mean_age, std_age, size=2)
df_mod = df.drop(['boat', 'body', 'home.dest', 'name', 'ticket', 'cabin'], axis=1)
df_mod.loc[df[df.age.isna()].index, 'age'] = age_guess
# Split data
output = 'survived'
train, test = train_test_split(df_mod, test_size=0.4, random_state=42, stratify=df_mod[output])
# Define categorical inputs
stypes = {}
for col in train.columns:
    if train[col].dtype == 'O':
        stypes[col] = 'c'
stypes['pclass'] = 'c' 

Once the data is parsed, we can train a model with the auto_run function:

# Train models
ql = feyn.connect_qlattice() 
ql.reset(random_seed=42)
models = ql.auto_run(train, output, kind='classification', stypes=stypes)
Loss: 4.63E-01
Epoch no. 10/10 - Tried 10947 models - Completed in 21s.
[Model diagram. Output: survived (logistic: w=1.7298 bias=0.9800). Interactions: add, multiply, add. Inputs: sex (categorical with 2 values, bias=-0.3331); pclass (categorical with 3 values, bias=-0.3331); age (linear: scale=0.025105 scale offset=29.590545 w=1.629481 bias=2.1292); sibsp (linear: scale=0.250000 scale offset=0.481950 w=-0.955711 bias=0.6707)]
# Select the best performing model
best_model = models[0]

Extract model

The sympify method returns a sympy expression. To be able to evaluate it, we write out the categorical features in terms of their categories and weights (see the guide on how feyn handles categories for more details). Similarly, we write out the full sigmoid function if the model is a classifier.

sympy_model = best_model.sympify(symbolic_cat=False, symbolic_lr=True)

The sympy expression can be converted into a string with the doprint method of sympy's Printer class. We then replace "exp" with "np.exp", the numpy function, so that the string can be evaluated as Python code.

printer = Printer()
string_model = printer.doprint(sympy_model).replace("exp", "np.exp")
string_model
'1/(0.0963819*np.exp(0.413291*sibsp - 1.72977*(0.0409075*age + 0.918737)*(0.346173*pclass_1 + 0.00905154*pclass_2 - 0.287936*pclass_3 + 0.359734*sex_female - 0.377445*sex_male - 0.666242)) + 1)'
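As an alternative to editing the printed string, sympy's lambdify can turn an expression into a numpy-backed function. The following is a minimal sketch, not part of the original tutorial: it rebuilds the printed expression by hand so that it runs without a trained model, and the names `inputs` and `predict` are illustrative. Whether lambdify can be applied directly to the object returned by sympify is an assumption you should verify on your own model.

```python
import numpy as np
import sympy

# Symbols named after the inputs that appear in the printed expression.
inputs = sympy.symbols("sibsp age pclass_1 pclass_2 pclass_3 sex_female sex_male")
sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male = inputs

# The expression printed above, rebuilt as a sympy object for illustration.
expr = 1 / (0.0963819 * sympy.exp(
    0.413291 * sibsp
    - 1.72977 * (0.0409075 * age + 0.918737)
    * (0.346173 * pclass_1 + 0.00905154 * pclass_2 - 0.287936 * pclass_3
       + 0.359734 * sex_female - 0.377445 * sex_male - 0.666242)
) + 1)

# lambdify compiles the expression into a function that accepts numpy arrays.
predict = sympy.lambdify(inputs, expr, modules="numpy")

# Two hypothetical passengers: a woman in first class, a man in third class.
probabilities = predict(
    np.array([0.0, 1.0]),    # sibsp
    np.array([30.0, 30.0]),  # age
    np.array([1.0, 0.0]),    # pclass_1
    np.array([0.0, 0.0]),    # pclass_2
    np.array([0.0, 1.0]),    # pclass_3
    np.array([1.0, 0.0]),    # sex_female
    np.array([0.0, 1.0]),    # sex_male
)
print(probabilities)
```

This route avoids textual substitution, which could also rewrite a feature name that happens to contain "exp". The cost is that sympy must be installed wherever the function is built.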

Inference: evaluate expression

The expression contained in string_model can be evaluated inside a function to make predictions without importing feyn or loading any object. To create this function, we simply copy and paste the expression into the definition of model_inference.

def model_inference(sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male):
    return 1/(0.0963819*np.exp(0.413291*sibsp - 
                               1.72977*(0.0409075*age + 0.918737)*
                               (0.346173*pclass_1 + 
                                0.00905154*pclass_2 - 
                                0.287936*pclass_3 + 
                                0.359734*sex_female - 
                                0.377445*sex_male - 0.666242)) + 1)
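Because the function only uses arithmetic and np.exp, it also works on plain scalars. The sketch below repeats the definition so that it runs on its own and scores two hypothetical passengers; their values are made up for illustration and are not taken from the data set.

```python
import numpy as np

def model_inference(sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male):
    # Same expression as in the tutorial, copied so this block is self-contained.
    return 1/(0.0963819*np.exp(0.413291*sibsp -
                               1.72977*(0.0409075*age + 0.918737)*
                               (0.346173*pclass_1 +
                                0.00905154*pclass_2 -
                                0.287936*pclass_3 +
                                0.359734*sex_female -
                                0.377445*sex_male - 0.666242)) + 1)

# Hypothetical passenger: 30-year-old woman in first class, travelling alone.
p_female_first = model_inference(sibsp=0, age=30, pclass_1=1, pclass_2=0,
                                 pclass_3=0, sex_female=1, sex_male=0)

# Hypothetical passenger: 30-year-old man in third class, travelling alone.
p_male_third = model_inference(sibsp=0, age=30, pclass_1=0, pclass_2=0,
                               pclass_3=1, sex_female=0, sex_male=1)

print(p_female_first, p_male_third)
```

Exactly one of the pclass indicators and one of the sex indicators should be set to 1 for each passenger.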

The model is expressed in terms of the one-hot-encoded features, which can be obtained easily with the pandas function get_dummies:

# Get numeric features as numpy arrays
sibsp = test.sibsp.values
age = test.age.values

# One-hot-encoding of categorical features
pclass_1 = pd.get_dummies(test.pclass)[1].values
pclass_2 = pd.get_dummies(test.pclass)[2].values
pclass_3 = pd.get_dummies(test.pclass)[3].values
sex_female = pd.get_dummies(test.sex)["female"].values
sex_male = pd.get_dummies(test.sex)["male"].values
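In deployment, a batch of incoming data may not contain every category, in which case get_dummies would not produce the corresponding column. One possible way around this, sketched below, is to encode against a fixed list of categories. The helper name `encode_inputs` and the constants `CATEGORIES` and `NUMERIC` are hypothetical; the category values are the ones that appear in the extracted expression.

```python
import pandas as pd

# Categories as they appear in the extracted expression (pclass_1, sex_female, ...).
CATEGORIES = {"pclass": [1, 2, 3], "sex": ["female", "male"]}
NUMERIC = ["sibsp", "age"]

def encode_inputs(df):
    """Return a dict of numpy arrays keyed by the model's argument names."""
    inputs = {col: df[col].to_numpy(dtype=float) for col in NUMERIC}
    for col, values in CATEGORIES.items():
        for value in values:
            # An equality test yields a column even when the category is absent.
            inputs[f"{col}_{value}"] = (df[col] == value).to_numpy(dtype=float)
    return inputs

# Hypothetical incoming batch that contains only first-class female passengers.
new_data = pd.DataFrame({
    "sibsp": [0, 1],
    "age": [30.0, 8.0],
    "pclass": [1, 1],
    "sex": ["female", "female"],
})
encoded = encode_inputs(new_data)
print(sorted(encoded))
```

The resulting dictionary can be passed to the inference function as keyword arguments, for example model_inference(**encoded).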

We can check that the extracted model gives the same results as the predict method of the feyn model, up to the numeric precision of the printed coefficients:

best_model.predict(test)[0:20]-model_inference(sibsp, age, pclass_1, pclass_2, pclass_3, sex_female, sex_male)[0:20]
array([ 1.81708470e-07, -1.55812481e-06, -9.27053956e-07, -6.67120051e-07,
       -1.40847184e-06, -4.75640350e-07, -6.45967956e-07, -1.60310114e-06,
       -4.71003742e-08, -9.27053956e-07, -1.53719299e-06, -7.19278846e-07,
       -3.07361949e-07, -9.71066287e-07, -7.23793551e-07,  6.77314471e-08,
       -1.91267928e-07, -9.28127093e-07, -1.96795965e-07, -8.04840904e-07])
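For an automated check, the comparison can be reduced to a single number and tested against a tolerance. This is a minimal sketch: the helper `max_abs_difference` and the tolerance of 1e-5 are assumptions, and the sample values are the first four differences printed above rather than fresh model output.

```python
import numpy as np

def max_abs_difference(reference, candidate):
    """Largest absolute element-wise difference between two prediction arrays."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    return float(np.max(np.abs(reference - candidate)))

# First four differences from the output above, compared against zero.
differences = np.array([1.81708470e-07, -1.55812481e-06,
                        -9.27053956e-07, -6.67120051e-07])
worst = max_abs_difference(differences, np.zeros_like(differences))

# Assumed tolerance, chosen to sit just above the rounding of the coefficients.
tolerance = 1e-5
print(worst, worst < tolerance)
```

In practice the two arguments would be best_model.predict(test) and the output of model_inference on the same rows.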

Copyright © 2024 Abzu.ai - Feyn license: CC BY-NC-ND 4.0
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®