Feyn Documentation

Automobile MPG

by: Chris Cave

Feyn version: 2.1+

Last updated: 24/09/2021

Here we use the QLattice to predict the miles per gallon (MPG) of cars based on other attributes such as:

  • The number of cylinders,
  • The horsepower,
  • The weight,
  • The year it was made.

You can find this dataset and further descriptions of the features on the UCI Machine Learning Repository.

import pandas as pd
import feyn
import numpy as np

from sklearn.model_selection import train_test_split

Data clean up

Some horsepower values are missing and are marked with '?' in the CSV file. We replace them with the mean horsepower of the rows where the value is known.

data = pd.read_csv("../data/auto_mpg.csv")

# Car name is a unique identifier, so we remove it before training
data = data.drop("car name", axis=1)

# Compute the mean horsepower over the rows where it is known ('?' marks a missing value)
data_wo_na = data.query("horsepower != '?'").astype({"horsepower": int})
horsepower_mean = data_wo_na["horsepower"].mean()

# Replace the '?' markers in the horsepower column only, then cast to int (this truncates the mean)
data["horsepower"] = data["horsepower"].replace("?", horsepower_mean)
data = data.astype({"horsepower": int})

data.head()
    mpg  cylinders  displacement  horsepower  weight  acceleration  model year  origin
0  18.0          8         307.0         130    3504          12.0          70       1
1  15.0          8         350.0         165    3693          11.5          70       1
2  18.0          8         318.0         150    3436          11.0          70       1
3  16.0          8         304.0         150    3433          12.0          70       1
4  17.0          8         302.0         140    3449          10.5          70       1
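As a small illustration of what the clean-up does, the pure-Python sketch below applies the same rule to a few hypothetical horsepower strings: the mean is taken over the known values only, and the final cast to int truncates it, as .astype({"horsepower": int}) does. The helper impute_missing_with_mean and the sample values are hypothetical and exist only for this example.

```python
from statistics import mean


def impute_missing_with_mean(values, missing="?"):
    """Replace `missing` markers with the truncated integer mean of the known values.

    Hypothetical helper: mirrors the pandas steps, where the mean is computed
    only from non-missing entries and the cast to int truncates it.
    """
    observed = [int(v) for v in values if v != missing]
    fill = int(mean(observed))  # int() truncates toward zero, like .astype(int)
    return [fill if v == missing else int(v) for v in values]


# Hypothetical raw horsepower strings, as they might be read from the CSV
raw = ["130", "165", "?", "150", "?"]
print(impute_missing_with_mean(raw))  # [130, 165, 148, 150, 148]
```

Here the mean of 130, 165 and 150 is about 148.33, which becomes 148 after the integer cast.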

Connect to the QLattice

Next we split the data into a training set and a test set. With only 398 samples the dataset is quite small, so we split it into two equal halves: a larger test set gives a more reliable check that the model is not overfitting.

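random_state = 42

# Split the data in half: 199 rows for training and 199 for testing
train, test = train_test_split(data, test_size=0.5, random_state=random_state)

# Connect to the QLattice and reset it with the same seed for reproducible results
ql = feyn.connect_qlattice()
ql.reset(random_state)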
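To show what a seeded 50/50 split amounts to, here is a minimal pure-Python sketch. It sizes the test set as ceil(test_size * n), which is how scikit-learn sizes a float test_size, but it uses Python's own random module for the shuffle, so the rows it picks will not match those chosen by train_test_split. The function split_half is a hypothetical name used only for illustration.

```python
import math
import random


def split_half(rows, test_size=0.5, seed=42):
    """Shuffle row indices with a fixed seed and cut them into train and test.

    Hypothetical sketch: the test set gets ceil(test_size * n) rows and the
    training set gets the rest. The shuffle order differs from scikit-learn's.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = math.ceil(test_size * len(rows))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(398))  # stand-ins for the 398 cars
train_rows, test_rows = split_half(rows)
print(len(train_rows), len(test_rows))  # 199 199
```

Fixing the seed is what makes the split, and therefore the reported metrics, reproducible from run to run.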

Use auto_run to obtain fitted models

models = ql.auto_run(train, output_name="mpg", n_epochs=20)
Loss: 1.02E+01  Epoch no. 20/20 - Tried 36053 models - Completed in 54s.

[Model graph: inputs model year, weight and horsepower; interaction nodes add, add and tanh; output mpg]

Summary plot to evaluate performance

Here we evaluate the final performance of the model with useful metrics and plots.

best = models[0]
best.plot(train, test)
[Summary plot: model graph with inputs model year, weight and horsepower]

Metric   Training   Test
R²       0.848      0.877
RMSE     3.17       2.61
MAE      2.33       2.02

The model performs well with only three features: R² is 0.848 on the training set and 0.877 on the test set. Even though the model is simple, it still captures most of the signal in the data.
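The summary plot reports R², RMSE and MAE. The sketch below computes them with their standard definitions (R² = 1 - SS_res / SS_tot, RMSE as the square root of the mean squared residual, MAE as the mean absolute residual); we assume Feyn uses these same definitions. The function regression_metrics and the sample values are hypothetical and are not taken from the fitted model.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired lists of true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical MPG values and predictions, for illustration only
y_true = [18.0, 15.0, 18.0, 16.0, 17.0]
y_pred = [17.5, 15.5, 18.0, 16.5, 17.0]
print(regression_metrics(y_true, y_pred))
```

RMSE penalises large errors more heavily than MAE, which is why RMSE is always at least as large as MAE.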

Plot model response

Here we use plot_response_1d to see how the model's prediction changes as one input varies while another input is held at different values.

best.inputs
['model year', 'weight', 'horsepower']
# The input varied along the x-axis
x_axis = best.inputs[0]

# The constrained input: one curve (colour) per quartile (25%, 50%, 75%) of its training values
colours = {best.inputs[1]: train[best.inputs[1]].quantile(q=[0.25, 0.5, 0.75]).values}

best.plot_response_1d(train, by=x_axis, input_constraints=colours)
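To make the idea behind the response plot concrete, here is a minimal pure-Python sketch: one input (model year) is varied along the x-axis, and one curve is produced for each quartile of the constrained input (weight). The quartiles use linear interpolation, which matches pandas' default for quantile. Everything else here is an assumption: toy_model is a hypothetical stand-in, not the model found by the QLattice, and the remaining input (horsepower) is simply held at one fixed value; check the Feyn documentation for the value plot_response_1d actually uses for unconstrained inputs.

```python
from statistics import quantiles


def weight_quartiles(weights):
    """25th, 50th and 75th percentiles with linear interpolation (pandas' default)."""
    return quantiles(weights, n=4, method="inclusive")


def response_curves(model, years, fixed_weights, fixed_horsepower):
    """Evaluate `model` along `years`, producing one curve per fixed weight value."""
    return {
        w: [model(year, w, fixed_horsepower) for year in years]
        for w in fixed_weights
    }


def toy_model(year, weight, horsepower):
    # Hypothetical stand-in: NOT the model fitted by the QLattice
    return 0.8 * (year - 70) - 0.006 * weight - 0.05 * horsepower + 40


weights = [1800, 2200, 2600, 3000, 3400, 3800, 4200]  # hypothetical training weights
years = list(range(70, 83))
curves = response_curves(toy_model, years, weight_quartiles(weights), 100)
for w, mpg in curves.items():
    print(f"weight={w:.0f}: mpg from {mpg[0]:.1f} to {mpg[-1]:.1f}")
```

Each printed line corresponds to one coloured curve in the plot; comparing the curves shows how the effect of model year shifts as weight moves between its quartiles.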


Copyright © 2024 Abzu.ai - Feyn license: CC BY-NC-ND 4.0
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®