Feyn Documentation

Airbnb prices

by: Chris Cave

Feyn version: 2.1+

Last updated: 23/09/2021

Here we use the QLattice to predict the rental price for Airbnb apartments in New York. The purpose of this tutorial is to show how to use auto_run in a regression problem.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import feyn

Connect to QLattice

ql = feyn.connect_qlattice()
ql.reset(random_seed=42)

Read in a data set

The Airbnb dataset is available on Kaggle. It describes the prices of actual rentals in New York City over several years.

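data = pd.read_csv('../data/airbnb.csv')
# Drop identifier and free-text columns that are not used as model inputs
data = data.drop(["name", "host_name", "last_review", "id", "host_id"], axis=1)
# Keep listings priced below 400 and drop rows with missing values
data = data[data["price"] < 400].dropna()
data.head()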
   neighbourhood_group  neighbourhood  latitude  longitude  room_type        price  minimum_nights  number_of_reviews  reviews_per_month  calculated_host_listings_count  availability_365
0  Brooklyn             Kensington     40.64749  -73.97237  Private room     149    1               9                  0.21               6                               365
1  Manhattan            Midtown        40.75362  -73.98377  Entire home/apt  225    1               45                 0.38               2                               355
3  Brooklyn             Clinton Hill   40.68514  -73.95976  Entire home/apt  89     1               270                4.64               1                               194
4  Manhattan            East Harlem    40.79851  -73.94399  Entire home/apt  80     10              9                  0.10               1                               0
5  Manhattan            Murray Hill    40.74767  -73.97500  Entire home/apt  200    3               74                 0.59               1                               129

Split the data

train, test = feyn.tools.split(data, ratio=(3,1), random_state=42)
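
The ratio=(3,1) argument asks for three parts of training data for every one part of test data, so about 75% of the rows end up in train and 25% in test. As a rough illustration of that idea (not Feyn's actual implementation), the sketch below does a similar shuffle-and-cut with plain pandas. The helper name split_by_ratio and the toy data are hypothetical.

```python
import pandas as pd

def split_by_ratio(df, ratio=(3, 1), random_state=42):
    """Hypothetical helper: shuffle the rows, then cut them into two parts sized by `ratio`."""
    # Shuffle all rows reproducibly
    shuffled = df.sample(frac=1.0, random_state=random_state)
    # With ratio=(3, 1) the first 3/4 of the shuffled rows go to the first part
    cut = len(shuffled) * ratio[0] // sum(ratio)
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

# Toy data standing in for the Airbnb frame
toy = pd.DataFrame({"price": range(100)})
toy_train, toy_test = split_by_ratio(toy)
print(len(toy_train), len(toy_test))  # 75 25
```

Fixing random_state keeps the split reproducible, which matters when comparing models across runs.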

Declare semantic types

The following columns are categorical:

  • neighbourhood_group
  • neighbourhood
  • room_type

We need to declare this in a dictionary that we then pass to the auto_run function.

stypes = {
    'neighbourhood_group': 'c', 
    'neighbourhood': 'c', 
    'room_type': 'c'
}
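
When a dataset has many columns, writing this dictionary by hand gets tedious. One possible shortcut, sketched below on a small made-up frame, is to mark every non-numeric input column as categorical. The helper name infer_stypes is hypothetical and not part of Feyn. Check the result by hand, since numeric codes can also be categorical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def infer_stypes(df, output_name):
    """Hypothetical helper: tag every non-numeric input column as categorical ('c')."""
    return {
        col: 'c'
        for col in df.columns
        if col != output_name and not is_numeric_dtype(df[col])
    }

# Toy frame with two text columns and two numeric columns
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan"],
    "room_type": ["Private room", "Entire home/apt"],
    "minimum_nights": [1, 10],
    "price": [149, 80],
})
toy_stypes = infer_stypes(toy, output_name="price")
print(toy_stypes)  # {'neighbourhood_group': 'c', 'room_type': 'c'}
```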

Use auto_run to obtain fitted models

We use this function to run a simulation on the QLattice. It returns the 10 best and most diverse models fitted to the data.

models = ql.auto_run(
    train,
    output_name='price',
    stypes=stypes
    )
Loss: 2.70E+03 Epoch no. 10/10 - Tried 17034 models - Completed in 1m 33s.

[Model diagram: output price; inputs neighbourhood (categorical, 218 values), availability_365, room_type (categorical, 3 values) and minimum_nights (used twice); function nodes gaussian, add, add, multiply and exp]

Summary plot

This returns the metrics of the model on the train and test sets.

best = models[0]
best.plot(train, test)
[Model diagram of the best model, followed by its summary metrics]

          R2     RMSE   MAE
Training  0.505  51.9   35.5
Test      0.507  50.8   34.7

Inputs: neighbourhood, availability_365, room_type, minimum_nights
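
For reference, the three metrics in the summary can be computed by hand. The sketch below uses the textbook definitions of R2, RMSE and MAE on made-up numbers. It is assumed, not confirmed here, that Feyn's summary uses the same definitions. The function name regression_metrics is hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Hypothetical helper: textbook R2, RMSE and MAE for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }

# Made-up prices: every prediction is off by exactly 10
metrics = regression_metrics([100, 150, 200, 250], [110, 140, 210, 240])
print(metrics)
```

Read this way, a test MAE of 34.7 means predictions miss the actual price by about 35 on average, and an R2 near 0.5 means the model explains roughly half of the variance in price. The training loss of 2.70E+03 reported by auto_run also appears consistent with a mean squared error: its square root is about 52, close to the training RMSE of 51.9.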

Trend

The model captures the trend of prices as we can see from this map of New York City.

pred_test = best.predict(test)
f, ax = plt.subplots(nrows=1, ncols=2, sharey=True, sharex=True, figsize=(14, 6))

# Left panel: actual prices, with the color scale capped at 200
ax[0].scatter(test["longitude"], test["latitude"], cmap="viridis", c=test["price"], vmax=200, s=8)
ax[0].set_title('Actual price')
# Right panel: predicted prices, with the same cap on the color scale
ax[1].scatter(test["longitude"], test["latitude"], cmap="viridis", c=pred_test, vmax=200, s=8)
ax[1].set_title('Predicted price')

plt.show()

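[Figure: two scatter maps of the test listings by longitude and latitude, colored by price. Left panel: Actual price. Right panel: Predicted price.]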

Copyright © 2024 Abzu.ai - Feyn license: CC BY-NC-ND 4.0
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®