Complexity-Loss Trade-Off
by: Emil Larsen
Feyn version: 2.1.0+
In this tutorial we will use the QLattice to compute an approximation of the Pareto front of solutions with respect to model complexity and loss. The Pareto front will give us an impression of the trade-off between model complexity and loss for models generated by the QLattice for a given data set.
Pareto Front Definition
First, we recall the formal definition of a Pareto front:
Let $X, Y$ be solutions, $F$ the fitness function (higher is better) and $C$ the model complexity (lower is better). Then (Neumann, 2012):
- $X$ weakly dominates $Y$ (denoted by $X \succeq Y$) if and only if $F(X) \geq F(Y)$ and $C(X) \leq C(Y)$
- $X$ dominates $Y$ (denoted by $X \succ Y$) if and only if $X \succeq Y$ and ($F(X) > F(Y)$ or $C(X) < C(Y)$)
- $X$ and $Y$ are incomparable if neither $X \succeq Y$ nor $Y \succeq X$
A solution is called Pareto optimal if it is not dominated by any other solution in the search space. The set of Pareto optimal solutions is called the Pareto optimal set and the set of corresponding objective vectors is called the Pareto front.
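In our case the fitness will be the negative of the model's loss (the RMSE on the training set) and the complexity will be the model's edge count, so one model weakly dominates another if it has at most the same loss and at most as many edges.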
Imports
import feyn
import pmlb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from collections import namedtuple
Load Data Set
We will use the data set 207_autoPrice from the Penn Machine Learning Benchmarks (Romano et al., 2021) data set suite (link to Pandas profile report: https://epistasislab.github.io/pmlb/profile/207_autoPrice.html). The data set has 159 observations and 15 features, one of which is categorical. There are no missing values, so we can use the data set as-is. We first split the data set into a train and test set and define the stypes:
data = pmlb.fetch_data(dataset_name="207_autoPrice")
train, test = train_test_split(data, test_size=0.33, random_state=42)

# Declare the categorical feature so the QLattice treats it as such
stypes = {'symboling': 'c'}
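As a quick, optional sanity check we can confirm the dimensions quoted above (a minimal sketch; the expected shapes assume the default pmlb download):

# 159 observations, 15 features plus the 'target' column
print(data.shape)                # expected: (159, 16)
print(train.shape, test.shape)   # roughly a 2/3 - 1/3 split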
Run the QLattice
We start by running the QLattice for 20 epochs at each complexity level, for complexity levels ranging from 1 through 15. Within each run we filter out models with a different complexity level before making updates to the QLattice. After each run we collect the best-performing models in a list, on which we will later compute the Pareto front. We use the loss as the criterion in every run (criterion=None in feyn.fit_models).
By setting up the training loop this way, with restrictions on the model complexity, the aim is to get the most accurate model attainable at every complexity level.
Had we instead run the QLattice once, with no explicit restriction on the model complexity and with BIC as the criterion, the search would have yielded the single model expected to generalise best to out-of-sample data, which would be sufficient in a normal use case.
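For contrast, such a single run might look as follows. This is a sketch only: it mirrors the loop below but passes criterion='bic' to feyn.fit_models and applies no complexity filter.

# Standalone sketch of the usual single-run workflow
ql = feyn.connect_qlattice()
ql.reset(random_seed=42)
models = []
for epoch in range(20):
    models += ql.sample_models(train, output_name='target', kind="regression", stypes=stypes)
    models = feyn.fit_models(models, train, criterion='bic')  # rank by BIC instead of loss
    models = feyn.prune_models(models)
    ql.update(models)
best = models[0]  # models are kept sorted by the criterion, best first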
The training loop is given below. We use the feyn primitives and a custom filter function, set up in a two-layer loop as described above.
ql = feyn.connect_qlattice()
models_all_complexity_levels = []

for complexity_level in range(1, 16):
    models = []
    ql.reset(random_seed=complexity_level)
    for epoch in range(20):
        # Sample model structures up to the current complexity level
        # (the default max_complexity of 10 would never yield levels 11-15)
        models += ql.sample_models(train, output_name='target', kind="regression", stypes=stypes, max_complexity=complexity_level)
        # Keep only models with exactly the target complexity level
        models = list(filter(lambda m: m.edge_count == complexity_level, models))
        models = feyn.fit_models(models, train, criterion=None)  # criterion=None ranks by loss
        models = feyn.prune_models(models)
        ql.update(models)
    models_all_complexity_levels.extend(models)
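Before computing the Pareto front it can be useful to verify that we actually collected models at every complexity level. This check is optional and not part of the training loop itself:

from collections import Counter

# Count the stored models per edge count; every level from 1 to 15 should appear
print(Counter(m.edge_count for m in models_all_complexity_levels))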
Compute Pareto Front
We can now compute the Pareto front based on the definition stated earlier. For simplicity and readability we implement this naively, directly from the definition. The computation can certainly be optimised, but that is not the focus of this tutorial.
ParetoPoint = namedtuple('ParetoPoint', ['loss', 'complexity'])

def weakly_dominates(p1, p2):
    return p1.loss <= p2.loss and p1.complexity <= p2.complexity

def strictly_dominates(p1, p2):
    return weakly_dominates(p1, p2) and (p1.loss < p2.loss or p1.complexity < p2.complexity)

def compute_pareto_front(models):
    # Evaluate each model's loss (RMSE on the training set) and complexity (edge count)
    model_losses = [feyn.metrics.rmse(train['target'], m.predict(train)) for m in models]
    model_edge_counts = [m.edge_count for m in models]
    points = [ParetoPoint(loss, edges) for loss, edges in zip(model_losses, model_edge_counts)]
    # A point is on the front if no other point strictly dominates it
    frontier = {p for p in points if not any(strictly_dominates(p_prime, p) for p_prime in points)}
    return sorted(frontier, key=lambda p: p.loss)
pareto_front = compute_pareto_front(models_all_complexity_levels)
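Printing the front gives a quick overview of the attainable trade-offs (the exact values depend on the QLattice run):

# One line per Pareto-optimal point, sorted by loss
for p in pareto_front:
    print(f"{p.complexity} edges: RMSE {p.loss:.2f}")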
Pareto Front Plot
Finally, we plot the approximation of the Pareto front as a scatter plot with steps between points.
# Unpack the points: x holds the losses (RMSE), y the edge counts
x, y = zip(*pareto_front)

plt.figure()
plt.step(x, y)
plt.scatter(x, y, marker='D')
plt.yticks(ticks=range(min(y), max(y) + 1))  # one tick per edge count
plt.xlabel("RMSE")
plt.ylabel("Edges")
plt.show()
Looking at the plot we see, among other things, that stepping down from three edges to one edge costs a considerably larger increase in loss than stepping down from nine edges to five.
Next steps from here could be to select a trade-off between model loss and complexity that we are happy with, retrain a model at that complexity level on the whole training data and evaluate its performance on the held-out test set, as sketched below.
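A minimal sketch of that last step, assuming we settle on five edges (an arbitrary choice for illustration) and reuse the models we already trained rather than retraining from scratch:

# Pick the best stored model with exactly five edges (illustrative choice)
candidates = [m for m in models_all_complexity_levels if m.edge_count == 5]
best = min(candidates, key=lambda m: feyn.metrics.rmse(train['target'], m.predict(train)))

# Evaluate out-of-sample performance on the held-out test set
print(feyn.metrics.rmse(test['target'], best.predict(test)))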
References
- Frank Neumann. Computational complexity analysis of multi-objective genetic programming. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation. 2012, pp. 799-806.
- Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, and Jason H. Moore. PMLB v1.0: an open source dataset collection for benchmarking machine learning methods. 2021. arXiv: 2012.00058.