Classifiers and Regressors
by: Kevin Broløs
(Feyn version 1.4 or newer)
The QGraph
Before getting a classifier or regressor, it's important to understand how these are defined in the context of the QLattice.
Essentially, the QGraph is the representation of the infinite ordered list of graphs (or paths) conceivable through the QLattice from your input features to your output feature, considering all possible combinations of interactions.
It's easy to imagine how this explodes combinatorially as interactions happen between both the features and the transformations of features. If you think about it for a bit, you realize that there are really infinitely many such graphs. The QGraph is there to help you search through this infinite list and find the best graph it possibly can in that list.
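To get a feel for how fast this grows, here is a rough counting sketch. It uses a made-up model that is only an assumption for illustration, not the QLattice's actual one: an expression is either a bare feature, one unary function applied to an expression, or one binary function applied to an ordered pair of expressions. The function name and the numbers of unary and binary functions are hypothetical.

```python
def count_expressions(n_features, n_unary, n_binary, max_depth):
    """Count expression trees of depth <= max_depth in a toy model.

    Hypothetical model (not the QLattice's): a tree is a bare feature,
    a unary function on a subtree, or a binary function on an ordered
    pair of subtrees.
    """
    total = n_features  # trees of depth 0 are just the features
    for _ in range(max_depth):
        # A tree one level deeper is a leaf, a unary node over any
        # shallower tree, or a binary node over two shallower trees.
        total = n_features + n_unary * total + n_binary * total * total
    return total


# 20 features, 3 unary and 2 binary functions (illustrative numbers only)
for depth in range(4):
    print(depth, count_expressions(20, 3, 2, depth))
```

Even with a handful of functions the count is in the millions by depth 2, and there is no upper limit on depth, which is why an exhaustive search is off the table.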
What you need to know, though, is what it represents and how you interact with it.
The QGraph is essentially a testbed for all those weird ideas you had when you did your data analysis and considered what to use and what to leave out, except it does all that for you and also comes with alternative suggestions. Initially, these suggestions will be pretty weird, but a few gems will emerge. As you evaluate these suggestions and update the QLattice with the best ones, the search space narrows and more of the suggestions become relevant.
We showcase how to use this idea at length in our section on formulating hypotheses.
First things first: data
First we'll get a dataset. Let's just generate one using sklearn.
from sklearn.datasets import make_classification
import pandas as pd
from feyn.tools import split
# Generate a dataset and put it into a dataframe
X, y = make_classification()
data = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
data['target'] = y
# Split into a train and test set
train, test = split(data, ratio=(0.75, 0.25))
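If you're wondering what a ratio-based split amounts to, here is a minimal standard-library sketch. It is not the implementation of feyn.tools.split; the helper name split_rows, the seed argument and the rounding rule are assumptions made for illustration.

```python
import random


def split_rows(rows, ratio=(0.75, 0.25), seed=42):
    """Shuffle rows and cut them into parts proportional to ratio.

    Hypothetical helper, not part of Feyn.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    total = sum(ratio)
    parts, start = [], 0
    for i, share in enumerate(ratio):
        if i == len(ratio) - 1:
            end = len(rows)  # last part takes the remainder, so no row is lost
        else:
            end = start + round(len(rows) * share / total)
        parts.append(rows[start:end])
        start = end
    return parts


train_rows, test_rows = split_rows(range(100))
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one part, which is the property you want from any train/test split.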
Get a QGraph instance
Now that we have this dataset, we can let the QLattice know that we'd like a classifier using those inputs and a given output. The exact same approach applies for getting a regressor; only the method you call changes.
The QLattice is a generator of graphs from input to output. A QGraph is an unbounded list of graphs that have been generated from the QLattice. We only need to extract a QGraph once, declaring which features we want to use and which one should be the output variable.
The following example uses all the columns as input and the target column as the output.
We're assuming that you've already set up your QLattice with a configuration file.
from feyn import QLattice
qlattice = QLattice()
# This will extract a QGraph containing classifiers
qgraph = qlattice.get_classifier(data.columns, 'target')
Or for regression:
# This will also work, but will treat it as a regression problem.
qgraph = qlattice.get_regressor(data.columns, 'target')
Note that you haven't passed in any data yet. That's because your QLattice doesn't work with data; it works with concepts.
Fitting the QGraph
Fitting a QGraph is as simple as calling qgraph.fit(data), but we often want to do this multiple times. The reason is that every time we call .fit, the QGraph discards the worst graphs and gives you a new evolution of graphs based on what the QLattice has learnt from you. This means that you should also update the QLattice with your best graphs as you go along. Doing so allows the QLattice to home in on your problem space and keep suggesting things that are useful, rather than just random stuff.
We refer to this process as the update loop.
An example update loop could look like this, but it can be as sophisticated as you want it to be (think cross-validation or ensemble solutions; your imagination is the limit).
# Let's go back to our classifier for this example
qgraph = qlattice.get_classifier(data.columns, 'target')
n_loops = 10
for _ in range(n_loops):
    # Fit the QGraph with your local data
    # Note: This automatically fetches a new evolution of graphs, while keeping the best ones you already have!
    qgraph.fit(train)
    # The top graphs that have evolved independently from each other in the QLattice
    best_graphs = qgraph.best()
    # Feed these graphs back to the QLattice. The next fit will explore graphs that are more similar to them
    qlattice.update(best_graphs)
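If the fit, best, update rhythm feels abstract, the following toy analogy may help. It is emphatically not how the QLattice works internally: the candidates are plain numbers, the loss is the distance to a hidden target, and the generator is a Gaussian whose centre moves towards the best candidates. Every name and constant in it is hypothetical.

```python
import random


def toy_update_loop(n_loops=10, population=20, keep=5, seed=0):
    """Toy analogy of an update loop (NOT the QLattice's algorithm)."""
    rng = random.Random(seed)
    target = 3.0               # hidden optimum, made up for the example
    centre, spread = 0.0, 5.0  # state of the toy "generator"

    def loss(x):
        return abs(x - target)

    candidates, history = [], []
    for _ in range(n_loops):
        # "fit": keep the best candidates and draw fresh ones to replace the rest
        survivors = sorted(candidates, key=loss)[:keep]
        n_fresh = population - len(survivors)
        candidates = survivors + [rng.gauss(centre, spread) for _ in range(n_fresh)]
        # "best": select the top candidates of this round
        best = sorted(candidates, key=loss)[:keep]
        # "update": nudge the generator towards what worked
        centre = sum(best) / len(best)
        spread = max(0.1, spread * 0.8)
        history.append(loss(best[0]))
    return history


history = toy_update_loop()
print(history)
```

Because the best candidates survive each round, the best loss never gets worse from one loop to the next, and feeding them back narrows where new candidates are drawn from.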
Inspecting the QGraph
To get and render a selection of graphs from a QGraph, call the head function.
In a Jupyter or IPython environment, this will render the graphs.
qgraph.head(n=3)
You can also render each individual graph by indexing the QGraph.
qgraph[3]
Plotting during fitting
By now you've probably also noticed that if you run in an IPython environment, QGraph.fit() displays the current best (lowest loss) graph while it's training.
You can change that behaviour using the show parameter: disable it entirely with None, display text printouts with "text", or pass in your own callback function to plot the metrics that matter the most to you.
# Examples
qgraph.fit(train, show=None)
qgraph.fit(train, show="text")

# A custom callback receives the current best graph and its loss
def plot_callback(graph, loss):
    print(f"Loss {loss}")

qgraph.fit(train, show=plot_callback)
Loss 0.016112560918642264
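A callback doesn't have to be a plain function. As a small sketch, assuming only the (graph, loss) signature shown above, a callable object can keep the losses around so you can plot or inspect them afterwards. The class name LossHistory is hypothetical, and the calls at the bottom use dummy values in place of a live QGraph.

```python
class LossHistory:
    """Callable that records every loss it is called with."""

    def __init__(self):
        self.losses = []

    def __call__(self, graph, loss):
        # Same signature as plot_callback: (graph, loss)
        self.losses.append(loss)
        print(f"Loss {loss:.4f}")


history = LossHistory()
# With a live QGraph you would pass it in: qgraph.fit(train, show=history)
# Here we call it by hand with dummy values to show what it stores.
history(None, 0.25)
history(None, 0.125)
```

After fitting, history.losses holds one entry per call, ready to hand to whatever plotting tool you prefer.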