# Classifiers and Regressors

by: Kevin Broløs

(Feyn version 1.4 or newer)

## The QGraph

Before getting a **classifier** or **regressor**, it's important to understand how these are defined in the context of the `QLattice`.

Essentially, the `QGraph` is the representation of the infinite ordered list of **graphs** (or paths) conceivable through the `QLattice` from your **input features** to your **output feature**, considering all possible combinations of interactions.

It's easy to imagine how this combinatorially explodes as `interactions` happen between both the features and the transformations of features. If you think about it for a bit, you realize that there are really infinitely many such graphs. The `QGraph` is there to help you search through this infinite list and find the best graph it possibly can in that list.
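To get a feel for how quickly the number of candidate graphs grows, here is a minimal sketch that counts expression trees in a toy grammar. The function `count_graphs` and its parameters are hypothetical and only for illustration; this is not how the `QLattice` enumerates or represents graphs.

```python
def count_graphs(n_features, depth, n_unary=2, n_binary=2):
    """Count expression trees of depth <= `depth` in a toy grammar.

    Assumption: a tree is either a bare input feature, one of `n_unary`
    functions applied to a smaller tree, or one of `n_binary` functions
    applied to an ordered pair of smaller trees.
    """
    count = n_features  # depth 0: the graph is just a single input feature
    for _ in range(depth):
        count = n_features + n_unary * count + n_binary * count ** 2
    return count


for depth in range(5):
    print(depth, count_graphs(n_features=3, depth=depth))
```

Even with three features and four functions, the count roughly squares with every extra level of depth, which is why an exhaustive search is not an option.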

What you need to know though, is what it represents, and how you interact with it.

The `QGraph` is essentially a testbed for all those weird ideas you had when you did your data analysis and considered what to use and what not to use, except it does all that for you, and also comes with alternative suggestions. Initially, these suggestions will be pretty weird, but a few gems will emerge. As you update the `QLattice` while evaluating these suggestions, the search space narrows and more of the suggestions become relevant.

We showcase how to use this idea at length in our section on formulating hypotheses.

## First things first: data

First we'll get a dataset. Let's just generate one using `sklearn`.

```
from sklearn.datasets import make_classification
import pandas as pd
from feyn.tools import split
# Generate a dataset and put it into a dataframe
X, y = make_classification()
data = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
data['target'] = y
# Split into a train and test set
train, test = split(data, ratio=(0.75, 0.25))
```
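If you are curious what a ratio-based split does conceptually, the following is a minimal stdlib sketch. `ratio_split` is a hypothetical helper written for this illustration and is not the implementation of `feyn.tools.split`; it assumes the rows are shuffled once and then cut at the requested proportion.

```python
import random


def ratio_split(rows, ratio=(0.75, 0.25), seed=42):
    """Shuffle `rows` and cut them into two parts according to `ratio`."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    cut = round(len(rows) * ratio[0] / sum(ratio))
    return rows[:cut], rows[cut:]


train_rows, test_rows = ratio_split(range(100))
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, so the test set stays unseen during fitting.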

## Get a QGraph instance

Now that we have this dataset, we can let the `QLattice` know that we'd like a **classifier** using those inputs, and a given output. The exact same method applies for getting a **regressor** - only the target function changes.

The `QLattice` is a generator of graphs from input to output. A `QGraph` is an unbounded list of graphs that have been generated from the `QLattice`. When we extract a `QGraph`, we only need to do it once, and declare what features we want to use and what should be the output variable.

The following example uses all the columns as input and the `target` column as the output.

We're assuming that you've gone through and set up your `QLattice` with a configuration file.

```
from feyn import QLattice
qlattice = QLattice()
# This will extract a QGraph containing classifiers
qgraph = qlattice.get_classifier(data.columns, 'target')
```

Or for regression:

```
# This will also work, but will treat it as a regression problem.
qgraph = qlattice.get_regressor(data.columns, 'target')
```

You still haven't passed in any of the data yet. That's because your `QLattice` doesn't work with data; it works with concepts.

## Fitting the QGraph

Fitting a `QGraph` is as simple as calling `qgraph.fit(data)`, but we often want to do this multiple times. The reason is that every time we call `.fit`, the `QGraph` discards the worst graphs and gives you a new evolution of graphs based on what the `QLattice` has learnt from you. This means that you should also `update` the `QLattice` with your best graphs as you go along. Doing so allows the `QLattice` to home in on your problem space and keep suggesting things that are useful, rather than just random stuff.

We refer to this process as `the update loop`.

An example update loop could look like this, but it can be as sophisticated as you want it to be (think cross validation or ensemble solutions; only your imagination is the limit).

```
# Let's go back to our classifier for this example
qgraph = qlattice.get_classifier(data.columns, 'target')
n_loops = 10
for _ in range(n_loops):
    # Fit the QGraph with your local data
    # Note: This automatically fetches a new evolution of graphs, while keeping the best ones you already have!
    qgraph.fit(train)
    # The top graphs that have evolved independently from each other in the QLattice
    best_graphs = qgraph.best()
    # Feed these graphs back to the QLattice. The next fit will explore graphs that are more similar to them
    qlattice.update(best_graphs)
```
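To build some intuition for why feeding the best graphs back narrows the search, here is a toy simulation of the same fit, pick the best, update cycle using plain numbers instead of graphs. Everything in it (`ToyLattice`, `toy_loss`, `run_update_loop`, the Gaussian sampling) is a hypothetical stand-in invented for this sketch; it says nothing about how the `QLattice` works internally.

```python
import random


class ToyLattice:
    """Hypothetical stand-in: proposes numbers near what it was last updated with."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.center = 0.0
        self.spread = 10.0

    def sample(self, n):
        return [self.rng.gauss(self.center, self.spread) for _ in range(n)]

    def update(self, best):
        # Move towards the best candidates and narrow the search.
        self.center = sum(best) / len(best)
        self.spread = max(0.1, self.spread * 0.7)


def toy_loss(candidate, target=7.0):
    return abs(candidate - target)


def run_update_loop(n_loops=10, population_size=20, keep=3, seed=0):
    lattice = ToyLattice(seed)
    population, history = [], []
    for _ in range(n_loops):
        # "fit": keep the survivors, draw fresh candidates, rank by loss
        survivors = population[:keep]
        population = survivors + lattice.sample(population_size - len(survivors))
        population.sort(key=toy_loss)
        best = population[:keep]
        lattice.update(best)  # feed the best candidates back
        history.append(toy_loss(best[0]))
    return history


history = run_update_loop()
print(f"first loss: {history[0]:.3f}, last loss: {history[-1]:.3f}")
```

Because the best candidates always survive into the next round, the best loss never gets worse, and the updates make new suggestions cluster around what has worked so far.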

## Inspecting the QGraph

To get and render a selection of graphs from a `QGraph`, call the `head` function.

In a `Jupyter` or `IPython` environment, this will render the graphs.

```
qgraph.head(n=3)
```

You can also render each individual graph by indexing the `QGraph`.

```
qgraph[3]
```

## Plotting during fitting

By now you've probably also noticed that if you run in an `IPython` environment, `QGraph.fit()` displays the current best (lowest loss) graph while it's training.

You can change that behaviour using the `show` parameter: disable it entirely with `None`, display text printouts with `"text"`, or pass in your own callback function to plot the metrics that matter the most to you.

```
# Examples
qgraph.fit(train, show=None)
qgraph.fit(train, show="text")
def plot_callback(graph, loss):
    print(f"Loss {loss}")
qgraph.fit(train, show=plot_callback)
```

```
Loss 0.016112560918642264
```
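A callback does not have to print anything. As a minimal sketch, the closure below records every loss it receives so you can plot or inspect the history afterwards. It assumes the `(graph, loss)` signature used in the example above; `make_loss_recorder` is a hypothetical helper, and the loop at the end calls the callback by hand with made-up values purely so the snippet runs without a `QLattice`.

```python
def make_loss_recorder():
    """Return a callback plus the list it appends each reported loss to."""
    history = []

    def record(graph, loss):
        history.append(loss)
        print(f"step {len(history)}: loss {loss:.4f}")

    return record, history


record_loss, loss_history = make_loss_recorder()

# Stand-in for the fitting process: invoke the callback manually.
for fake_loss in (0.5, 0.25, 0.125):
    record_loss(None, fake_loss)
```

In a real session you would pass `record_loss` as the `show` argument and read `loss_history` once fitting is done.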