# Fitting the QGraph

by: Kevin Broløs

(Feyn version 1.4 or newer)

## O' graph of graphs

Alright, you've got your data prepared and your `QLattice` set up. What next?

We briefly touched upon the `QGraph` as a concept. It's time to go a little deeper. Essentially, the `QGraph` is the representation of the infinite ordered list of `graphs` (or paths) conceivable through the `QLattice` from your `input registers` to your `output register`, considering all possible combinations of interactions.

It's easy to imagine how this combinatorially explodes as `interactions` happen between both the features and the transformations of features. If you think about it for a bit, you realize that there are really infinitely many such graphs. The `QGraph` is there to help you search through this infinite list and find the best `graph` it possibly can.

What you need to know, though, is what it represents and how you interact with it. So let's back up a bit.

## It's just graphs. Really.

Have you ever looked at a neural network and thought, "Huh, I wish this would just build itself. Oh, and also have all kinds of cool functions to choose from"?

Welcome to the `QGraph`. This is essentially a testbed for all those weird ideas you had when you did your data analysis and considered what to use and not to use, except it does all that for you, and also comes with alternate suggestions. Initially, these suggestions will be pretty weird, but a few gems will emerge. Through updating the `QLattice` as you evaluate all these suggestions, the search space narrows and more of these suggestions become relevant.

This is done through what we refer to as the `update loop`.

## The update loop

While you can just extract a `QGraph`, keep fitting it and never call the `QLattice` again, that's not really taking advantage of the diversity of solutions available at your fingertips, and it might cause you to converge on some pretty bad decisions made early on, before the problem space was really well understood.

That's why we update the `QLattice`. Doing so allows the `QLattice` to home in on your problem space and keep suggesting things that are useful, rather than just random stuff.

An example update loop could look like this, but it can be as sophisticated as you want it to be (think cross validation, ensemble solutions; only your imagination is the limit):

```
# Get a QGraph to fit
qgraph = qlattice.get_regressor(data.columns, 'target')

no_loops = 10
for _ in range(no_loops):
    # Fit the QGraph with your local data
    qgraph.fit(train)
    # The top graphs that are sufficiently far apart in the QLattice
    best_graphs = qgraph.best()
    # Feed these graphs back to the QLattice. The next fit will explore graphs that are more similar to them
    qlattice.update(best_graphs)
```

## Add some thermal paste, it's about to get hot

I want to show you some of these `graphs`, but let's first get into the fit function and how it works.

```
qgraph.fit(train)
```

When you call the fit function, it trains each current graph inside the `QGraph` once for each sample in the dataset. If the dataset contains fewer than 10000 samples, it is upsampled to 10000 samples before fitting. This behaviour is controlled by the `n_samples` parameter, whose default value is 10000. Concurrently with training, the `fit` function generates a completely new set of graphs, which are also trained on the data provided.
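To build some intuition for what upsampling to `n_samples` means, here is a minimal sketch of the idea in plain NumPy: if the dataset is smaller than the target, draw rows with replacement until it isn't. This illustrates the concept only, not Feyn's internal implementation; the `upsample` function is a hypothetical name for this sketch.

```python
import numpy as np

def upsample(data, n_samples=10000):
    # Illustrative sketch: if the dataset has fewer rows than n_samples,
    # draw rows with replacement until we reach n_samples
    if len(data) >= n_samples:
        return data
    rng = np.random.default_rng(42)
    idx = rng.integers(0, len(data), size=n_samples)
    return data[idx]

small = np.arange(100).reshape(50, 2)  # 50 samples, 2 columns
big = upsample(small)
print(big.shape)  # (10000, 2)
```

Every original sample appears in expectation many times, so each graph still sees the whole dataset, just repeated.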

After fitting, the graphs are sorted by their performance on your favourite metric, so you can access the best graph found by indexing the `QGraph` directly:

```
best = qgraph[0]
```

The average size of the `QGraph` is currently in the thousands, so you can think of this as fitting thousands of different models, validating them against your results and taking the best learnings with you to the next iteration, where you add thousands of brand-new ones and compare them to your current best contenders.

You can choose the loss function among the ones in `feyn.losses`, using the `loss_function` parameter.

```
from feyn import losses
qgraph.fit(train, loss_function=losses.absolute_error)
```
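To make the choice concrete, here is what the two common regression losses compute, sketched in plain NumPy. This illustrates the underlying math only, not Feyn's own implementations:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

# Squared error penalises large deviations harder (the default)
squared = np.mean((y_true - y_pred) ** 2)

# Absolute error is more robust to outliers
absolute = np.mean(np.abs(y_true - y_pred))

print(squared, absolute)
```

If your target has heavy-tailed noise or outliers, the absolute error tends to produce graphs that track the bulk of the data rather than chasing the extremes.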

### Threading the needle

Having many `graphs` means a lot of work to do. So if you have multiple cores in your CPU, you can take advantage of them by declaring the number of threads you want to make available to the `fit` function.

```
qgraph.fit(train, loss_function=losses.absolute_error, threads=4)
```

You're still going to end up with a final best `graph` (or several) to use as your model, so let's talk about the selection process.

### Survival of the fittest

You can sort the `graphs` in the `QGraph` by something other than the data set you fitted on, such as your validation set.

```
qgraph.sort(validation)
best_graph = qgraph[0]
```

Notice that while `fit` sorts the head of your `QGraph` by fitness to the training set, the `sort` method sorts it by fitness to any dataset you provide.

You can use any kind of metric you wish for sorting your `graphs`. The default is the squared error calculated on the dataset you feed into the sort function.

Here are a few examples:

```
best = qgraph.sort(train, loss_function=losses.absolute_error)[0]
best_0_10 = qgraph.sort(train[0:10])[0] # The best graph on the first ten samples
```

This is useful for validation sets, for cross validation, or for exploring the performance of the graphs on some subset of your data.
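The sorting behaviour itself is easy to picture: evaluate every candidate on the dataset you pass in, then order by loss. A minimal sketch with stand-in models (plain Python functions, hypothetical names, not Feyn code):

```python
import numpy as np

# Stand-in "models": simple functions from x to a prediction
candidates = [
    lambda x: 2 * x,             # a perfect fit for y = 2x
    lambda x: x + 10,            # biased
    lambda x: np.zeros_like(x),  # useless
]

x_val = np.array([1.0, 2.0, 3.0])
y_val = np.array([2.0, 4.0, 6.0])

def squared_error(model):
    # Mean squared error of a candidate on the validation data
    return np.mean((model(x_val) - y_val) ** 2)

# Sort candidates by their loss on the validation data, best first
ranked = sorted(candidates, key=squared_error)
best = ranked[0]
print(squared_error(best))  # 0.0
```

Swapping the dataset (train, validation, or a slice) changes the key function, which is exactly what passing a different dataset to `sort` does.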

## You promised me graphs

To get and render a selection of `graphs` from a `QGraph`, call the head function. In a Jupyter or IPython environment, this will render the `graphs`.

```
qgraph.head(n=3)
```

You can also render each individual graph by indexing the `QGraph`.

```
qgraph[3]
```

## Plotting during fitting

By now you've probably also noticed that if you run in an IPython environment, `QGraph.fit()` will display the current best (lowest loss) graph while it's training.

You can change that behaviour using the `show` parameter: disable it entirely with `None`, display text printouts with `"text"`, or add your own callback function to plot the metrics that matter most to you.

```
# Examples
qgraph.fit(train, show=None)
qgraph.fit(train, show="text")

def plot_callback(graph, loss):
    print(f"Loss {loss}")

qgraph.fit(train, show=plot_callback)
```

```
Loss 0.016112560918642264
```
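A callback just needs to accept the current best graph and its loss, so you can also use it to record the training trajectory for plotting afterwards. A small sketch, simulating the calls `fit` would make (the hypothetical losses are made up for illustration):

```python
losses = []

def record_callback(graph, loss):
    # Keep the loss history so you can plot it after fitting
    losses.append(loss)

# Simulate what fit would do: invoke the callback as the best loss improves
for simulated_loss in [0.5, 0.12, 0.016]:
    record_callback(None, simulated_loss)

print(losses)  # [0.5, 0.12, 0.016]
```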

Next, let's move on to the `graphs` themselves, and how to save them for later, get predictions and evaluate them!