by: Kevin Broløs
(Feyn version 1.4 or newer)
Here we'll go over how you can improve the fitting process of feyn to better suit your needs, for instance if you want to:
- train on multiple cores/threads
- use a different loss function
- tune the number of samples used when training each graph
- sort the graphs by a different loss function when updating the QGraph
Add some thermal paste, it's about to get hot
Let's get going and generate a dataset:
```python
from sklearn.datasets import make_classification
import pandas as pd
from feyn.tools import split

# Generate a dataset and put it into a dataframe
X, y = make_classification()
data = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
data['target'] = y

# Split into a train, validation and test set
train, test = split(data, ratio=(0.75, 0.25))
train, validation = split(train, ratio=(0.75, 0.25))
```
```python
from feyn import QLattice

qlattice = QLattice()
qgraph = qlattice.get_regressor(data, 'target')
qgraph.fit(train)
```
When you call the fit function, it trains each graph currently inside the QGraph once for each sample in the dataset. If the dataset contains fewer than 10000 samples, it is upsampled to 10000 samples before fitting. This behaviour is controlled by the n_samples parameter, which defaults to 10000. Alongside training, the fit function also generates a completely new set of graphs, which are trained on the same data.
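To make the effect of n_samples concrete, here is a minimal sketch of what upsampling a small dataset to a fixed sample count could look like. This is an illustration in plain pandas, not feyn's actual implementation; the `upsample` helper is hypothetical:

```python
import pandas as pd

def upsample(data: pd.DataFrame, n_samples: int = 10000) -> pd.DataFrame:
    """Sample rows with replacement until the dataset has n_samples rows.

    Hypothetical helper for illustration -- not part of feyn's API.
    """
    if len(data) >= n_samples:
        return data
    return data.sample(n=n_samples, replace=True, random_state=42)

small = pd.DataFrame({'x': range(100), 'target': [0, 1] * 50})
resampled = upsample(small)
print(len(resampled))  # 10000
```

Sampling with replacement means each original row appears roughly a hundred times on average, so every graph still sees the full variety of the data during a fit.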
After fitting, the graphs are sorted by their performance on your chosen metric, so you can access the best graph found by indexing into the QGraph:

```python
best = qgraph[0]
```
The average size of the QGraph is currently in the thousands, so you can think of each fit as training a thousand different models, validating them against your results, and carrying the best learnings forward to the next iteration, where thousands of brand-new graphs are added and compared to your current best contenders.
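Conceptually, each fit call behaves like one generation of a simple search loop. The toy sketch below is an analogy in plain Python, not feyn's internals: candidate "models" are just numbers scored against a target, the best survivors are kept, and brand-new random candidates are added each generation:

```python
import random

random.seed(0)
target = 3.14

def loss(candidate):
    # Squared distance from the value we are searching for
    return (candidate - target) ** 2

# Initial population of random candidates
population = [random.uniform(-10, 10) for _ in range(1000)]

for generation in range(5):
    # Add brand-new candidates each iteration
    population += [random.uniform(-10, 10) for _ in range(1000)]
    # Sort by fitness and keep only the best contenders
    population.sort(key=loss)
    population = population[:1000]

best = population[0]
```

The real search is far more structured (graphs are generated by the QLattice, not drawn uniformly at random), but the keep-the-best-and-add-new loop is the same shape.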
Alternate loss functions
You can choose a loss function from the ones available in feyn.losses, using the loss_function parameter:

```python
from feyn import losses

qgraph.fit(train, loss_function=losses.absolute_error)
```
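If you're curious what these losses compute, squared error and absolute error are the familiar elementwise losses. A quick standalone illustration in plain Python (independent of feyn's implementations):

```python
def squared_error(y_true, y_pred):
    # Elementwise squared differences
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

def absolute_error(y_true, y_pred):
    # Elementwise absolute differences
    return [abs(t - p) for t, p in zip(y_true, y_pred)]

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 1.0]
print(squared_error(y_true, y_pred))   # [0.25, 0.0, 4.0]
print(absolute_error(y_true, y_pred))  # [0.5, 0.0, 2.0]
```

Note how the squared error punishes the large miss on the last sample much more harshly; absolute error is the gentler choice when your data contains outliers.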
Having many graphs means a lot of work to do. If you have multiple cores in your CPU, you can take advantage of them by declaring the number of threads you want to make available with the threads parameter:

```python
qgraph.fit(train, loss_function=losses.absolute_error, threads=4)
```
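As a rough mental model of what the threads parameter buys you, here is a generic sketch of fitting independent models in parallel with Python's standard library. This is not how feyn schedules its work internally, just an illustration of the idea; `fit_model` is a hypothetical stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def fit_model(model_id):
    # Hypothetical stand-in for fitting one graph; real numeric work
    # would release the GIL and run truly in parallel.
    score = sum(i * i for i in range(1000))
    return model_id, score

# Fit eight "models" using a pool of four worker threads
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_model, range(8)))

print(len(results))  # 8
```

Since the graphs are trained independently, the work divides cleanly across workers; a good starting point is one thread per physical core.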
You're still going to have to end up with a final best graph (or several) to use as your model, so let's talk about the selection process.
Sorting the QGraph according to a different metric
You can sort the graphs in the QGraph by something other than the dataset you fitted on, such as your validation set:

```python
qgraph.sort(validation)
best_graph = qgraph[0]
```
Notice that while fit sorts the head of your QGraph by fitness to the training set, the sort method sorts it by fitness to any dataset you provide.
You can use any kind of metric you wish for sorting your graphs. The default is the squared error calculated on the dataset you feed into the sort function.
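Stripped of feyn specifics, sorting models by their loss on a chosen dataset is just the following pattern. The `models` here are hypothetical stand-ins (simple functions of x), used only to illustrate the idea:

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical "models": simple functions mapping x to a prediction
models = [lambda x: 2 * x, lambda x: x + 1, lambda x: 3 * x - 2]

# Validation data following y = 2x
x_val = [1.0, 2.0, 3.0]
y_val = [2.0, 4.0, 6.0]

# Rank the models by their loss on the validation data
ranked = sorted(models, key=lambda m: mean_squared_error(y_val, [m(x) for x in x_val]))
best = ranked[0]
print(best(10))  # 20
```

Swapping in a different loss function or a different dataset changes the ranking, which is exactly the knob the sort method gives you.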
Here are a few examples:
```python
best = qgraph.sort(train, loss_function=losses.absolute_error)
best_0_10 = qgraph.sort(train[0:10])  # The best graph on the first ten samples
```
This is useful for validation sets, for cross validation, or for exploring the performance of the graphs on some subset of your data.