# Inspection plots

by: Valdemar Stentoft-Hansen

(Feyn version 1.4 or newer)

The graph depiction shows you your best fitted model given your criteria for complexity. The graph can be translated directly into a mathematical formula, and you are good to go from there. However, the middle step of dissecting your graph's dynamics can go astray if you move forward too fast. In what regions of my data can I trust my predictions, and are there areas where my data points are spread thin, so the graph's certainty should be interpreted accordingly? Do the dynamics of my model match my understanding of the underlying processes in my data? I expect `y` to increase when `x` decreases – is that also the case in my graph?

`Feyn` offers a range of tools to help you dissect your graph and its dynamics.

As sample data we are using the Boston housing dataset from sklearn, where we predict the median house prices of different areas around Boston. Below I import the data, prepare it, and find my graph of choice with my QLattice:

```
from sklearn.datasets import load_boston
import pandas as pd
from feyn import QLattice
from feyn.plots import plot_graph_summary, plot_partial, plot_segmented_loss
from feyn.tools import split

# Download the Boston housing dataset
boston = load_boston()
df_boston = pd.DataFrame(boston.data, columns=boston.feature_names)
df_boston['PRICE'] = boston.target

# Train/test split
train, test = split(df_boston)

# Connect to QLattice
ql = QLattice()

# Get a regressor
qgraph = ql.get_regressor(train, 'PRICE', max_depth=2)  # max_depth=2, let's not overdo it
qgraph.fit(train)

# Select a graph from your fitted QGraph
best_graph = qgraph[0]
```

Of the 13 possible features, I am served a graph containing three that in conjunction explain the price. The conjunction is made up of two Gaussian interactions and a multiply – there seem to be some non-linear associations in this data. For context, the three features cover the "percentage lower status of the population" (LSTAT) – whatever that means – the per capita crime rate (CRIM), and the average number of rooms per dwelling (RM).

## Paint by signal

A graph can be more or less complex – it can hold anywhere from one to several features and interactions. For any graph with more than one feature, the graph visualisation at face value does not tell you where the signal arises. The signal contribution can be distributed uniformly across your features, or a single feature can carry the vast majority of the signal in your graph. To surface this information, the graph summary evaluates the association of your data with the output at every point in the graph. You have a choice of association measure: at this time the graph nodes can be coloured by either the mutual information criterion or the Pearson correlation coefficient.
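To build intuition for the two colouring options, here is a small standalone comparison (not using `Feyn`) of the Pearson correlation coefficient and mutual information on a deliberately non-linear relationship; the toy data is made up for illustration:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x ** 2 + rng.normal(scale=0.1, size=500)  # strong but non-linear signal

# Pearson only measures linear association, so it stays near zero here
r, _ = pearsonr(x, y)

# Mutual information picks up the non-linear dependence
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r: {r:.2f}")
print(f"Mutual information: {mi:.2f}")
```

This is why having both measures is useful: a pink (low) Pearson node does not necessarily mean a feature is useless.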

The greener an interaction, the stronger the association – the value of the chosen association measure is also displayed on top of each node. This visualisation lets you grasp the distribution of signal across your features and **follow the signal** through your graph, catching whenever a valuable interaction between features arises. The graph summary plot also presents a standard set of metrics – on your train set and, optionally, on a validation set. Below, the code and the graph summary are shown with both train and validation metrics displayed:

```
plot_graph_summary(graph=best_graph,
                   dataframe=train,
                   test=test)
```

At the input level the strongest **signal provider** is the LSTAT feature, with a mutual information score of 2.57, whereas the CRIM feature shows up pink with 1.13. The Gaussian interaction between LSTAT and CRIM adds a marginal 0.01 to the mutual information score. However, the following Gaussian interaction, which includes the multiplication of CRIM and RM, provides an additional 0.14 to ultimately deliver a mutual information score of 2.72. Notice that neither the mutual information score nor the Pearson correlation can tell the whole story of a specific feature's signal potential, since the QLattice will find patterns that only become apparent after one or more interactions.

## Partial plotting

In linear models, effects can be expressed as single numbers known as coefficients. A coefficient's sign and magnitude quickly indicate the direction and size of the effect. However, the world is not linear, and this simplifying assumption often makes our models dumber than they need to be. We need to allow for non-linear effects – and the QLattice does exactly that. That is all well and good, you say, but how do we express non-linear effects then? One approach is the partial plot. With a partial plot you ask: what is the effect of my feature of interest on the output, given a set of fixed values for the remaining features in my model?
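The mechanics behind a partial plot can be sketched without `Feyn` at all: fix all features but one at chosen values, sweep the remaining feature over a grid, and record the model's predictions. Here is a minimal sketch using an sklearn regressor as a stand-in model – the data, feature names, and model are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Toy data: the output depends non-linearly on 'a' and linearly on 'b'
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.uniform(0, 1, 300), 'b': rng.uniform(0, 1, 300)})
df['y'] = np.sin(3 * df['a']) + df['b']

model = GradientBoostingRegressor(random_state=0).fit(df[['a', 'b']], df['y'])

# Sweep 'a' over its range while holding 'b' fixed at its mean
grid = pd.DataFrame({'a': np.linspace(0, 1, 50), 'b': df['b'].mean()})
partial = model.predict(grid)
```

Plotting `partial` against `grid['a']` traces out the one-dimensional effect of `a` with `b` held constant – exactly the line a partial plot draws.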

The standard approach is to let the mean (or mode) represent the fixed values of the remaining features and let the feature of interest vary. This results in one line in a graph showing the model's prediction for the "mean" instance of your data. However, the mean instance might not represent your data in a meaningful way, which is why our implementation lets you set the fixed values of the other model features manually. We will start with the partial plot of the mean instance, as indicated in the `fixed` input to the partial plot call:

```
plot_partial(graph=best_graph,
             dataframe=train,
             by='LSTAT',
             fixed={'CRIM': train.CRIM.mean(), 'RM': train.RM.mean()})
```

The question of **prediction support** is covered by the partial plot. The actual data is shown as a grey scatter plot, accompanied by the associated histograms on the two axes. The potential predictions of the graph are shown as lines, one for each set of fixed values. By default the set of fixed values includes the `10%`, `50%`, and `90%` percentiles of each numerical fixed feature, while the `three most frequent categories` are displayed for categorical features. This can amount to a lot of lines in the plot if your model has many features – in that case I propose being strict with the fixed values: set them manually, and to fewer than three values for each remaining feature.
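The default percentile values are easy to reproduce yourself with pandas, should you want to pass them explicitly. Here `train` is a stand-in DataFrame with made-up numbers (the column names borrow from the Boston example), and whether `fixed` accepts a list of values per feature may depend on your Feyn version – the point is just how the default percentiles are computed:

```python
import numpy as np
import pandas as pd

# Stand-in for the training data from the example above
rng = np.random.default_rng(0)
train = pd.DataFrame({'CRIM': rng.exponential(3.0, 500),
                      'RM': rng.normal(6.0, 1.0, 500)})

# 10th, 50th and 90th percentiles per feature, matching the plot's defaults
fixed = {col: train[col].quantile([0.10, 0.50, 0.90]).tolist()
         for col in ['CRIM', 'RM']}
```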

Here the default partial plot behavior is shown – the fixed values are shown in the legend:

```
plot_partial(graph=best_graph,
             dataframe=train,
             by='LSTAT')
```

Here it becomes obvious how high CRIM numbers change the behaviour of the model drastically – high CRIM allows the model to capture the lowest price ranges. Partial plotting is handy for double-checking your graph's behaviour.

## Segment your model fitness

Are there certain parts of my dataset where my model is failing? And consequently, how do I know where to focus my efforts when improving my model? We answer these questions with a segmented loss plot. This plot displays the distribution of a feature (or, by default, the output variable) in a histogram (or a frequency bar plot for categorical features). The histogram is overlaid with the average loss for each bin or category.
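The idea is simple enough to sketch by hand: bin the chosen variable, then average the loss within each bin. A minimal sketch with made-up targets and predictions, where the prediction error grows with the target value:

```python
import numpy as np
import pandas as pd

# Made-up targets and predictions whose error grows with the target
rng = np.random.default_rng(0)
y_true = rng.uniform(5, 50, 500)
y_pred = y_true + rng.normal(scale=0.1 * y_true)

df = pd.DataFrame({'PRICE': y_true,
                   'sq_loss': (y_true - y_pred) ** 2})

# Bin the output variable and average the squared loss per bin
bins = pd.cut(df['PRICE'], bins=10)
segmented = df.groupby(bins, observed=True)['sq_loss'].mean()
```

With this construction, `segmented` rises towards the high-price bins – the same pattern the segmented loss plot reveals for the Boston graph.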

```
plot_segmented_loss(graph=best_graph,
                    data=train)
```

From this plot we see that the higher price ranges are the most difficult for the graph to grasp.

The findings from segmented loss plots might lead to tinkering with feature engineering, sample weighting (available in the fit call), and/or outlier alterations.
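As a hypothetical illustration of the sample-weighting idea, you could upweight the poorly fitted high-price areas before refitting. The exact keyword for passing weights to the fit call is not shown in this article, and the threshold and weight values below are made up – consult the Feyn documentation for your version:

```python
import numpy as np
import pandas as pd

# Stand-in for the training data; PRICE is the output column from the example
rng = np.random.default_rng(0)
train = pd.DataFrame({'PRICE': rng.uniform(5, 50, 500)})

# Double the weight of the expensive areas the segmented loss flagged as hard
weights = np.where(train['PRICE'] > 35, 2.0, 1.0)
```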

Let's have a look at the segment plot by LSTAT:

```
plot_segmented_loss(graph=best_graph,
                    data=train,
                    by='LSTAT')
```

The price is predicted more easily for higher LSTAT numbers – except for numbers around 30, that is.