# Partial plots

by: Valdemar Stentoft-Hansen and Chris Cave

(Feyn version 1.5 or newer)

`Feyn` offers a range of tools to help you dissect your graph and its dynamics.

As sample data we use the Boston housing dataset from sklearn, where we predict the median house price of different areas around Boston. Below I import the data, prepare it, and find my graph of choice with my QLattice:

```
from sklearn.datasets import load_boston
import pandas as pd
from feyn import QLattice
from feyn.plots import plot_partial
from feyn.tools import split
# Download the Boston housing dataset
boston = load_boston()
df_boston = pd.DataFrame(boston.data, columns=boston.feature_names)
df_boston['PRICE'] = boston.target
# Train/test split
train, test = split(df_boston)
# Connect to QLattice
ql = QLattice()
# Get a regressor
qgraph = ql.get_regressor(train, 'PRICE', max_depth = 2) #max_depth = 2, let's not overdo it
qgraph.fit(train)
# Select a graph from your fitted QGraph
best_graph = qgraph[0]
```

## Partial plotting

In linear models, each feature's effect is expressed as a single number: a coefficient. Its sign and size quickly indicate the direction and magnitude of the effect. However, the world is not linear, and this simplifying assumption often costs us predictive power. We need to allow for non-linear effects, and the QLattice does exactly that. That is all well and good, you say, but how do we express non-linear effects then? One approach is the partial plot. With a partial plot you are asking: what is the effect of my feature of interest on the output, given a set of fixed values for the remaining features in the model?
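To make the mechanics concrete, here is a hand-rolled sketch of the idea behind a partial plot: hold every other feature at a reference value, sweep the feature of interest over a grid, and record the predictions. The model, coefficients, and helper names below are invented stand-ins for illustration, not the fitted graph or the Feyn API:

```python
import numpy as np
import pandas as pd

# A toy non-linear stand-in "model": price falls quadratically with LSTAT
# and rises linearly with RM. Purely illustrative coefficients.
def toy_model(df):
    return 50 - 0.08 * df["LSTAT"] ** 2 + 4 * df["RM"]

def partial_curve(model, data, by, grid_size=50):
    # Sweep the feature of interest over its observed range...
    grid = np.linspace(data[by].min(), data[by].max(), grid_size)
    # ...while fixing every other feature at the "mean" instance.
    fixed = data.mean()
    rows = pd.DataFrame([fixed] * grid_size)
    rows[by] = grid  # vary only the feature of interest
    return grid, model(rows)

data = pd.DataFrame({"LSTAT": [2.0, 10.0, 30.0], "RM": [7.0, 6.0, 5.0]})
grid, preds = partial_curve(toy_model, data, by="LSTAT")
```

Plotting `preds` against `grid` gives exactly one line: the model's response to the feature of interest for the mean instance of the data.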

The standard approach is to let the mean (for numerical features) and the mode (for categorical ones) represent the fixed values of the remaining features and let the feature of interest vary. This results in one line in the plot, showing the model's prediction for the "mean" instance of your data. However, the mean instance might not represent your data in a meaningful way, which is why our implementation lets you set the fixed values of the other model features manually. We will start with the partial plot of the mean instance, as indicated in the `fixed` input to the partial plot call:

```
plot_partial(graph = best_graph,
             dataframe = train,
             by = 'LSTAT',
             fixed = {'CRIM': train.CRIM.mean(), 'RM': train.RM.mean()})
```

The partial plot also covers the question of **prediction support**. The actual data is shown as a grey scatter plot, accompanied by the associated histograms on the two axes. The graph's potential predictions are shown as lines, one for each set of fixed values. By default, the set of fixed values includes the `10%`, `50%`, and `90%` percentiles of each numerical fixed feature, while the three most frequent categories are displayed for categorical features. This can amount to a lot of lines in the plot if the model has many features; in that case I propose being strict with the fixed values: set them manually, and to fewer than three values for each remaining feature.

Here the default partial plot behavior is shown – the fixed values are shown in the legend:

```
plot_partial(graph = best_graph,
             dataframe = train,
             by = 'LSTAT')
```