Feyn offers a range of tools to help you dissect your graph and its dynamics.
As sample data, we use the Boston housing dataset from scikit-learn, where the task is to predict the median house price of different areas around Boston. Below I import the data, prepare it and find my graph of choice with my QLattice:
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older scikit-learn
from sklearn.datasets import load_boston
import pandas as pd
from feyn import QLattice
from feyn.plots import plot_graph_summary
from feyn.tools import split

# Download the Boston housing dataset
boston = load_boston()
df_boston = pd.DataFrame(boston.data, columns=boston.feature_names)
df_boston['PRICE'] = boston.target

# Train/test split
train, test = split(df_boston)

# Connect to the QLattice
ql = QLattice()

# Get a regressor; max_depth=2, let's not overdo it
qgraph = ql.get_regressor(train, 'PRICE', max_depth=2)
qgraph.fit(train)

# Select a graph from the fitted QGraph (the first one is the best ranked)
best_graph = qgraph[0]
Segment your model fitness
Are there certain parts of my dataset where my model is failing? And consequently, where should I focus my efforts to improve it? A segmented loss plot answers this question. It displays the distribution of a feature (by default, the output variable) as a histogram, or as a frequency bar plot for categorical features, and overlays it with the average loss for each bin or category.
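To make the idea concrete, here is a minimal sketch of what such a plot summarises, assuming squared error as the loss and equal-width bins over a numeric feature. The function name `segmented_loss` and its parameters are hypothetical illustrations, not Feyn's implementation; a categorical feature would be grouped by category instead of binned.

```python
import numpy as np


def segmented_loss(y_true, y_pred, by=None, bins=10):
    """Hypothetical sketch: per-bin sample counts and mean squared error.

    If `by` is None, the target itself is binned (mirroring the plot's default).
    Returns (bin edges, counts per bin, mean loss per bin with NaN for empty bins).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    values = y_true if by is None else np.asarray(by, dtype=float)

    # Squared error per sample (an assumed loss; the real plot may use another)
    sq_err = (y_true - y_pred) ** 2

    # Equal-width bin edges, then map each sample to a bin index 0..bins-1
    edges = np.histogram_bin_edges(values, bins=bins)
    idx = np.digitize(values, edges[1:-1])

    # Histogram counts and summed loss per bin
    counts = np.bincount(idx, minlength=bins)
    loss_sum = np.bincount(idx, weights=sq_err, minlength=bins)

    # Average loss per bin, leaving empty bins as NaN
    mean_loss = np.full(bins, np.nan)
    np.divide(loss_sum, counts, out=mean_loss, where=counts > 0)
    return edges, counts, mean_loss
```

With a fitted model you would call it as, e.g., `segmented_loss(train['PRICE'], predictions, by=train['LSTAT'])`, where `predictions` is an array of the model's predictions on `train`.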
best_graph.plot_segmented_loss(data=train)
Findings from segmented loss plots might lead you to tinker with feature engineering, sample weighting (available in the fit call) and/or outlier handling.
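If the plot shows the loss concentrated in sparsely populated bins (often the tails of the distribution), one common response is to upweight those samples. The sketch below computes inverse-frequency weights over equal-width bins, using the same normalisation as scikit-learn's "balanced" class weights so that the weights average to 1. The function `inverse_frequency_weights` is hypothetical, and this is one weighting scheme among many, not a Feyn recommendation.

```python
import numpy as np


def inverse_frequency_weights(values, bins=10):
    """Hypothetical sketch: weight each sample by the inverse size of its bin.

    weight = n_samples / (n_nonempty_bins * count_in_sample_bin),
    so rare bins are upweighted and the weights average to 1.
    """
    values = np.asarray(values, dtype=float)

    # Same equal-width binning as a histogram with `bins` bins
    edges = np.histogram_bin_edges(values, bins=bins)
    idx = np.digitize(values, edges[1:-1])
    counts = np.bincount(idx, minlength=bins)

    # Only bins that actually contain samples count as groups
    n_nonempty = np.count_nonzero(counts)
    return len(values) / (n_nonempty * counts[idx])
```

The resulting array would then be passed to the fit call's sample-weighting argument; check the Feyn documentation for its exact name, which is not assumed here.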
Let's have a look at the segmented loss plot by LSTAT (the percentage of lower-status population in each area):
best_graph.plot_segmented_loss(data=train, by='LSTAT')