Feyn offers a range of tools to help you dissect your graph and its dynamics.
As sample data we use the Boston housing price dataset from scikit-learn, where we predict the median house price of different areas around Boston. Below I import the data, prepare it and find my graph of choice with my QLattice:
from sklearn.datasets import load_boston
import pandas as pd
from feyn import QLattice
from feyn.tools import split

# Download the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2, so this needs an older version)
boston = load_boston()
df_boston = pd.DataFrame(boston.data, columns=boston.feature_names)
df_boston['PRICE'] = boston.target

# Train/test split
train, test = split(df_boston)

# Connect to the QLattice
ql = QLattice()
ql.reset()

# Get a regressor; max_depth=2, let's not overdo it
qgraph = ql.get_regressor(train, 'PRICE', max_depth=2)
qgraph.fit(train)

# Select a graph from your fitted QGraph (here the first one)
best_graph = qgraph[0]
Paint by signal
A graph can be more or less complex: it can hold anything from one to several features and interactions. For any graph with more than one feature, the visualisation alone does not tell you where the signal comes from. The signal may be spread evenly across your features, or a single feature may carry the vast majority of it. To surface this information, the graph summary evaluates the association between your data and the output at every point in the graph. You can choose the measure of association: the graph nodes can currently be coloured by mutual information, by the Pearson correlation coefficient or by Spearman's rank correlation coefficient. The default is the Pearson correlation coefficient.
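To see why the choice of association measure can matter, here is a minimal, library-free sketch that compares Pearson and Spearman on made-up numbers. It is not how Feyn computes these values internally; the helper functions and the `node_output` and `target` lists are hypothetical and exist only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical values at one node of a graph, and a target that depends
# on them monotonically but not linearly.
node_output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [x ** 3 for x in node_output]
inverse_target = [-t for t in target]

pearson_pos = pearson(node_output, target)          # high, but below 1
spearman_pos = spearman(node_output, target)        # perfect rank agreement
pearson_neg = pearson(node_output, inverse_target)  # same size, negative sign
```

Because the relationship is monotonic but not linear, Spearman reports a perfect association while Pearson stays below 1. Flipping the sign of the target flips the sign of the correlation, which is what the green and red colouring distinguishes.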
The green and red colours show whether an interaction is positively or negatively correlated with the target. The more transparent an interaction is, the less it is correlated with the target variable. The value of the chosen association measure is also displayed on top of each node. This visualisation lets you see how the signal is distributed across your features and follow it through the graph, so you can spot where a valuable interaction between features arises. The graph summary plot also presents a standard set of metrics on your train set and, optionally, on a validation set. The code below produces the graph summary with both train and validation metrics displayed:
best_graph.plot_summary(data = train, test = test)
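The colour scheme described above can be pictured as a simple mapping from a correlation value to a colour and an opacity. The sketch below is only an illustration of that idea, not Feyn's rendering code; `node_style`, the node names and the correlation values are all made up.

```python
def node_style(correlation):
    """Map a correlation in [-1, 1] to a hypothetical (colour, opacity) pair."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    # The sign decides the colour: green for positive, red for negative.
    colour = "green" if correlation >= 0 else "red"
    # The magnitude decides the opacity: 0 is fully transparent, 1 fully opaque.
    opacity = abs(correlation)
    return colour, opacity


# Made-up correlations for three hypothetical nodes.
example_correlations = {"node_a": 0.8, "node_b": -0.6, "node_c": 0.1}
styles = {name: node_style(value) for name, value in example_correlations.items()}
```

Under this mapping a node with a weak correlation, such as `node_c`, would be drawn almost fully transparent, while `node_a` and `node_b` would stand out in green and red respectively.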