Converting a graph to SymPy

by: Kevin Broløs
(Feyn version 1.4.6 or newer)

We use SymPy to convert our graphs to symbolic mathematical expressions. This means you can turn a graph into a SymPy object for further manipulation or processing, execute it as an equation, print it in LaTeX, or implement the graph in any environment by following the equation.

The only limitation is that we don't currently support exporting categorical registers to an executable function, so for graphs with categories, you'll only have an expression for understanding purposes.

Let's first generate a dataset and initialize a QLattice

We'll start by generating a dataset using sklearn.

from feyn import QLattice

from sklearn.datasets import make_classification
import pandas as pd

# Generate a dataset and put it into a dataframe
X, y = make_classification()
data = pd.DataFrame(X, columns=['abcdefghijklmnopqrstuvwxyz'[i] for i in range(X.shape[1])])
data['target'] = y

qlattice = QLattice()

Fit the QGraph real quick

We won't care about train/test splits or exploring relations, so we'll just fit the QGraph once, and convert it to a mathematical expression.

qgraph = qlattice.get_classifier(data, 'target', max_depth=1)
qgraph.fit(data)

# Take the top graph
graph = qgraph[0]
graph

[Image: example graph]

Now that we have a graph, let's stop and reflect on what it represents.

This graph takes the inputs f and m, adds them together, and finally applies a sigmoid function to the result to squash it between 0 and 1 for classification.
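
Conceptually, the fitted graph computes something like the sketch below. The weights and bias here are placeholders for illustration, not the actual fitted state of the graph:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Sketch of the graph above: weight the two inputs, add them,
# and squash the sum with a sigmoid (placeholder values).
def predict(f, m, w_f=0.7, w_m=6.0, bias=0.1):
    return sigmoid(w_f * f + w_m * m + bias)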

Convert to SymPy

Let's see how the SymPy expression looks when converted to LaTeX:

sympy_graph = graph.sympify(symbolic_lr=True)

sympy_graph.as_expr()

$$\frac{1}{0.878142\, e^{-0.698784 f - 5.97639 m} + 1}$$

We can very clearly see the sigmoid function represented here: $f(x) = \frac{1}{1 + e^{-x}}$

where $x = 0.698784 f + 5.97639 m$. The exponential additionally carries a factor of $0.878142$. The numbers can be a little tricky, since they're automatically simplified by SymPy, but they're based on the scale, weight and bias of the inputs and outputs.
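
As a small aside, the constant factor in front of the exponential can be folded into the sigmoid's bias, which makes the correspondence to the standard sigmoid easier to see. A quick numerical check (the value of x here is arbitrary):

import numpy as np

c = 0.878142                                # constant factor from the expression above
x = 1.2345                                  # any value of 0.698784*f + 5.97639*m

lhs = 1 / (c * np.exp(-x) + 1)              # the expression as SymPy printed it
rhs = 1 / (1 + np.exp(-(x - np.log(c))))    # a plain sigmoid with an extra bias
assert np.isclose(lhs, rhs)

print(-np.log(c))                           # the effective bias, roughly 0.130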

Let's take a look at a simpler example so we can deduce this more clearly from the graph:

Let's try it for a regressor

Even though it's a classification dataset, let's try regressing on it by using a different target. Let's use feature a, since we know it carries signal, due to the way sklearn's make_classification composes its features.

qgraph = qlattice.get_regressor(data, 'a', max_depth=1)
qgraph.fit(data)

# Take the top graph
graph = qgraph[0]
graph

[Image: example graph]

The regressor uses a linear cell as the output, so it'll look different from our classification result.

Let's see how the SymPy expression looks when converted to LaTeX:

sympy_graph = graph.sympify(symbolic_lr=True)

sympy_graph.as_expr()

$$4.87371\, e^{-7.6721 \left(0.484426 - target\right)^{2} - 4.12931 \left(h + 0.343357\right)^{2}} - 0.0929986$$

Again, you can see the features represented, in this case target and h. They feed into a two-legged gaussian of the form $e^{-(x_0^2 + x_1^2)}$, where $x_0$ and $x_1$ are the inputs, each with a learned linear scaling of $scale \cdot weight \cdot x + bias$. The output is scaled in the same way: the factor 4.87371 accounts for the weight and scale, and the bias is represented as the last term, -0.0929986.
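
As a rough sketch, the full cell computes something like the following. All parameter names here are placeholders for illustration, not Feyn's internal state:

import numpy as np

# Illustrative sketch of the two-legged gaussian cell described above.
def gaussian_cell(x0, x1,
                  s0, w0, b0,                # linear scaling of the first input
                  s1, w1, b1,                # linear scaling of the second input
                  out_scale, out_w, out_bias):
    z0 = s0 * w0 * x0 + b0
    z1 = s1 * w1 * x1 + b1
    g = np.exp(-(z0 ** 2 + z1 ** 2))         # the gaussian itself
    return out_scale * out_w * g + out_bias  # scaled output plus trailing bias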

To see where these weights and biases come from, let's inspect the output node of the graph manually:

graph[-1].state._to_dict()
{'scale': 2.528051572591156,
 'w': 1.9278497183870758,
 'bias': -0.09299855246385323}

You'll see that the factor 4.87371 is equal to scale * weight, and the bias is the last term, -0.0929986.

The process is similar for the weights and biases on the inputs.
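
As a quick sanity check, you can reproduce the leading factor and the trailing term directly from the state printed above:

state = {'scale': 2.528051572591156,
         'w': 1.9278497183870758,
         'bias': -0.09299855246385323}

print(state['scale'] * state['w'])   # ~4.873705, the leading factor
print(state['bias'])                 # ~-0.0929986, the trailing term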

What to do with the SymPy object

You can check out the SymPy documentation for how to use it if you don't already know. The expression pretty-prints automatically in Unicode terminals and IPython environments.
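
For example, a few common things you might do with the returned expression (this is plain SymPy usage, not Feyn-specific API):

import sympy

expr = sympy_graph.as_expr()

# Render the expression as a LaTeX string
print(sympy.latex(expr))

# Substitute values for the symbols and evaluate numerically.
# We take the symbols from the expression itself to avoid guessing names.
syms = sorted(expr.free_symbols, key=lambda s: s.name)
print(expr.subs({s: 0.5 for s in syms}).evalf())

# Or compile it into a fast numerical function with lambdify
fn = sympy.lambdify(syms, expr, 'numpy')
print(fn(*[0.5] * len(syms)))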

You can also use this for portability of final Feyn graphs, since you don't need the Python runtime to execute a simple mathematical equation. You can take the output of these functions and port it to R, Stata, JavaScript, or even Excel if you want to.
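
If the target environment is a programming language, SymPy's code printers are one way to do this; a minimal sketch (again plain SymPy, not part of Feyn):

import sympy

expr = sympy_graph.as_expr()

# Emit the equation as C or JavaScript source. SymPy ships several
# other code printers; check its documentation for more targets.
print(sympy.ccode(expr))
print(sympy.jscode(expr))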

Significant digits

Keep in mind that more complex graphs may require more significant digits to stay accurate. You can adjust this with the signif parameter on the function:

graph.sympify(signif=10, symbolic_lr=True)

$$4.873705048\, e^{-7.67210257 \left(0.4844257382 - target\right)^{2} - 4.129305744 \left(h + 0.3433571115\right)^{2}} - 0.09299855$$

This returns 10 significant digits for each constant in the expression, for instance.
