Feyn


Stats

The stats package contains experimental tools to help perform statistical analysis on the models produced by the QLattice. They can be found in feyn.__future__.contrib.stats.

graph_log_likelihood

def graph_log_likelihood(graph, data):
    """
    This computes the log-likelihood of the graph evaluated on the data set.

    Arguments:
        graph {[feyn.Graph]} -- Graph to evaluate the log-likelihood of.
        data {[dict of numpy arrays or pandas dataframe]} -- Data to evaluate the log-likelihood on.

    Returns:
        [scalar] -- The log-likelihood of the graph on the data set.
    """

Basic usage:

import feyn
from feyn.__future__.contrib.stats import graph_log_likelihood

# data: a pandas DataFrame containing the training data
# output: the name of the target column in data
ql = feyn.QLattice()
qgraph = ql.get_regressor(data.columns, output)

qgraph.fit(data)
graph = qgraph[0]

loglik = graph_log_likelihood(graph, data)

This is suitable for regressors and classifiers.
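To build intuition for what the returned number measures, the log-likelihood of a regressor can be sketched by hand under a Gaussian noise model whose variance is estimated by the mean squared error. This is a plain-Python illustration of the general idea, not feyn's internal implementation:

```python
import math

def gaussian_log_likelihood(y_true, y_pred):
    """Log-likelihood of predictions under a Gaussian noise model,
    with the noise variance estimated by the mean squared error."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # LL = -n/2 * (log(2*pi*sigma^2) + 1) when sigma^2 is the MLE (the MSE)
    return -0.5 * n * (math.log(2 * math.pi * mse) + 1)

y = [1.0, 2.0, 3.0, 4.0]
# Predictions close to the targets give a higher (less negative)
# log-likelihood than poor predictions.
good = gaussian_log_likelihood(y, [1.1, 1.9, 3.2, 3.8])
bad = gaussian_log_likelihood(y, [4.0, 3.0, 2.0, 1.0])
```

Higher log-likelihood means the data is more probable under the model, which is why the statistics below compare a graph's log-likelihood against that of a trivial baseline.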

graph_f_score

def graph_f_score(graph, data):
    """
    This computes the F-statistic associated with a feyn graph under the null hypothesis.
    The null hypothesis is that every weight on each feature and category is equal to zero.

    If the hypothesis is true then the F-score follows F(q, n - p),
    the Fisher distribution with q and n - p degrees of freedom. Here:
    * q is the number of weights we assume are equal to zero
    * n is the number of samples in data
    * p is the number of parameters in the graph.

    The F-score is calculated by:
    num   = (sum((data[target].mean() - data[target])**2) - graph.mse(data) * n) * (n - p)
    denom = (graph.mse(data) * n) * q
    F     = num / denom

    Arguments:
        graph {[feyn.Graph]} -- Graph to test the null hypothesis on.
        data {[dict of numpy arrays or pandas dataframe]} -- Data to test the significance of the graph on.

    Returns:
        tuple -- The F-score of the hypothesis and its p-value
    """

Basic usage:

import feyn
from feyn.__future__.contrib.stats import graph_f_score

ql = feyn.QLattice()
qgraph = ql.get_regressor(data.columns, output)

qgraph.fit(data)
graph = qgraph[0]

F, p_value = graph_f_score(graph, data)

This is only suitable for regressors.
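The F-score formula in the docstring can be written out directly. The helper below is a plain-Python sketch, not part of feyn: y_pred stands in for the graph's predictions, and the parameter counts p (n_params) and q (n_weights) are supplied by hand rather than read off a graph:

```python
def f_score(y_true, y_pred, n_params, n_weights):
    """F-statistic comparing a model's squared error against the
    constant (mean-only) model, as in the docstring's formula."""
    n = len(y_true)
    mean = sum(y_true) / n
    # Error of the constant model: sum((mean - y)**2)
    ss_null = sum((mean - y) ** 2 for y in y_true)
    # Error of the fitted model: equivalent to graph.mse(data) * n
    ss_model = sum((t, p) and (t - p) ** 2 for t, p in zip(y_true, y_pred))
    num = (ss_null - ss_model) * (n - n_params)
    denom = ss_model * n_weights
    return num / denom

F = f_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8],
            n_params=2, n_weights=1)
```

A large F means the graph reduces the error far more than the constant model; the p-value is then the upper tail of F(q, n - p).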

graph_g_score

def graph_g_score(graph, data):
    """
    This computes the G-statistic associated with a feyn graph under the null hypothesis.
    The null hypothesis is that every weight on each feature and category is equal to zero.

    If the hypothesis is true then the G-score follows chi2(q),
    the chi-squared distribution with q degrees of freedom. Here:
    * q is the number of weights we assume are equal to zero

    The G-statistic is calculated by:
    G = 2 * (graph_log_likelihood(graph, data) - log-likelihood of the constant model)

    where
    log-likelihood of constant model = #neg_class * np.log(#neg_class) + #pos_class * np.log(#pos_class) - #samples * np.log(#samples)

    Arguments:
        graph {[feyn.Graph]} -- Graph to test the null hypothesis on.
        data {[dict of numpy arrays or pandas dataframe]} -- Data to test the significance of the graph on.

    Returns:
        tuple -- The G-score of the hypothesis and its p-value
    """

Basic usage:

import feyn
from feyn.__future__.contrib.stats import graph_g_score

ql = feyn.QLattice()
qgraph = ql.get_classifier(data.columns, output)

qgraph.fit(data)
graph = qgraph[0]

G, p_value = graph_g_score(graph, data)

This is only suitable for classifiers.
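The constant-model log-likelihood in the docstring is that of a classifier which always predicts the empirical positive rate. A plain-Python sketch (hypothetical helpers, not feyn functions), where model_log_likelihood stands in for graph_log_likelihood(graph, data):

```python
import math

def null_log_likelihood(labels):
    """Log-likelihood of the constant model that always predicts the
    empirical positive rate: pos*log(pos) + neg*log(neg) - n*log(n)."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    return pos * math.log(pos) + neg * math.log(neg) - n * math.log(n)

def g_score(model_log_likelihood, labels):
    # G = 2 * (LL_model - LL_null); under the null hypothesis, G ~ chi2(q)
    return 2 * (model_log_likelihood - null_log_likelihood(labels))

# A balanced data set of 4 samples; suppose the fitted model achieved
# a log-likelihood of -1.0 (any classifier can only improve on the null).
G = g_score(-1.0, [0, 0, 1, 1])
```

G measures how much more probable the labels are under the graph than under the constant model; the p-value is then the upper tail of chi2(q).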

plot_graph_p_value

def plot_graph_p_value(graph, data, title = 'Significance of graph', ax=None):
    """
    Plots the probability density function under the null hypothesis.

    The null hypothesis is that every weight on each feature and category is equal to zero.

    If the graph is a regressor then this plots the Fisher distribution.
    Under the null hypothesis the F-score is approximately distributed as F(q, n - p),
    with q and n - p degrees of freedom. Here:
    * q is the number of weights we assume are equal to zero
    * n is the number of samples in data
    * p is the number of parameters in the graph.

    If the graph is a classifier then this plots the chi2 distribution.
    Under the null hypothesis the G-score is distributed as chi2(q),
    with q degrees of freedom. Here:
    * q is the number of weights we assume are equal to zero

    This also plots a vertical line intercepting the x-axis at the F-score or G-score of the graph.

    Arguments:
        graph {[feyn.Graph]} -- Graph to calculate the p-value of under the null hypothesis
        data {[dict of numpy arrays or pandas dataframe]} -- Data to test the significance of the graph on.

    Keyword Arguments:
        title {str} -- Title of the axes (default: {'Significance of graph'})
        ax {[matplotlib.Axes]} -- (default: {None})

    Returns:
        [matplotlib.Axes] -- Plot of the distribution under the null hypothesis
    """

Basic usage:

import feyn
from feyn.__future__.contrib.stats import plot_graph_p_value

ql = feyn.QLattice()
qgraph = ql.get_regressor(data.columns, output)

qgraph.fit(data)
graph = qgraph[0]

plot_graph_p_value(graph, data)

This is suitable for regressors and classifiers.
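The p-value visualised by the plot is the area of the null distribution's tail beyond the observed score. For the classification case with an even number of degrees of freedom, the chi2 tail even has a closed form; the sketch below is illustrative only (the package itself presumably relies on scipy's distribution functions):

```python
import math

def chi2_sf_even(x, q):
    """Survival function (upper-tail probability) of chi2 with an even
    number q of degrees of freedom, via the closed-form series
    exp(-x/2) * sum_{k=0}^{q/2 - 1} (x/2)**k / k!."""
    assert q % 2 == 0 and q > 0
    term, total = 1.0, 1.0
    for k in range(1, q // 2):
        term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total

# p-value of a G-score of 3.545 with q = 2 weights set to zero
p_value = chi2_sf_even(3.545, 2)
```

A small p-value means the observed score would be very unlikely if all weights were truly zero, which is exactly what the vertical line falling deep in the plotted tail conveys.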

Copyright © 2021 Abzu.ai