feyn

Feyn is the main Python module for building and executing models that utilize a QLattice.

The QLattice stores and updates probabilistic information about the mathematical relationships (models) between observable quantities.

The workflow is typically:

# Instantiate a QLattice
>>> ql = feyn.QLattice()

# Sample models from the QLattice
>>> models = ql.sample_models(data.columns, output_name="out")

# Fit the list of models to a dataset
>>> models = feyn.fit_models(models, data)

# Pick the best Model from the fitted models
>>> best = models[0]

# Update the QLattice with this model to explore similar models.
>>> ql.update(best)

# Or use the model to make predictions
>>> predicted_y = best.predict(new_data)

Sub-modules

  • feyn.datasets
  • feyn.filters
  • feyn.inference
  • feyn.insights
  • feyn.losses
  • feyn.metrics
  • feyn.plots
  • feyn.reference
  • feyn.tools

function fit_models

def fit_models(
    models: List[feyn._model.Model],
    data: pandas.core.frame.DataFrame,
    loss_function: Optional[str] = None,
    criterion: Optional[str] = None,
    n_samples: Optional[int] = None,
    sample_weights: Optional[Iterable[float]] = None,
    threads: int = 4,
    immutable: bool = False
) -> List[feyn._model.Model]
Fit a list of models on some data and return a list of fitted models. The returned list is sorted in ascending order by either the loss function or the chosen information criterion.

The n_samples parameter controls how many samples are used to train each model. The default behavior is to fit each model once with each sample in the dataset, unless the dataset is smaller than 10,000 samples, in which case it is upsampled to 10,000 samples before fitting.

The samples are shuffled randomly before fitting to avoid issues with the Stochastic Gradient Descent algorithm.

Arguments:
    models {List[feyn.Model]} -- A list of feyn models to be fitted.
    data {DataFrame} -- Data used in fitting each model.

Keyword Arguments:
    loss_function {Optional[str]} -- The loss function to optimize models for. Can be "squared_error", "absolute_error" or "binary_cross_entropy". (default: {None})
    criterion {Optional[str]} -- Sort by information criterion rather than loss. Either "aic", "bic" or None (loss). (default: {None})
    n_samples {Optional[int]} -- The number of samples to fit each model with. (default: {None})
    sample_weights {Optional[Iterable[float]]} -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample. (default: {None})
    threads {int} -- Number of concurrent threads to use for fitting. (default: {4})
    immutable {bool} -- If True, create a copy of each model and fit those, leaving the originals untouched. This increases runtime. (default: {False})

Raises:
    TypeError: if inputs don't match the correct type.
    ValueError: if there are no samples
    ValueError: if data and sample_weights is not same size
    ValueError: if the loss function is unknown.
    ValueError: if inputs contain a mix of classifiers and regressors

Returns:
    List[feyn.Model] -- A list of fitted feyn models.
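
For illustration, a minimal sketch of fit_models in context; the DataFrame `train` and the output column "out" are assumed to exist:

>>> import feyn
>>> ql = feyn.QLattice()
>>> models = ql.sample_models(train.columns, output_name="out")
>>> models = feyn.fit_models(models, train, criterion="bic")
>>> best = models[0]  # the list is sorted, so the first model scores best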

function get_diverse_models

def get_diverse_models(
    models: List[feyn._model.Model],
    n: int = 10
) -> List[feyn._model.Model]
Select at most n best performing models from a collection, such that they are sufficiently diverse in their lineage.

Arguments:
    models {List[feyn.Model]} -- The list of models to find the best ones in.

Keyword Arguments:
    n {int} -- The maximum number of best models to identify. (default: {10})

Raises:
    TypeError: if inputs don't match the correct type.

Returns:
    List[feyn.Model] -- The best models, selected to be sufficiently diverse in their lineage.

function prune_models

def prune_models(
    models: List[feyn._model.Model],
    keep_n: Optional[int] = None
) -> List[feyn._model.Model]
Prune a list of models to remove redundant and poorly performing ones.

Arguments:
    models {List[feyn.Model]} -- The list of models to prune.

Keyword Arguments:
    keep_n {Optional[int]} -- At most this many models will be returned. If None, models are left to be pruned by other redundancies. (default: {None})

Raises:
    TypeError: if inputs don't match the correct type.

Returns:
    List[feyn.Model] -- The list of pruned models.
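
As a sketch of how prune_models is typically combined with fit_models and get_diverse_models in a training loop (the QLattice `ql`, the DataFrame `train` and the output column "out" are assumed from the examples above):

>>> models = []
>>> for epoch in range(10):
...     models += ql.sample_models(train.columns, output_name="out")
...     models = feyn.fit_models(models, train)
...     models = feyn.prune_models(models, keep_n=50)
...     ql.update(models)
>>> diverse = feyn.get_diverse_models(models, n=3)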

function show_model

def show_model(
    model: feyn._model.Model,
    label: Optional[str] = None,
    update_display: bool = False,
    filename: Optional[str] = None,
    show_sources: bool = False
)
Updates the display in a Python notebook with the graph representation of a model.

Arguments:
    model {Model} -- The model to display.

Keyword Arguments:
    label {Optional[str]} -- A label to add to the rendering of the model (default is None).
    update_display {bool} -- Clear output and rerender figure (defaults to False).
    filename {Optional[str]} -- The filename to use for saving the plot as html (defaults to None).
    show_sources {bool} -- Whether to show the ordering of the sources in the model - for debug purposes (defaults to False).

function validate_data

def validate_data(
    data: pandas.core.frame.DataFrame,
    kind: str,
    output_name: str,
    stypes: Optional[Dict[str, str]] = None
)
Validates a pandas dataframe for known data issues.

Arguments:
    data {pd.DataFrame} -- The data to validate
    kind {str} -- The kind of output - classification or regression
    output_name {str} -- The name of the output

Keyword Arguments:
    stypes {Optional[Dict[str, str]]} -- The stypes you want to assign to your inputs (default: {None})

Raises:
    ValueError: When output values do not match output type
    ValueError: When categorical stypes are not defined for categorical inputs
    ValueError: When nan values exist for numerical inputs
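
A minimal usage sketch, assuming a DataFrame `train` with a hypothetical categorical column "sex" and that "c" is the stype used for categorical inputs:

>>> stypes = {"sex": "c"}
>>> feyn.validate_data(train, kind="classification", output_name="out", stypes=stypes)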

class QLattice

def __init__(
    random_seed: int = -1
) -> QLattice

method QLattice.auto_run

def auto_run(
    self,
    data: pandas.core.frame.DataFrame,
    output_name: str,
    kind: str = 'auto',
    stypes: Optional[Dict[str, str]] = None,
    n_epochs: int = 10,
    threads: Union[int, str] = 'auto',
    max_complexity: int = 10,
    query_string: Optional[str] = None,
    loss_function: Optional[str] = None,
    criterion: Optional[str] = 'bic',
    sample_weights: Optional[Iterable[float]] = None,
    function_names: Optional[List[str]] = None,
    starting_models: Optional[List[feyn._model.Model]] = None
) -> List[feyn._model.Model]
A convenience function for running the QLattice simulator for many epochs. This process can be interrupted with a KeyboardInterrupt, and you will get back the best models that have been found thus far. Roughly equivalent to the following:

>>> priors = feyn.tools.estimate_priors(data, output_name)
>>> ql.update_priors(priors)
>>> models = []
>>> for i in range(n_epochs):
...     models += ql.sample_models(data, output_name, kind, stypes, max_complexity, query_string, function_names)
...     models = feyn.fit_models(models, data, loss_function, criterion, None, sample_weights)
...     models = feyn.prune_models(models)
...     ql.update(models)
>>> best = feyn.get_diverse_models(models, n=10)

Arguments:
    data {Iterable} -- The data to train models on. Input names are inferred from the columns (pd.DataFrame) or keys (dict) of this variable.
    output_name {str} -- The name of the output.

Keyword Arguments:
    kind {str} -- Specify the kind of models that are sampled. One of ["auto", "classification", "regression"]. If "auto" is chosen, it will default to "regression" unless the output_name is assigned stype "b", in which case it becomes "classification". (default: {"auto"})
    stypes {Optional[Dict[str, str]]} -- An optional map from input names to semantic types. If None, it will automatically infer the stypes based on the data. (default: {None})
    n_epochs {int} -- Number of training epochs. (default: {10})
    threads {int} -- Number of concurrent threads to use for fitting. If a number, that many threads are used. If "auto", set to your CPU count - 1. (default: {"auto"})
    max_complexity {int} -- The maximum complexity for sampled models. (default: {10})
    query_string {Optional[str]} -- An optional query string for specifying specific model structures. (default: {None})
    loss_function {Optional[Union[str, Callable]]} -- The loss function to optimize models for. If None (default), 'MSE' is chosen for regression problems and 'binary_cross_entropy' for classification problems. (default: {None})
    criterion {Optional[str]} -- Sort by information criterion rather than loss. Either "aic", "bic" or None (loss). (default: {"bic"})
    sample_weights {Optional[Iterable[float]]} -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample. (default: {None})
    function_names {Optional[List[str]]} -- A list of function names to use in the QLattice simulation. Defaults to all available functions being used. (default: {None})
    starting_models {Optional[List[feyn.Model]]} -- A list of preexisting feyn models you would like to start finding better models from. The inputs and output of these models should match the other arguments to this function. (default: {None})

Raises:
    TypeError: if inputs don't match the correct type.

Returns:
    List[feyn.Model] -- The best models found during this run.
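
A minimal sketch of a typical auto_run call; the DataFrame `train` and the output column "out" are hypothetical:

>>> import feyn
>>> ql = feyn.QLattice(random_seed=42)
>>> models = ql.auto_run(train, output_name="out", kind="classification")
>>> best = models[0]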

method QLattice.reset

def reset(
    self,
    random_seed=-1
)
Deprecated. Create a new QLattice with the constructor instead.
Clear all learnings in this QLattice.

Keyword Arguments:
    random_seed {int} -- If not -1, seed the qlattice and feyn random number generator to get reproducible results. (default: {-1})

method QLattice.sample_models

def sample_models(
    self,
    input_names: Iterable[str],
    output_name: str,
    kind: str = 'auto',
    stypes: Optional[Dict[str, str]] = None,
    max_complexity: int = 10,
    query_string: Optional[str] = None,
    function_names: Optional[List[str]] = None
) -> List[feyn._model.Model]
Sample models from the QLattice simulator. The QLattice has a probability density for generating different models, and this function samples from that density.

Arguments:
    input_names {List[str]} -- The names of the inputs.
    output_name {str} -- The name of the output.

Keyword Arguments:
    kind {str} -- Specify the kind of models that are sampled. One of ["auto", "classification", "regression"]. If "auto" and no stype is given for the output, it defaults to "regression". (default: {"auto"})
    stypes {Optional[Dict[str, str]]} -- An optional map from input names to semantic types. (default: {None})
    max_complexity {int} -- The maximum complexity for sampled models. Currently the maximum number of edges that the graph representation of the models has. (default: {10})
    query_string {Optional[str]} -- An optional query string for specifying specific model structures. (default: {None})
    function_names {Optional[List[str]]} -- A list of function names to use in the QLattice simulation. Defaults to all available functions being used. (default: {None})

Raises:
    TypeError: if inputs don't match the correct type.
    ValueError: if input_names contains duplicates.
    ValueError: if max_complexity is negative.
    ValueError: if kind is not a regressor or classifier.
    ValueError: if function_names is not recognised.
    ValueError: if query_string is invalid.

Returns:
    List[Model] -- The list of sampled models.
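
An illustrative sketch, assuming a DataFrame `train` with a hypothetical categorical column "sex" ("c" being the categorical stype):

>>> models = ql.sample_models(
...     input_names=train.columns,
...     output_name="out",
...     kind="regression",
...     stypes={"sex": "c"},
...     max_complexity=7,
... )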

method QLattice.update

def update(
    self,
    models: Iterable[feyn._model.Model]
)
Update QLattice with learnings from a list of models. When updated, the QLattice learns to produce models that are similar to what is included in the update. Without updating, the QLattice will keep generating models with a random structure.

Arguments:
    models {Union[Model, Iterable[Model]]} -- The models to use in a QLattice update.

Raises:
    TypeError: if inputs don't match the correct type.

method QLattice.update_priors

def update_priors(
    self,
    priors: Dict,
    reset: bool = True
)
Update the input priors for the QLattice.

Arguments:
    priors {Dict} -- A dictionary mapping input names to the prior probability that each input impacts the output.

Keyword Arguments:
    reset {bool} -- Whether to reset the current priors before applying the new ones, or merge them with the existing priors. (default: {True})
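
A short sketch of updating priors, either estimated from data with feyn.tools.estimate_priors (as in the auto_run example above) or supplied manually as a dictionary of input names to probabilities:

>>> priors = feyn.tools.estimate_priors(train, "out")  # train and "out" are hypothetical
>>> ql.update_priors(priors)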

class Model

def __init__(
    program,
    fnames,
    params=None
) -> Model
A Model represents a single mathematical equation which can be used for predicting.

The constructor is for internal use.

property Model.depth

depth
Get the depth of the graph representation of the model. In general, it is better to evaluate the complexity of models using the edge_count (or max_complexity) properties

property Model.edge_count

edge_count
Get the total number of edges in the graph representation of this model.

property Model.features

features
Get the names of the input features of the model. Same as 'inputs'.

property Model.inputs

inputs
Get the names of the input features of the model.

property Model.kind

kind
Get the kind of the model, e.g. classification or regression.

property Model.output

output
Get the name of the output node.

property Model.target

target
Get the name of the output node. Same as 'output'.

static method Model.is_old_model_version

def is_old_model_version(
    serialized_model: dict
) -> bool

static method Model.load

def load(
    file: Union[~AnyStr, pathlib.Path, TextIO]
) -> 'Model'
Load a `Model` from a file.

Usually used together with `Model.save`.

Arguments:
    file -- A file-like object or a path to load the `Model` from.

Returns:
    Model -- The loaded `Model`-object.

static method Model.load_old_model_version

def load_old_model_version(
    serialized_model: dict
) -> 'Model'

method Model.absolute_error

def absolute_error(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's absolute error on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[model.output]
> y_pred = model.predict(data)
> ae = feyn.losses.absolute_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    np.ndarray -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.accuracy_score

def accuracy_score(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's accuracy score on a data set.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as

(number of correct predictions) / (total number of predictions)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    accuracy score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.accuracy_threshold

def accuracy_threshold(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the accuracy score of predictions with the optimal threshold.

The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold separating true from false is 0.5. However, this does not hold when the model was trained on a population with a different composition than the one it is evaluated on.

This function first computes the threshold delimiting the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.

Arguments:
    data {DataFrame} -- Dataset to evaluate accuracy and accuracy threshold on.

Returns a tuple with:
    threshold that maximizes accuracy
    accuracy score obtained with this threshold

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.
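
A brief sketch, assuming a fitted classification `model` and a hold-out DataFrame `test`:

>>> threshold, accuracy = model.accuracy_threshold(test)
>>> model.plot_confusion_matrix(test, threshold=threshold)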

method Model.binary_cross_entropy

def binary_cross_entropy(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's binary cross entropy on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[model.output]
> y_pred = model.predict(data)
> bce = feyn.losses.binary_cross_entropy(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    np.ndarray -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.copy

def copy(
    self
) -> 'Model'
Return a copy of self.

method Model.depths

def depths(
    self
) -> List[int]
Get the depths of each element in the program.

method Model.find_end

def find_end(
    self,
    ix: int
) -> int

method Model.fit

def fit(
    self,
    data: pandas.core.frame.DataFrame,
    loss_function='squared_error',
    sample_weights=None,
    n_samples=20000
)
Fit this specific `Model` with the given data set.

Arguments:
    data -- Training data including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    loss_function -- Name of the loss function or the function itself. This is the loss function to use for fitting. Can either be a string or one of the functions provided in `feyn.losses`.
    sample_weights -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample.
    n_samples -- The number of samples to fit the model with. The dataset is resampled to this size before fitting, analogous to the n_samples parameter of feyn.fit_models. (default: 20000)

method Model.get_parameters

def get_parameters(
    self,
    name: str
)
Given a model and the name of one of its input or output nodes,
get a pandas.DataFrame with the associated parameters. If the node
is categorical, the function returns the weight associated with each categorical
value. If the node is numerical, the function returns the scale, weight and
bias.

Arguments:
    name {str} -- Name of the input or output of interest.

Returns:
    pd.DataFrame -- DataFrame with the parameters.
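
For example, assuming a fitted `model` with a hypothetical numerical input "age":

>>> params = model.get_parameters("age")
>>> print(params)  # scale, weight and bias for the numerical node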

method Model.mae

def mae(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's mean absolute error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MAE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.mse

def mse(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    MSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.plot

def plot(
    self,
    data: pandas.core.frame.DataFrame,
    compare_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame], NoneType] = None,
    labels: Optional[Iterable[str]] = None,
    filename: Optional[str] = None
) -> 'feyn.tools._display.HTML'
Plot the model's summary metrics and some useful plots for its kind.

This is a shorthand for calling feyn.plots.plot_model_summary.

Arguments:
    data {DataFrame} -- Data set including both input and expected values.

Keyword Arguments:
    compare_data {Optional[Union[DataFrame, List[DataFrame]]]} -- Additional data set(s) including both input and expected values. (default: {None})
    labels {Optional[Iterable[str]]} -- A list of labels to use instead of the default labels. Must be size 2 if using comparison dataset, else 1.
    filename {Optional[str]} -- The filename to use for saving the plot as html.

Raises:
    TypeError: if inputs don't match the correct type.
    ValueError: If columns needed for the model are not present in the data.

Returns:
    HTML -- HTML report of the model summary.
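
A brief sketch comparing performance on hypothetical `train` and `test` DataFrames:

>>> model.plot(train, compare_data=test, labels=["Train", "Test"], filename="summary.html")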

method Model.plot_confusion_matrix

def plot_confusion_matrix(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = 0.5,
    labels: Optional[Iterable] = None,
    title: str = 'Confusion matrix',
    color_map: str = 'feyn-primary',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None
Compute and plot a Confusion Matrix.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Boundary of True and False predictions, default 0.5
    labels -- List of labels to index the matrix
    title -- Title of the plot.
    color_map -- Color map from matplotlib to use for the matrix
    ax -- matplotlib axes object to draw to, default None
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method Model.plot_flow

def plot_flow(
    self,
    data: pandas.core.frame.DataFrame,
    sample: Union[pandas.core.frame.DataFrame, pandas.core.series.Series],
    filename: Optional[str] = None
) -> 'feyn.tools._display.SVG'
Plots the flow of activations through the model, for the provided sample. Uses the provided data as background information for visualization.

Arguments:
    data {DataFrame} -- Data set including both input and expected values.
    sample {Union[DataFrame, Series]} -- A single data sample to plot the activations for.
    filename {Optional[str]} -- The filename to use for saving the plot as svg.

Raises:
    TypeError: if inputs don't match the correct type.
    ValueError: If columns needed for the model are not present in the data.

Returns:
    SVG -- SVG object containing the SVG of the model activation flow.

method Model.plot_pr_curve

def plot_pr_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'Precision-Recall curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None
Plot the model's precision-recall curve.

This is a shorthand for calling feyn.plots.plot_pr_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method Model.plot_probability_scores

def plot_probability_scores(
    self,
    data: pandas.core.frame.DataFrame,
    nbins: int = 10,
    title: str = 'Predicted Probabilities',
    legend: List[str] = ['Positive Class', 'Negative Class'],
    legend_loc: Optional[str] = 'upper center',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
)
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
    nbins {int} -- number of bins (default: {10})
    title {str} -- plot title (default: {'Predicted Probabilities'})
    legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax {Axes} -- axes object (default: {None})
    figsize {tuple} -- size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
    kwargs {dict} -- histogram kwargs (default: {None})

Raises:
    TypeError -- if model is not a classification model.
    TypeError -- if inputs don't match the correct type.
    ValueError: if y_true is not bool-like (boolean or 0/1).
    ValueError: if y_pred is not bool-like (boolean or 0/1).
    ValueError: if y_pred and y_true are not same size.
    ValueError: If fewer than two labels are supplied for the legend.

method Model.plot_regression

def plot_regression(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Actuals vs Prediction',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.

Arguments:
    data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns

Keyword Arguments:
    title {str} -- (default: {"Actuals vs Predictions"})
    ax {AxesSubplot} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.plot_residuals

def plot_residuals(
    self,
    data: pandas.core.frame.DataFrame,
    title: str = 'Residuals plot',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
)
This plots the predicted values against the residuals (y_true - y_pred).

Arguments:
    data {DataFrame} -- The dataset containing the samples to determine the residuals of.

Keyword Arguments:
    title {str} -- (default: {"Residual plot"})
    ax {Axes} -- (default: {None})
    figsize {tuple} -- Size of figure (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.plot_response_1d

def plot_response_1d(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    input_constraints: Optional[dict] = None,
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: tuple = (8, 8),
    filename: Optional[str] = None
) -> None
Plot the response of a model to a single input given by `by`.
The remaining model inputs are fixed by default at the middle
quantile (median). Additional quantiles are added if the model has
at most 3 inputs. You can change this behavior by specifying
`input_constraints` yourself. Any number of model inputs can be added to it.

Arguments:
    data {DataFrame} -- The dataset to plot on.
    by {str} -- Model input to plot model response by.

Keyword Arguments:
    input_constraints {Optional[dict]} -- Input values to be fixed (default: {None}).
    ax {Optional[matplotlib.axes]} -- matplotlib axes object to draw to (default: {None}).
    figsize {tuple} -- size of created figure (default: {(8,8)})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError: if function parameters don't match the correct type.
    ValueError: if by is not in the columns of data or inputs to the model.
    ValueError: if by is also in input_constraints.
    ValueError: if input_constraints contains an input that is not in data.
    ValueError: if model.output is not in data.

method Model.plot_response_2d

def plot_response_2d(
    self,
    data: pandas.core.frame.DataFrame,
    fixed: Optional[Dict[str, Any]] = None,
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    resolution: int = 1000,
    cmap: str = 'feyn-diverging',
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None
Visualize the response of a model to numerical inputs. Works for both classification and regression problems. The plot comes in two parts:

1. A colored background indicating the response of the model in a 2D space given the fixed values. A lighter color corresponds to a bigger output from the model.
2. Scatter-plotted data on top of the background. In a classification scenario, green corresponds to positive class, and pink corresponds to the negative class. For regression, the color gradient shows the true distribution of the output value. Two sizes are used in the scatterplot, the larger dots correspond to the data that matches the values in fixed and the smaller ones have data different from the values in fixed.

Arguments:
    data {DataFrame} -- The data that will be scattered in the model.

Keyword Arguments:
    fixed {Optional[Dict[str, Any]]} -- Dictionary with values we fix in the model. The key is an input name in the model and the value is a number that the input is fixed to. (default: {None})
    ax {Optional[plt.Axes.axes]} -- Optional matplotlib axes in which to make the partial plot. (default: {None})
    resolution {int} -- The resolution at which we sample the 2D input space for the background. (default: {1000})
    figsize {Optional[tuple]} -- Size of created figure if no matplotlib axes is passed in ax. (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError: if function parameters don't match the correct type.
    ValueError: if the model input names minus the fixed value names are more than two, meaning that you need to fix more values to reduce the dimensionality and make a 2D plot possible.
    ValueError: if fixed contains an input not in the model inputs.
    ValueError: If columns needed for the model are not present in the data.
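
An illustrative sketch for a model with three inputs, fixing a hypothetical input "age" so only two free inputs remain:

>>> model.plot_response_2d(train, fixed={"age": 50})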

method Model.plot_response_auto

def plot_response_auto(
    self,
    data: pandas.core.frame.DataFrame,
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None
Automatically visualize the response of a model to numerical inputs.

This function attempts to automatically determine the most interesting inputs to display and fixes the rest to the median if numeric or mode if categorical.
It also automatically decides whether to plot a 1D or 2D response plot.

Uses the functions `plot_model_response_1d` or `plot_model_response_2d` internally depending on number of inputs in the model.

For the 2D plot, the following applies:
1. A colored background indicating the response of the model in a 2D space given the fixed values. A lighter color corresponds to a bigger output from the model.
2. Scatter-plotted data on top of the background. In a classification scenario, green corresponds to positive class, and pink corresponds to the negative class. For regression, the color gradient shows the true distribution of the output value. Two sizes are used in the scatterplot, the larger dots correspond to the data that matches the values in fixed and the smaller ones have data different from the values in fixed.

Arguments:
    data {DataFrame} -- The data that will be scattered in the model.

Keyword Arguments:
    ax {Optional[plt.Axes.axes]} -- Optional matplotlib axes in which to make the plot. (default: {None})
    figsize {Optional[tuple]} -- Size of created figure if no matplotlib axes is passed in ax. (default: {None})
    filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})

Raises:
    TypeError: if inputs don't match the correct type.

method Model.plot_roc_curve

def plot_roc_curve(
    self,
    data: pandas.core.frame.DataFrame,
    threshold: Optional[float] = None,
    title: str = 'ROC curve',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None,
    **kwargs
) -> None
Plot the model's ROC curve.

This is a shorthand for calling feyn.plots.plot_roc_curve.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
    threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
    title -- Title of the plot.
    ax -- matplotlib axes object to draw to, default None
    figsize -- size of figure when ax is None, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
    **kwargs -- additional keyword arguments to pass to Axes.plot function

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method Model.plot_segmented_loss

def plot_segmented_loss(
    self,
    data: pandas.core.frame.DataFrame,
    by: Optional[str] = None,
    loss_function: str = 'squared_error',
    title: str = 'Segmented Loss',
    legend: List[str] = ['Samples in bin', 'Mean loss for bin'],
    legend_loc: Optional[str] = 'lower right',
    ax: Optional[matplotlib.axes._axes.Axes] = None,
    figsize: Optional[tuple] = None,
    filename: Optional[str] = None
) -> None
Plot the loss by segment of a dataset.

This plot is useful to evaluate how a model performs on different subsets of the data.

Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output_name="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")

This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.

You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.

Arguments:
    data {DataFrame} -- The dataset to measure the loss on.

Keyword Arguments:
    by -- The column in the dataset to segment by.
    loss_function -- The loss function to compute for each segment.
    title -- Title of the plot.
    legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
    legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
    ax -- matplotlib axes object to draw to
    figsize -- Size of created figure, default None
    filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None

Raises:
    TypeError -- if inputs don't match the correct type.
    ValueError: if by is not in data.
    ValueError: If columns needed for the model are not present in the data.
    ValueError: If fewer than two labels are supplied for the legend.

method Model.plot_signal

def plot_signal(
    self,
    data: pandas.core.frame.DataFrame,
    corr_func: Optional[str] = None,
    filename: Optional[str] = None
)
Plot a model displaying the signal path for the provided feyn.Model and DataFrame.

Arguments:
    data {DataFrame} -- A pandas DataFrame for showing metrics.

Keyword Arguments:
    corr_func {Optional[str]} -- A name for the correlation function to use as the node signal, either 'mutual_information', 'pearson' or 'spearman' are available. (default: {None} defaults to 'pearson')
    filename {Optional[str]} -- The filename to use for saving the plot as svg.

Raises:
    TypeError: if function parameters don't match the correct type.
    ValueError: if the name of the correlation function is not understood.
    ValueError: if invalid dataframes are passed.
    ValueError: If columns needed for the model are not present in the data.

Returns:
    SVG -- SVG of the model signal.

method Model.predict

def predict(
    self,
    X: pandas.core.frame.DataFrame
) -> numpy.ndarray
Calculate predictions based on input values. Note that for classification tasks the output are probabilities.

>>> model.predict({ "age": [34, 78], "sex": ["male", "female"] })
[0.85, 0.21]

Arguments:
    X {DataFrame} -- The input values as a pandas.DataFrame.

Returns:
    np.ndarray -- The calculated predictions.

method Model.r2_score

def r2_score(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's r2 score on a data set.

The r2 score for a regression model is defined as

1 - rss/tss

where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, tss is the residual sum of squares of a baseline model that always predicts the mean. The r2 score therefore expresses how much better the predictions are than such a model.

A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.

It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    r2 score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.rmse

def rmse(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's root mean squared error on a data set.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    RMSE for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.roc_auc_score

def roc_auc_score(
    self,
    data: pandas.core.frame.DataFrame
)
Calculate the Area Under Curve (AUC) of the ROC curve.

A ROC curve depicts the ability of a binary classifier as its discrimination threshold is varied.

The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    AUC score for the predictions

Raises:
    TypeError -- if inputs don't match the correct type.
    TypeError -- if model is not a classification model.

method Model.save

def save(
    self,
    file: Union[~AnyStr, pathlib.Path, TextIO]
) -> None
Save the `Model` to a file-like object.

The file can later be used to recreate the `Model` with `Model.load`.

Arguments:
    file -- A file-like object or path to save the model to.
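
A minimal save/load round trip; the filename is arbitrary and `new_data` is a hypothetical DataFrame:

>>> model.save("best_model.json")
>>> loaded = feyn.Model.load("best_model.json")
>>> predictions = loaded.predict(new_data)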

method Model.savefig

def savefig(
    self,
    filename: str
) -> str
Save model as an svg file.

Args:
    filename (str): the filename of the file to save. Includes the filepath and file extension.

method Model.show

def show(
    self,
    label: Optional[str] = None,
    update_display: bool = False,
    filename: Optional[str] = None
)
Updates the display in a Python notebook with the graph representation of a model.

Keyword Arguments:
    label {Optional[str]} -- A label to add to the rendering of the model (default is None).
    update_display {bool} -- Clear output and rerender figure (defaults to False).
    filename {Optional[str]} -- The filename to use for saving the plot as html (defaults to None).

method Model.squared_error

def squared_error(
    self,
    data: pandas.core.frame.DataFrame
)
Compute the model's squared error loss on the provided data.

This function is a shorthand that is equivalent to the following code:
> y_true = data[model.output]
> y_pred = model.predict(data)
> se = feyn.losses.squared_error(y_true, y_pred)

Arguments:
    data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.

Returns:
    np.ndarray -- The losses as an array of floats.

Raises:
    TypeError -- if inputs don't match the correct type.

method Model.sympify

def sympify(
    self,
    signif: int = 6,
    symbolic_lr=False,
    symbolic_cat=True,
    include_weights=True
)
Convert the model to a sympy expression.
This function requires sympy to be installed.

Arguments:
    signif -- the number of significant digits in the parameters of the model
    symbolic_lr -- express logistic regression wrapper as part of the expression

Returns:
    expression -- a sympy expression
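
A brief sketch; in a notebook the returned sympy expression renders as formatted math:

>>> expr = model.sympify(signif=3)
>>> expr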

method Model.to_query_string

def to_query_string(
    self
)
Returns the query string representation for the given model.

class Theme

def __init__(
    /,
    *args,
    **kwargs
) -> Theme

static method Theme.cmap

def cmap(
    name
)
Helper to get a colormap from the current theme
Arguments:
    name {str} -- A colormap from the theme

Returns:
    matplotlib.colors.LinearSegmentedColormap

static method Theme.color

def color(
    color
)
Helper to get a color from the current theme

Arguments:
    color {str} -- A color from the theme, either among:
    ['highlight', 'primary', 'secondary', 'accent', 'light', 'dark', 'neutral']
    a color among AbzuColors,
    Colors defined in the theme,
    Hex code (will pass through if not defined)

Returns:
    [str] -- a string with a color hex code, e.g. '#FF1EC8'

static method Theme.cycler

def cycler(
    pos: Optional[int] = None
) -> Union[str, List[str]]
Helper to get a color from the current theme's cycler

Arguments:
    pos {Optional[int]} -- position in the cycler for the color to return. Will return a list of colors if None {default: None}

Returns:
    [str] -- a string with a color hex code, e.g. '#FF1EC8'

static method Theme.flip_cmap

def flip_cmap(
    cmap: str
)
Reverse the specified colormap belonging to a theme. Any subsequent plots using the colormap will now use the reversed version instead.
Flipping it multiple times or changing the theme will revert it back to its original order.

static method Theme.flip_diverging_cmap

def flip_diverging_cmap()
Reverse the gradient used in the feyn-diverging colormap. Any subsequent plots using the colormap will now use the reversed version instead.
Useful for flipping the color scheme for classification tasks where 1 is a negative outcome and 0 is a positive outcome.
Flipping it multiple times or changing the theme will revert it back to its original order.

It does not affect the order of the gradient returned by Theme.gradient.

static method Theme.font_size

def font_size(
    size
)
Helper to get a font size in pixels from a t-shirt size definition such as:
    ['small', 'medium', 'large']

Arguments:
    size {str} -- A size in t-shirt sizing

Returns:
    int -- font size in pixels corresponding to the provided t-shirt size

static method Theme.gradient

def gradient()
Helper to get a three-step gradient from the current theme

Returns:
    List[str] -- A list of the three diverging colors in the gradient.

static method Theme.set_theme

def set_theme(
    theme='default'
)
Sets the theme for visual output in Feyn.

Arguments:
    theme {str} -- Choose amongst: ['default', 'light', 'dark', 'mono', 'mono_dark']
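
For example, to switch all Feyn plots to the dark theme:

>>> import feyn
>>> feyn.Theme.set_theme("dark")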
