# feyn

```
Feyn is the main Python module for building and executing models that utilize a QLattice.
The QLattice stores and updates probabilistic information about the mathematical relationships (models) between observable quantities.
The workflow is typically:
# Connect to the QLattice
>>> ql = feyn.connect_qlattice()
# Extract models from the QLattice
>>> models = ql.sample_models(data.columns, output="out")
# Fit the list of models to a local dataset
>>> models = feyn.fit_models(models, data)
# Pick the best Model from the fitted models
>>> best = models[0]
# Update the remote QLattice with this model to explore similar models.
>>> ql.update(best)
# Or use the model to make predictions
>>> predicted_y = best.predict(new_data)
```

## Sub-modules

- feyn.criteria
- feyn.datasets
- feyn.filters
- feyn.inference
- feyn.insights
- feyn.losses
- feyn.metrics
- feyn.plots
- feyn.reference
- feyn.tools

*function* best_diverse_models

```
def best_diverse_models(
models: List[feyn._model.Model],
n: int = 10,
distance_func: Union[Callable[[feyn._model.Model, feyn._model.Model], bool], NoneType] = None
) -> List[feyn._model.Model]
```

```
Separate the n best performing models from a collection, such that they are sufficiently diverse in the context of some distance function.
Arguments:
models {List[feyn.Model]} -- The list of models to find the best ones in.
Keyword Arguments:
n {int} -- The maximum number of best models to identify. (default: {10})
distance_func {Optional[Callable[[feyn.Model, feyn.Model], bool]]} -- Function used to decide whether two models are sufficiently distant. If it returns False, the model passed as the first argument is not sufficiently distant and is discarded. If no function is specified, this defaults to being sufficiently distant in the QLattice. (default: {None})
Returns:
List[feyn.Model] -- The best sufficiently diverse models under distance_func.
```
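A custom `distance_func` takes two models and returns a boolean. The greedy selection it drives can be sketched in plain Python. This is a conceptual sketch, not feyn's implementation; the stand-in "models" and the `name_overlap` distance function are hypothetical:

```python
def name_overlap(a, b):
    # Hypothetical distance function: two models are "distant"
    # if they share fewer than two input names
    return len(a["inputs"] & b["inputs"]) < 2

def best_diverse(candidates, n=10, distance_func=name_overlap):
    # Candidates are assumed pre-sorted, best first (as after fit_models).
    # Greedily keep a candidate only if it is distant from everything kept so far.
    kept = []
    for cand in candidates:
        if all(distance_func(cand, k) for k in kept):
            kept.append(cand)
        if len(kept) == n:
            break
    return kept
```

With this distance function, a runner-up that reuses the same inputs as the best model is skipped in favor of a lower-scoring but more diverse candidate.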

*function* connect_qlattice

```
def connect_qlattice(
qlattice: Union[str, NoneType] = None,
api_token: Union[str, NoneType] = None,
server: str = 'https://ql.abzu.ai',
config: Union[str, NoneType] = None
) -> feyn._qlattice.QLattice
```

```
Utility function for connecting to a QLattice. A QLattice (short for Quantum Lattice) is a device which can be used to generate and explore a vast number of models linking a set of input observations to an output prediction. The actual QLattice runs on a dedicated computing cluster which is operated by Abzu. The `feyn.QLattice` class provides a client interface to communicate with, sample models from, and update the QLattice.
Keyword Arguments:
qlattice {Optional[str]} -- The qlattice you want to connect to, such as: `a1b2c3d4`. (Should not be used in combination with the config parameter). (default: {None})
api_token {Optional[str]} -- Authentication token for communicating with this QLattice. (Should not be used in combination with the config parameter). (default: {None})
server {str} -- The server hosting your QLattice. (Should not be used in combination with the config parameter). (default: {DEFAULT_SERVER})
config {Optional[str]} -- The configuration setting in your feyn.ini or .feynrc file to load the url and api_token from. These files should be located in your home folder. (default: {None})
Returns:
QLattice -- The QLattice connection handler to your remote QLattice.
```
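The `config` parameter names a section in a `feyn.ini` or `.feynrc` file in your home folder. A plausible layout, assuming the section keys mirror the parameters above (the section name and all values here are placeholders; check your own QLattice credentials for the real ones):

```ini
; ~/.feynrc (hypothetical values)
[my-qlattice]
qlattice = a1b2c3d4
server = https://ql.abzu.ai
api_token = <your token>
```

With that file in place, `feyn.connect_qlattice(config="my-qlattice")` loads the url and token without putting credentials in your notebook.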

*function* fit_models

```
def fit_models(
models: List[feyn._model.Model],
data,
loss_function: Union[str, Callable] = 'squared_error',
criterion: Union[str, NoneType] = None,
n_samples: Union[int, NoneType] = None,
sample_weights: Union[Iterable[float], NoneType] = None,
threads: int = 4,
immutable: bool = False
) -> List[feyn._model.Model]
```

```
Fit a list of models on some data and return a list of fitted models. The returned list is sorted in ascending order by either the loss function or one of the criteria.
The n_samples parameter controls how many samples are used to train each model. The default behavior is to fit each model once with each sample in the dataset, unless the set is smaller than 10000, in which case the dataset will be upsampled to 10000 samples before fitting.
The samples are shuffled randomly before fitting to avoid issues with the Stochastic Gradient Descent algorithm.
Arguments:
models {List[feyn.Model]} -- A list of feyn models to be fitted.
data {Iterable} -- Data used in fitting each model. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
loss_function {Union[str, Callable]} -- The loss function to optimize models for. Can take any loss function in `feyn.losses`. (default: {"squared_error"} (MSE))
criterion {Optional[str]} -- Sort by information criterion rather than loss. Either "aic", "bic" or None. (default: {None})
n_samples {Optional[int]} -- The number of samples to fit each model with. (default: {None})
sample_weights {Optional[Iterable[float]]} -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample. (default: {None})
threads {int} -- Number of concurrent threads to use for fitting. (default: {4})
immutable {bool} -- If True, create a copy of each model and fit those, leaving the originals untouched. This increases runtime. (default: {False})
Raises:
ValueError: Raised if there are zero samples to fit on, if the sizes of the data and sample weights don't match, or if the loss function is invalid.
Returns:
List[feyn.Model] -- A list of fitted feyn models.
```
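`sample_weights` is just a per-sample array, so it can be built with numpy before calling `fit_models`. A common use is class-balancing weights for an imbalanced binary output (a sketch; the `out` array is illustrative data, not part of the feyn API):

```python
import numpy as np

# Illustrative binary output column with a 3:1 class imbalance
out = np.array([0, 0, 0, 1])

# Weight each sample inversely to its class frequency, so both
# classes contribute equally to the total loss during fitting
counts = np.bincount(out)       # [3, 1]
weights = 1.0 / counts[out]     # [1/3, 1/3, 1/3, 1.0]
```

The resulting array would then be passed as `sample_weights=weights`, alongside the matching dataset.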

*function* prune_models

```
def prune_models(
models: List[feyn._model.Model],
dropout: bool = True,
decay: bool = True,
keep_n: Union[int, NoneType] = None
) -> List[feyn._model.Model]
```

```
Prune a list of models to remove redundant and poorly performing ones.
Arguments:
models {List[feyn.Model]} -- The list of models to prune.
Keyword Arguments:
dropout {bool} -- Whether or not to implement dropout regularization based on where in the QLattice models were generated. (default: {True})
decay {bool} -- Whether or not to implement decay of old models that have not lived up to their potential. (default: {True})
keep_n {Optional[int]} -- At most this many models will be returned. If None, models are left to be pruned by other redundancies. (default: {None})
Returns:
List[feyn.Model] -- The list of pruned models.
```

*function* show_model

```
def show_model(
model: feyn._model.Model,
label: Union[str, NoneType] = None,
update_display: bool = False
)
```

```
Update the display in a Python notebook with the graph representation of a model.
Arguments:
model {Model} -- The model to display.
Keyword Arguments:
label {Optional[str]} -- A label to add to the rendering of the model. (default: {None})
```

*class* Model

```
def __init__(
size: int
) -> Model
```

```
A Model represents a single mathematical equation which can be used for predicting.
The constructor is for internal use.
```

*property* Model.edge_count

```
edge_count
```

```
Get the total number of edges in this model.
```

*property* Model.depth

```
depth
```

```
Get the depth of the graph representation of the model.
```

*property* Model.edges

```
edges
```

```
Get the total number of edges in the graph representation of this model.
```

*property* Model.features

```
features
```

```
Get the names of the input features of the model. Same as 'inputs'.
```

*property* Model.inputs

```
inputs
```

```
Get the names of the input features of the model.
```

*property* Model.output

```
output
```

```
Get the name of the output node.
```

*property* Model.target

```
target
```

```
Get the name of the output node. Same as 'output'.
```

*static method* Model.load

```
def load(
file: Union[~AnyStr, pathlib.Path, TextIO]
) -> 'Model'
```

```
Load a `Model` from a file.
Usually used together with `Model.save`.
Arguments:
file -- A file-like object or a path to load the `Model` from.
Returns:
Model -- The loaded `Model`-object.
```

*method* Model.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* Model.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* Model.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with the optimal threshold.
The accuracy score is useful for evaluating classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold separating true from false is 0.5. However, this does not hold when a model was trained on a population with a different composition than the one it is used on.
This function first computes the threshold delimiting the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```
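The threshold search itself is straightforward to sketch with numpy. This is a conceptual illustration of the idea, not feyn's implementation:

```python
import numpy as np

def best_accuracy_threshold(y_true, y_prob):
    # Candidate thresholds: midpoints between consecutive sorted scores,
    # plus the endpoints 0 and 1
    probs = np.sort(y_prob)
    candidates = np.concatenate(([0.0], (probs[1:] + probs[:-1]) / 2, [1.0]))
    # Pick the threshold whose hard predictions agree most with y_true
    accuracies = [np.mean((y_prob >= t) == y_true) for t in candidates]
    i = int(np.argmax(accuracies))
    return candidates[i], accuracies[i]
```

For probability scores `[0.1, 0.2, 0.3, 0.9]` with true classes `[0, 0, 1, 1]`, this picks the threshold 0.25 (well below the usual 0.5), which classifies every sample correctly.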

*method* Model.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* Model.fit

```
def fit(
self,
data,
loss_function='squared_error',
sample_weights=None
)
```

```
Fit this specific `Model` with the given data set.
Arguments:
data -- Training data including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
loss_function -- Name of the loss function or the function itself. This is the loss function to use for fitting. Can either be a string or one of the functions provided in `feyn.losses`.
sample_weights -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample
```

*method* Model.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* Model.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* Model.plot

```
def plot(
self,
data: Iterable,
test: Union[Iterable, NoneType] = None,
corr_func: Union[str, NoneType] = None
) -> 'SVG'
```

```
Plot the model's summary metrics and show the signal path.
This is a shorthand for calling feyn.plots.plot_model_summary.
Arguments:
data {Iterable} -- Data set including both input and expected values. Must be a pandas.DataFrame or dict of numpy arrays.
Keyword Arguments:
test {Optional[Iterable]} -- Additional data set including both input and expected values. Must be a pandas.DataFrame. (default: {None})
corr_func {Optional[str]} -- Correlation function to use in showing the importance of individual nodes. Must be either "mutual information", "pearson" or "spearmans". (default: {None} -> "pearson")
Returns:
SVG -- SVG of the model summary.
```

*method* Model.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```

*method* Model.plot_flow

```
def plot_flow(
self,
data: Iterable,
sample: Iterable
) -> 'SVG'
```

```
Plots the flow of activations through the model, for the provided sample. Uses the provided data as background information for visualization.
Returns:
SVG -- SVG of the model activation flow.
```

*method* Model.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> best.plot_partial(data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from min to max of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
data -- The dataset to compute the partial dependence on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed.
```

*method* Model.plot_partial2d

```
def plot_partial2d(
self,
data: 'DataFrame',
fixed: Dict[str, Union[int, float]] = None,
ax: Union[ForwardRef('Axes'), NoneType] = None,
resolution: int = 1000
) -> None
```

```
Visualize the response of a model to numerical inputs using a partial plot. The partial plot comes in two parts:
1. A colored background indicating the response of the model in a 2D space given the fixed values. A lighter color corresponds to a bigger output from the model.
2. Scatter-plotted data on top of the background. In a classification scenario, red corresponds to true positives, and blue corresponds to true negatives. For regression, the color gradient shows the true distribution of the output value. Two sizes are used in the scatter plot: the larger dots correspond to data that matches the values in fixed, and the smaller dots correspond to data that differs from the values in fixed.
Arguments:
data {DataFrame} -- The data that will be scattered in the model.
Keyword Arguments:
fixed {Dict[str, Union[int, float]]} -- Dictionary with values we fix in the model. The key is a feature name in the model and the value is a number that the feature is fixed to. (default: {{}})
ax {Optional[plt.Axes.axes]} -- Optional matplotlib axes in which to make the partial plot. (default: {None})
resolution {int} -- The resolution at which we sample the 2D feature space for the background. (default: {1000})
Raises:
ValueError: Raised if the model features names minus the fixed value names are more than two, meaning that you need to fix more values to reduce the dimensionality and make a 2D plot possible.
ValueError: Raised if one of the features you are trying to plot in a 2D space is a categorical.
```

*method* Model.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plot the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* Model.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- Title of the plot. (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```
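The line of best fit mentioned above is an ordinary least-squares fit of the predictions against the true values, which can be reproduced with numpy (an illustrative sketch with made-up values, not feyn's plotting code):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Slope and intercept of the least-squares line through (y_true, y_pred);
# a well-calibrated model gives slope ~1 and intercept ~0 (the line y = x)
slope, intercept = np.polyfit(y_true, y_pred, 1)
```

For the sample values here the fit is slope 0.94 and intercept 0.15, i.e. close to, but not exactly, the line of equality.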

*method* Model.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- Title of the plot. (default: {"Residuals plot"})
ax {AxesSubplot} -- matplotlib axes object to draw to. (default: {None})
```

*method* Model.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* Model.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```

*method* Model.predict

```
def predict(
self,
X
) -> numpy.ndarray
```

```
Calculate predictions based on input values.
>>> model.predict({ "age": [34, 78], "sex": ["male", "female"] })
[True, False]
Arguments:
X -- The input values. Can be either a dict mapping input feature names to value arrays, or a pandas.DataFrame.
Returns:
np.ndarray -- The calculated predictions.
```

*method* Model.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set.
The r2 score for a regression model is defined as
1 - rss/tss
where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, tss is the residual sum of squares of a baseline model that always predicts the mean. The r2 score therefore expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```
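The definition above maps directly onto numpy (a sketch of the formula itself, not feyn's internals):

```python
import numpy as np

def r2(y_true, y_pred):
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - rss / tss

y = np.array([1.0, 2.0, 3.0])
```

Perfect predictions give `r2(y, y) == 1.0`, while always predicting the mean gives exactly 0.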

*method* Model.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```
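How the three error metrics (`mae`, `mse`, `rmse`) relate, in numpy terms. This is an illustration of the formulas with made-up values; the methods above compute the predictions from the model internally:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.0, 3.0, 2.0])

mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
rmse = np.sqrt(mse)                     # root of the MSE
```

RMSE is simply the square root of MSE, which puts it back on the same scale as the output variable.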

*method* Model.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the performance of a binary classifier as its decision
threshold is varied. The area under the curve (AUC) is the probability that
the classifier will assign a higher score to a randomly chosen positive
instance than to a randomly chosen negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```
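That probabilistic interpretation can be computed directly by comparing all positive/negative score pairs. This is a brute-force sketch for intuition; real implementations sort the scores instead:

```python
import numpy as np

def auc(y_true, y_score):
    # AUC as the probability that a random positive outranks
    # a random negative; ties count as half
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For scores `[0.1, 0.4, 0.35, 0.8]` with classes `[0, 0, 1, 1]`, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.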

*method* Model.save

```
def save(
self,
file: Union[~AnyStr, pathlib.Path, TextIO]
) -> None
```

```
Save the `Model` to a file-like object.
The file can later be used to recreate the `Model` with `Model.load`.
Arguments:
file -- A file-like object or path to save the model to.
```

*method* Model.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* Model.sympify

```
def sympify(
self,
signif: int = 6,
symbolic_lr=False,
include_weights=True
)
```

```
Convert the model to a sympy expression.
This function requires sympy to be installed.
Keyword Arguments:
signif -- The number of significant digits in the parameters of the model.
symbolic_lr -- Express the logistic regression wrapper as part of the expression.
include_weights -- Include the fitted weights in the expression.
Returns:
expression -- a sympy expression
```