# feyn.reference

```
This module contains reference models that can be used for comparison with feyn models.
```

*class* ConstantModel

```
def __init__(
output_name,
const
) -> ConstantModel
```

```
```

*method* ConstantModel.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* ConstantModel.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```
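
The definition above can be sketched in plain Python. This is a minimal illustration, not feyn's implementation: the helper name `accuracy` and the rule that a score at or above 0.5 counts as a positive prediction are assumptions made for the example.

```python
def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that match the expected class.

    Hypothetical helper: predictions are probabilities that are turned
    into classes by comparing them with `threshold`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    predicted_classes = [p >= threshold for p in y_pred]
    correct = sum(bool(t) == c for t, c in zip(y_true, predicted_classes))
    return correct / len(y_true)


y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.4, 0.7]
score = accuracy(y_true, y_pred)  # 3 of the 4 predictions are correct
```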

*method* ConstantModel.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with optimal threshold
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption does not hold when a model was trained on a population with a different composition than the one it is used on.
This function first computes the threshold delimiting the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```
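
The threshold search described above can be sketched as a brute-force scan. This is an illustration under stated assumptions, not the library's algorithm: the helper name `best_accuracy_threshold` is hypothetical, only the observed scores are tried as candidate thresholds, and a score at or above the threshold counts as positive.

```python
def best_accuracy_threshold(y_true, y_pred):
    """Return (threshold, accuracy) for the cut-off that maximizes accuracy.

    Hypothetical helper: every distinct predicted score is tried as a
    candidate threshold, and the first best candidate wins on ties.
    """
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(set(y_pred)):
        correct = sum((p >= candidate) == bool(t) for t, p in zip(y_true, y_pred))
        acc = correct / len(y_true)
        if acc > best_accuracy:
            best_threshold, best_accuracy = candidate, acc
    return best_threshold, best_accuracy


y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.3, 0.4]  # every score is below 0.5
threshold, acc = best_accuracy_threshold(y_true, y_pred)
```

In this toy data a fixed threshold of 0.5 would label every sample negative and give an accuracy of 0.5, while the scan finds a threshold of 0.3 that separates the classes perfectly.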

*method* ConstantModel.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```
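
The equivalent code in the docstring above is truncated in the source. As general background only, the standard per-sample binary cross entropy can be sketched as follows. The helper name and the clipping constant `eps` are assumptions, and feyn's own implementation may differ.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Per-sample binary cross entropy, -[t*log(p) + (1-t)*log(1-p)].

    Predictions are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    losses = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return losses


# A prediction of 0.5 costs log(2) regardless of the true class.
losses = binary_cross_entropy([1, 0], [0.5, 0.5])
```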

*method* ConstantModel.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* ConstantModel.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* ConstantModel.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```
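
The counts behind a confusion matrix can be sketched without any plotting. The helper below is hypothetical and assumes that a score at or above `threshold` is a positive prediction; the library may treat the boundary differently.

```python
def confusion_counts(y_true, y_pred, threshold=0.5):
    """Return (true negatives, false positives, false negatives, true positives)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        predicted_positive = p >= threshold
        if t and predicted_positive:
            tp += 1
        elif t:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


y_true = [1, 1, 0, 0, 1]
y_pred = [0.8, 0.3, 0.6, 0.1, 0.9]
default_counts = confusion_counts(y_true, y_pred)
strict_counts = confusion_counts(y_true, y_pred, threshold=0.7)
```

Raising the threshold from 0.5 to 0.7 turns the single false positive (score 0.6) into a true negative, which is the kind of shift the `threshold` argument lets you inspect.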

*method* ConstantModel.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from min to max of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```
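
The idea of sweeping one feature while holding the others fixed can be sketched as follows. This is a simplified illustration, not the plotting code: `partial_dependence` and `toy_model` are hypothetical names, and the sketch assumes every feature other than `by` is supplied in `fixed`.

```python
def partial_dependence(model, data, by, fixed, steps=5):
    """Evaluate `model` while `by` sweeps from its min to its max in `data`.

    `model` is any callable taking a dict of feature values; every feature
    other than `by` is held at the value given in `fixed`.
    """
    low, high = min(data[by]), max(data[by])
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    return [(value, model({**fixed, by: value})) for value in grid]


def toy_model(row):
    # Hypothetical stand-in for a fitted model.
    return 60 + 0.5 * row["age"] + 10 * row["smoker"]


data = {"age": [20, 40, 60], "smoker": [0, 1, 0]}
curve = partial_dependence(toy_model, data, by="age", fixed={"smoker": 0})
```

Each entry of `curve` is a pair of the swept feature value and the model output, which is what a partial dependence plot draws on its two axes.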

*method* ConstantModel.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* ConstantModel.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```

*method* ConstantModel.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- The dataset to compute the residuals on. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {[type]} -- (default: {None})
```

*method* ConstantModel.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* ConstantModel.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model has better performance for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```
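
The segmentation step can be sketched in plain Python. This simplified version reports the average squared error per category rather than drawing a histogram; the helper name `segmented_squared_error` and the toy data are assumptions for the example.

```python
from collections import defaultdict


def segmented_squared_error(data, y_pred, output, by):
    """Mean squared error of `y_pred` for each distinct value in column `by`."""
    losses = defaultdict(list)
    for segment, t, p in zip(data[by], data[output], y_pred):
        losses[segment].append((t - p) ** 2)
    return {segment: sum(v) / len(v) for segment, v in losses.items()}


data = {"smoker": ["yes", "no", "yes", "no"], "heartrate": [80, 60, 90, 62]}
y_pred = [70, 60, 80, 64]
by_segment = segmented_squared_error(data, y_pred, output="heartrate", by="smoker")
```

Here the loss is much larger for the `"yes"` segment than for the `"no"` segment, which is the kind of imbalance the plot is meant to expose.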

*method* ConstantModel.predict

```
def predict(
self,
data: Iterable
)
```

```
```
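
The docstring for `predict` is empty in the source. Purely as an illustration of the idea behind a constant reference model, the stand-in below returns the same value for every row. The class name `ConstantBaseline` is hypothetical and is not part of feyn, and the input is assumed to be a dict mapping column names to equally long lists.

```python
class ConstantBaseline:
    """Hypothetical stand-in: predicts the same constant for every row."""

    def __init__(self, output_name, const):
        self.output_name = output_name
        self.const = const

    def predict(self, data):
        # The number of rows is taken from the first column in `data`.
        n_rows = len(next(iter(data.values())))
        return [self.const] * n_rows


data = {"age": [20, 40, 60], "heartrate": [60, 70, 80]}
baseline = ConstantBaseline("heartrate", const=70)
predictions = baseline.predict(data)
```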

*method* ConstantModel.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```
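
The formula `1 - rss/tss` and the three cases described above can be checked with a short sketch. The helper below is a plain-Python illustration with a hypothetical toy data set, not the library's code.

```python
def r2_score(y_true, y_pred):
    """Return 1 - rss/tss for the given predictions."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, y_true)               # perfect predictions
mean_model = r2_score(y_true, [2.5] * 4)         # always predicts the mean
worse = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])   # worse than the mean model
```

The three results are 1.0, 0.0 and -3.0, matching the interpretation of a perfect model, the mean model, and a model that is worse than the mean model.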

*method* ConstantModel.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```
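
The three regression errors documented for this class (MAE, MSE and RMSE) are closely related. The sketch below uses hypothetical helper functions on plain lists to show how they are computed from the same residuals.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 10.0]
errors = (mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

Because the MSE squares each residual, the single large miss (7.0 against 10.0) dominates it far more than it dominates the MAE.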

*method* ConstantModel.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the ability of a binary classifier with varying threshold.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```
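
The probabilistic reading of the AUC given above can be turned directly into code: count how often a positive sample receives a higher score than a negative one. This pairwise sketch is a hypothetical helper for illustration; it is quadratic in the number of samples and not how a library would normally compute the score.

```python
def roc_auc(y_true, y_pred):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [p for t, p in zip(y_true, y_pred) if t]
    negatives = [p for t, p in zip(y_true, y_pred) if not t]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Three of the four positive/negative pairs are ranked correctly, so the score is 0.75.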

*method* ConstantModel.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*class* GradientBoostingClassifier

```
def __init__(
data,
output_name,
**kwargs
) -> GradientBoostingClassifier
```

```
```

*method* GradientBoostingClassifier.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* GradientBoostingClassifier.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* GradientBoostingClassifier.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with optimal threshold
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption does not hold when a model was trained on a population with a different composition than the one it is used on.
This function first computes the threshold delimiting the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```

*method* GradientBoostingClassifier.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* GradientBoostingClassifier.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* GradientBoostingClassifier.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* GradientBoostingClassifier.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```

*method* GradientBoostingClassifier.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from min to max of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```

*method* GradientBoostingClassifier.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```
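
The data behind such a histogram can be sketched by binning the scores separately for each class. The helper name `score_histograms` and the equal-width bins on the interval from 0 to 1 are assumptions for this example; the plot itself may bin differently.

```python
def score_histograms(y_true, y_pred, nbins=10):
    """Count scores per equal-width bin on [0, 1], separately per class."""
    negative_counts = [0] * nbins
    positive_counts = [0] * nbins
    for t, p in zip(y_true, y_pred):
        # A score of exactly 1.0 falls in the last bin.
        index = min(int(p * nbins), nbins - 1)
        if t:
            positive_counts[index] += 1
        else:
            negative_counts[index] += 1
    return negative_counts, positive_counts


negatives, positives = score_histograms([0, 0, 1, 1], [0.05, 0.35, 0.65, 1.0], nbins=4)
```

A well-separated classifier puts the negative counts in the low bins and the positive counts in the high bins, as in this toy example.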

*method* GradientBoostingClassifier.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```

*method* GradientBoostingClassifier.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- The dataset to compute the residuals on. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {[type]} -- (default: {None})
```

*method* GradientBoostingClassifier.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```
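
The point that the `threshold` argument marks on the curve is a pair of false positive rate and true positive rate. A minimal sketch of that computation follows; the helper name `roc_point` is hypothetical, and a score at or above the threshold is assumed to be a positive prediction.

```python
def roc_point(y_true, y_pred, threshold):
    """Return (false positive rate, true positive rate) at `threshold`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p >= threshold)
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


fpr, tpr = roc_point([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], threshold=0.5)
```

At a threshold of 0.5 this toy classifier finds one of the two positives and raises no false alarms, so the point sits at a false positive rate of 0.0 and a true positive rate of 0.5.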

*method* GradientBoostingClassifier.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model has better performance for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```

*method* GradientBoostingClassifier.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* GradientBoostingClassifier.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```

*method* GradientBoostingClassifier.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```

*method* GradientBoostingClassifier.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the ability of a binary classifier with varying threshold.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```

*method* GradientBoostingClassifier.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*class* LinearRegression

```
def __init__(
data,
output_name,
**kwargs
) -> LinearRegression
```

```
```

*method* LinearRegression.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* LinearRegression.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* LinearRegression.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with optimal threshold
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption does not hold when a model was trained on a population with a different composition than the one it is used on.
This function first computes the threshold delimiting the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```

*method* LinearRegression.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* LinearRegression.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* LinearRegression.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* LinearRegression.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```

*method* LinearRegression.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from min to max of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```

*method* LinearRegression.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* LinearRegression.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```
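
The line of best fit mentioned above can be sketched with ordinary least squares on the pairs of true and predicted values. Whether the plot uses exactly this fit is an assumption; the helper name `best_fit_line` is hypothetical.

```python
def best_fit_line(y_true, y_pred):
    """Ordinary least squares fit of y_pred = slope * y_true + intercept."""
    n = len(y_true)
    mean_x = sum(y_true) / n
    mean_y = sum(y_pred) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_true, y_pred))
    sxx = sum((x - mean_x) ** 2 for x in y_true)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Predictions that are consistently 0.5 too high.
slope, intercept = best_fit_line([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

A slope of 1.0 with an intercept of 0.5 means the fitted line is parallel to the line of equality y=x but shifted upwards, which indicates a constant bias in the predictions.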

*method* LinearRegression.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- The dataset to compute the residuals on. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {[type]} -- (default: {None})
```
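
The points drawn in a residuals plot can be computed with the definition given above, residual = y_true - y_pred. The sketch below uses a hypothetical helper and toy values, and pairs each prediction with its residual.

```python
def residual_points(y_true, y_pred):
    """Return (prediction, residual) pairs, with residual = y_true - y_pred."""
    return [(p, t - p) for t, p in zip(y_true, y_pred)]


points = residual_points([3.0, 5.0, 7.0], [2.5, 5.5, 7.0])
```

Residuals scattered evenly around zero suggest an unbiased model, while a visible trend against the predicted value suggests structure the model has missed.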

*method* LinearRegression.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* LinearRegression.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model has better performance for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```

*method* LinearRegression.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* LinearRegression.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```

*method* LinearRegression.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```

*method* LinearRegression.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the ability of a binary classifier with varying threshold.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```

*method* LinearRegression.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*class* LogisticRegressionClassifier

```
def __init__(
data,
output_name,
**kwargs
) -> LogisticRegressionClassifier
```

```
```

*method* LogisticRegressionClassifier.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* LogisticRegressionClassifier.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* LogisticRegressionClassifier.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with the optimal threshold.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population with a different class composition than the one it is applied to.
This function first computes the threshold delimiting the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```
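
The threshold search described in the docstring above can be sketched as a scan over candidate thresholds. This is an illustrative version in plain Python: the helper name `best_accuracy_threshold`, the choice of the observed scores as candidates, and the tie-breaking rule (the lowest threshold wins) are all assumptions rather than the library's actual procedure.

```python
def best_accuracy_threshold(y_true, y_prob):
    """Return (threshold, accuracy) for the threshold that maximizes accuracy."""
    best_threshold, best_accuracy = 0.5, -1.0
    # Assumption: only the observed scores need to be tried as thresholds.
    for t in sorted(set(y_prob)):
        hits = sum(1 for y, p in zip(y_true, y_prob) if (p >= t) == bool(y))
        acc = hits / len(y_true)
        if acc > best_accuracy:
            best_threshold, best_accuracy = t, acc
    return best_threshold, best_accuracy


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.9]
threshold, acc = best_accuracy_threshold(y_true, y_prob)
print(threshold, acc)
```

With the default threshold of 0.5 only the last sample would be labelled positive, giving an accuracy of 0.6. The scan finds that a threshold of 0.3 classifies every sample correctly.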

*method* LogisticRegressionClassifier.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```
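
For reference, binary cross entropy for a label y in {0, 1} and a predicted probability p is commonly defined as -(y log p + (1 - y) log(1 - p)). The sketch below computes it per sample in plain Python. Whether this method returns per-sample losses or an aggregate is not stated here, so the per-sample form, the helper name `bce_losses`, and the clipping constant `eps` are assumptions.

```python
import math


def bce_losses(y_true, y_prob, eps=1e-15):
    """Per-sample binary cross entropy for labels in {0, 1}."""
    losses = []
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


losses = bce_losses([1, 0, 1], [0.5, 0.5, 0.9])
print(losses)
```

An uninformative prediction of 0.5 costs log(2) regardless of the label, while the confident and correct prediction of 0.9 costs much less.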

*method* LogisticRegressionClassifier.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* LogisticRegressionClassifier.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* LogisticRegressionClassifier.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```
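
The role of the `threshold` argument can be illustrated by counting the four cells of a binary confusion matrix by hand. This is a minimal sketch in plain Python. The helper name `confusion_counts` is hypothetical, and treating scores at or above the threshold as positive is an assumption.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold  # assumption: boundary is inclusive
        if y == 1:
            counts["tp" if predicted_positive else "fn"] += 1
        else:
            counts["fp" if predicted_positive else "tn"] += 1
    return counts


counts = confusion_counts([1, 0, 1, 0, 1], [0.8, 0.6, 0.3, 0.1, 0.9])
print(counts)
```

Raising or lowering the threshold moves samples between the positive and negative columns without changing the row totals.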

*method* LogisticRegressionClassifier.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from the minimum to the maximum of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```

*method* LogisticRegressionClassifier.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* LogisticRegressionClassifier.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```

*method* LogisticRegressionClassifier.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {AxesSubplot} -- (default: {None})
```
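
The quantities drawn by a residuals plot can be computed directly from the definition given in the docstring above (y_true - y_pred). The following plain-Python sketch uses a hypothetical helper name, `residual_points`, and returns the (predicted value, residual) pairs that such a plot would scatter.

```python
def residual_points(y_true, y_pred):
    """Pair each predicted value with its residual (y_true - y_pred)."""
    return [(p, t - p) for t, p in zip(y_true, y_pred)]


points = residual_points([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
print(points)
```

A positive residual means the model predicted too low a value, and a negative residual means it predicted too high a value.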

*method* LogisticRegressionClassifier.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* LogisticRegressionClassifier.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```

*method* LogisticRegressionClassifier.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* LogisticRegressionClassifier.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value
A result of 1 means that the model perfectly predicts the true value
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```
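
The definition `1 - rss/tss` and its three reference cases (perfect, mean-only, and worse than the mean) can be verified with a few lines of plain Python. This is a sketch for illustration. The helper name `r2` is hypothetical and the code is not taken from the library.

```python
def r2(y_true, y_pred):
    """r2 score computed as 1 - rss/tss."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])
mean_model = r2(y_true, [2.5, 2.5, 2.5, 2.5])
reversed_model = r2(y_true, [4.0, 3.0, 2.0, 1.0])
print(perfect, mean_model, reversed_model)
```

The perfect predictions score 1, the mean model scores 0, and the reversed predictions score below 0 because their rss exceeds the tss.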

*method* LogisticRegressionClassifier.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```

*method* LogisticRegressionClassifier.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the performance of a binary classifier as its decision threshold is varied.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```

*method* LogisticRegressionClassifier.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* LogisticRegressionClassifier.summary

```
def summary(
self,
ax=None
)
```

```
```

*class* RandomForestClassifier

```
def __init__(
data,
output_name,
**kwargs
) -> RandomForestClassifier
```

```
```

*method* RandomForestClassifier.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* RandomForestClassifier.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally, it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* RandomForestClassifier.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with the optimal threshold.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population with a different class composition than the one it is applied to.
This function first computes the threshold delimiting the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```

*method* RandomForestClassifier.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* RandomForestClassifier.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* RandomForestClassifier.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* RandomForestClassifier.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```

*method* RandomForestClassifier.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from the minimum to the maximum of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```

*method* RandomForestClassifier.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* RandomForestClassifier.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```

*method* RandomForestClassifier.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {AxesSubplot} -- (default: {None})
```

*method* RandomForestClassifier.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* RandomForestClassifier.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```
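
The idea of comparing loss across segments can be sketched numerically. The plain-Python example below groups squared errors by the value of a segmenting column and reports the mean per group. It is a simplification: the method itself plots a histogram, and the helper name `mean_loss_by_segment` and the use of a per-segment mean are assumptions made for illustration.

```python
from collections import defaultdict


def mean_loss_by_segment(segments, y_true, y_pred):
    """Mean squared error for each distinct value of the segmenting column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seg, t, p in zip(segments, y_true, y_pred):
        totals[seg] += (t - p) ** 2
        counts[seg] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


smoker = ["yes", "no", "yes", "no"]
heartrate_true = [70, 60, 80, 65]
heartrate_pred = [72, 60, 76, 64]
by_segment = mean_loss_by_segment(smoker, heartrate_true, heartrate_pred)
print(by_segment)
```

In this made-up sample the loss is much larger for smokers than for non-smokers, which is the kind of imbalance the segmented plot is meant to reveal.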

*method* RandomForestClassifier.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* RandomForestClassifier.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value
A result of 1 means that the model perfectly predicts the true value
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```

*method* RandomForestClassifier.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```
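
The three error summaries `mae`, `mse` and `rmse` are closely related, which a short plain-Python sketch makes concrete. The function names mirror the method names for readability only. These are standalone helpers written for illustration, not the library's code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [3, 5, 2, 7]
y_pred = [2, 5, 4, 7]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

Because squaring weights large errors more heavily, the RMSE is never smaller than the MAE on the same data.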

*method* RandomForestClassifier.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the performance of a binary classifier as its decision threshold is varied.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```

*method* RandomForestClassifier.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*class* SKLeanClassifier

```
def __init__(
sklearn_classifier: type,
data,
output_name,
**kwargs
) -> SKLeanClassifier
```

```
```

*method* SKLeanClassifier.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* SKLeanClassifier.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally, it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* SKLeanClassifier.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with the optimal threshold.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population with a different class composition than the one it is applied to.
This function first computes the threshold delimiting the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```

*method* SKLeanClassifier.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* SKLeanClassifier.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```

*method* SKLeanClassifier.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```

*method* SKLeanClassifier.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```

*method* SKLeanClassifier.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from the minimum to the maximum of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```
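
The sweep described in the docstring above (vary one numerical feature from its minimum to its maximum while holding the others fixed) can be sketched without any plotting. The example below is plain Python. The helper `partial_dependence`, the stand-in `toy_model`, and the evenly spaced grid are all assumptions for illustration and do not reflect how the library builds the plot.

```python
def partial_dependence(predict, data, by, fixed, steps=5):
    """Evaluate `predict` while sweeping one feature and holding the rest fixed.

    `steps` must be at least 2.
    """
    lo, hi = min(data[by]), max(data[by])
    # Evenly spaced grid from the observed minimum to the observed maximum.
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return [(x, predict({**fixed, by: x})) for x in grid]


def toy_model(row):
    # Hypothetical stand-in for a fitted model's prediction function.
    return 50 + 0.5 * row["age"] + 10 * row["smoker"]


data = {"age": [20, 40, 60], "smoker": [0, 1, 0]}
curve = partial_dependence(toy_model, data, by="age", fixed={"smoker": 1}, steps=3)
print(curve)
```

Each pair in `curve` is one point of the partial dependence line: the swept value of `age` and the prediction obtained with `smoker` held at 1.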

*method* SKLeanClassifier.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* SKLeanClassifier.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```
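
The line of best fit mentioned in the docstring above can be obtained with ordinary least squares. The sketch below is plain Python with a hypothetical helper name, `best_fit_line`. Whether the library fits the line this way is an assumption.

```python
def best_fit_line(x, y):
    """Ordinary least squares slope and intercept for y as a function of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
slope, intercept = best_fit_line(y_true, y_pred)
print(slope, intercept)
```

A slope near 1 and an intercept near 0 mean the fitted line is close to y=x. In this sample the slope is 1 but the intercept of 0.5 shows a constant overestimate.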

*method* SKLeanClassifier.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {AxesSubplot} -- (default: {None})
```

*method* SKLeanClassifier.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```

*method* SKLeanClassifier.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model performs better for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```

*method* SKLeanClassifier.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* SKLeanClassifier.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than those of such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value
A result of 1 means that the model perfectly predicts the true value
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```

*method* SKLeanClassifier.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```

*method* SKLeanClassifier.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the performance of a binary classifier as its decision threshold is varied.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```

*method* SKLeanClassifier.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*class* SKLearnRegressor

```
def __init__(
sklearn_regressor: type,
data,
output_name,
**kwargs
) -> SKLearnRegressor
```

```
```

*method* SKLearnRegressor.absolute_error

```
def absolute_error(
self,
data: Iterable
)
```

```
Compute the model's absolute error on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* SKLearnRegressor.accuracy_score

```
def accuracy_score(
self,
data: Iterable
)
```

```
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally, it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
```

*method* SKLearnRegressor.accuracy_threshold

```
def accuracy_threshold(
self,
data: Iterable
)
```

```
Compute the accuracy score of predictions with the optimal threshold.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on a population with a different class composition than the one it is applied to.
This function first computes the threshold delimiting the true and false classes that maximizes the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
```

*method* SKLearnRegressor.binary_cross_entropy

```
def binary_cross_entropy(
self,
data: Iterable
)
```

```
Compute the model's binary cross entropy on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```

*method* SKLearnRegressor.mae

```
def mae(
self,
data
)
```

```
Compute the model's mean absolute error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
```
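
As a rough illustration of the metric, the mean absolute error averages the absolute differences between expected and predicted values. The helper name `mean_absolute_error` is hypothetical and the function works on plain lists, not on a feyn model.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |expected - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Absolute errors are 1, 0, 2 and 0, so the mean is 3 / 4.
print(mean_absolute_error([3, 5, 2, 4], [2, 5, 4, 4]))  # 0.75
```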

*method* SKLearnRegressor.mse

```
def mse(
self,
data
)
```

```
Compute the model's mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
```
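
The mean squared error squares each difference before averaging, so large errors weigh more heavily than in the MAE. A minimal sketch on plain lists, with the hypothetical helper name `mean_squared_error`:

```python
def mean_squared_error(y_true, y_pred):
    """Average of (expected - predicted)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Squared errors are 1, 0, 4 and 0, so the mean is 5 / 4.
print(mean_squared_error([3, 5, 2, 4], [2, 5, 4, 4]))  # 1.25
```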

*method* SKLearnRegressor.plot_confusion_matrix

```
def plot_confusion_matrix(
self,
data: Iterable,
threshold: float = 0.5,
labels: Iterable = None,
title: str = 'Confusion matrix',
color_map='feyn-primary',
ax=None
) -> None
```

```
Compute and plot a Confusion Matrix.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
```
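
The counts behind a binary confusion matrix can be sketched as follows. The layout (rows for actual class, columns for predicted class) and the `>= threshold` rule are assumptions for this example, and `confusion_counts` is a hypothetical helper; the plotted matrix in feyn may be arranged differently.

```python
def confusion_counts(y_true, y_pred, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for boolean labels and scored predictions."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        actual = int(bool(t))
        predicted = int(p >= threshold)  # assumed boundary rule
        matrix[actual][predicted] += 1
    return matrix


print(confusion_counts([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6]))
# [[1, 1], [1, 2]]
```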

*method* SKLearnRegressor.plot_partial

```
def plot_partial(
self,
data: Iterable,
by: str,
fixed: Union[dict, NoneType] = None
) -> None
```

```
Plot a partial dependence plot.
This plot is useful to interpret the effect of a specific feature on the model output.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots._partial_dependence.plot_partial(best, data, by="age")
You can use any column in the dataset as the `by` parameter.
If you use a numerical column, the feature will vary from min to max of that variable in the training set.
If you use a categorical column, the feature will display all categories, sorted by the average prediction of that category.
Arguments:
model -- The model to plot.
data -- The dataset to measure the loss on.
by -- The column in the dataset to interpret by.
fixed -- A dictionary of features and associated values to hold fixed
```
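
The idea of a partial dependence curve for a numerical column can be sketched without any plotting: sweep the `by` feature from its minimum to its maximum while the other features stay at their fixed values, and record the prediction at each step. Everything below is hypothetical (the `partial_dependence` helper, the `steps` parameter, and the toy model); it is not how feyn computes the plot.

```python
def partial_dependence(predict, data, by, fixed, steps=5):
    """Evaluate `predict` while `by` sweeps from its min to its max in `data`."""
    lo, hi = min(data[by]), max(data[by])
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    # Each row keeps the fixed features and overrides only the swept one.
    rows = [{**fixed, by: value} for value in grid]
    return grid, [predict(row) for row in rows]


def toy_model(row):
    """Hypothetical stand-in for a fitted model's prediction function."""
    return 60 + 0.5 * row["age"] + 10 * row["smoker"]


data = {"age": [20, 40, 60], "smoker": [0, 1, 0]}
grid, preds = partial_dependence(toy_model, data, by="age", fixed={"smoker": 1}, steps=3)
print(grid)   # [20.0, 40.0, 60.0]
print(preds)  # [80.0, 90.0, 100.0]
```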

*method* SKLearnRegressor.plot_probability_scores

```
def plot_probability_scores(
self,
data: Iterable,
title='',
nbins=10,
h_args=None,
ax=None
)
```

```
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- plot title (default: {''})
nbins {int} -- number of bins (default: {10})
h_args {dict} -- histogram kwargs (default: {None})
ax {matplotlib.axes._subplots.AxesSubplot} -- axes object (default: {None})
```

*method* SKLearnRegressor.plot_regression

```
def plot_regression(
self,
data: Iterable,
title: str = 'Actuals vs Prediction',
ax=None
)
```

```
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x.
Arguments:
data {typing.Iterable} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Prediction"})
ax {AxesSubplot} -- (default: {None})
```
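
The line of best fit mentioned above can be illustrated with ordinary least squares on the (y_true, y_pred) pairs. This is a sketch under that assumption; `best_fit_line` is a hypothetical helper and the fitting method feyn uses is not stated in this reference.

```python
def best_fit_line(y_true, y_pred):
    """Least-squares slope and intercept of y_pred as a function of y_true."""
    n = len(y_true)
    mean_x = sum(y_true) / n
    mean_y = sum(y_pred) / n
    sxx = sum((x - mean_x) ** 2 for x in y_true)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_true, y_pred))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Predictions that are always 0.5 too high: slope 1, intercept 0.5.
print(best_fit_line([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5]))  # (1.0, 0.5)
```

A slope near 1 and an intercept near 0 correspond to a fitted line close to y=x.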

*method* SKLearnRegressor.plot_residuals

```
def plot_residuals(
self,
data: Iterable,
title: str = 'Residuals plot',
ax=None
)
```

```
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {typing.Iterable} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
title {str} -- (default: {"Residuals plot"})
ax {[type]} -- (default: {None})
```
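
The points shown in a residuals plot can be sketched as pairs of (predicted value, residual), where the residual is y_true - y_pred as stated above. `residual_points` is a hypothetical helper operating on plain lists.

```python
def residual_points(y_true, y_pred):
    """Pair each prediction with its residual (expected minus predicted)."""
    return [(p, t - p) for t, p in zip(y_true, y_pred)]


print(residual_points([3, 5, 2], [2.5, 5, 4]))
# [(2.5, 0.5), (5, 0), (4, -2)]
```

Residuals scattered evenly around zero suggest no systematic over- or under-prediction.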

*method* SKLearnRegressor.plot_roc_curve

```
def plot_roc_curve(
self,
data: Iterable,
threshold: float = None,
title: str = 'ROC curve',
ax=None,
**kwargs
) -> None
```

```
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
**kwargs -- additional options to pass on to matplotlib
```
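
The point that `threshold` marks on the curve is a pair of false positive rate and true positive rate. The sketch below shows how such a point could be computed; `roc_point` is a hypothetical helper and the `>= threshold` rule is an assumption.

```python
def roc_point(y_true, y_pred, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        predicted_positive = p >= threshold  # assumed boundary rule
        if t and predicted_positive:
            tp += 1
        elif t:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


fpr, tpr = roc_point([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6], threshold=0.5)
print(fpr, tpr)  # one of two negatives flagged, two of three positives found
```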

*method* SKLearnRegressor.plot_segmented_loss

```
def plot_segmented_loss(
self,
data: Iterable,
by: Union[str, NoneType] = None,
loss_function='squared_error',
title='Segmented Loss',
ax=None
) -> None
```

```
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
> models = qlattice.sample_models(["age","smoker","heartrate"], output="heartrate")
> models = feyn.fit_models(models, data)
> best = models[0]
> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model has better performance for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data -- The dataset to measure the loss on.
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
ax -- matplotlib axes object to draw to
```
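
The grouping behind a segmented loss can be sketched by collecting the squared errors per category and averaging them. This is only an illustration: `segmented_loss` is a hypothetical helper, it handles categorical segments only, and it reports the mean loss per segment instead of drawing a plot.

```python
from collections import defaultdict


def segmented_loss(y_true, y_pred, segments):
    """Mean squared error per segment label."""
    grouped = defaultdict(list)
    for t, p, s in zip(y_true, y_pred, segments):
        grouped[s].append((t - p) ** 2)
    return {s: sum(losses) / len(losses) for s, losses in grouped.items()}


heartrate = [70, 80, 65, 90]
predicted = [72, 77, 65, 85]
smoker = ["no", "yes", "no", "yes"]
print(segmented_loss(heartrate, predicted, smoker))
# {'no': 2.0, 'yes': 17.0}
```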

*method* SKLearnRegressor.predict

```
def predict(
self,
X: Iterable
)
```

```
```

*method* SKLearnRegressor.r2_score

```
def r2_score(
self,
data: Iterable
)
```

```
Compute the model's r2 score on a data set.
The r2 score for a regression model is defined as
1 - rss/tss
where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the residual sum of squares of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
```
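
The formula 1 - rss/tss and its three reference cases (perfect model, mean model, worse than the mean model) can be checked with a short sketch. The helper name `r2` is hypothetical and the function works on plain lists.

```python
def r2(y_true, y_pred):
    """Compute 1 - rss/tss for expected and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1, 2, 3, 4]
print(r2(y_true, [1, 2, 3, 4]))          # 1.0, perfect predictions
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0, always predicts the mean
print(r2(y_true, [4, 3, 2, 1]))          # -3.0, worse than the mean model
```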

*method* SKLearnRegressor.rmse

```
def rmse(
self,
data
)
```

```
Compute the model's root mean squared error on a data set.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
```
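
The root mean squared error is the square root of the mean squared error, which brings the value back to the units of the output. A minimal sketch on plain lists, with the hypothetical helper name `root_mean_squared_error`:

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Every prediction is off by 2, so the RMSE is 2 as well.
print(root_mean_squared_error([1, 2, 3, 4], [3, 4, 1, 2]))  # 2.0
```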

*method* SKLearnRegressor.roc_auc_score

```
def roc_auc_score(
self,
data: Iterable
)
```

```
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the discriminative ability of a binary classifier as its decision threshold varies.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
```
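
The probabilistic reading of the AUC given above can be turned into a direct, if slow, computation: compare every positive sample with every negative sample and count how often the positive one scores higher. The sketch counts ties as half a win, which is the usual convention; `pairwise_auc` is a hypothetical helper and not how feyn computes the score.

```python
def pairwise_auc(y_true, y_pred):
    """Probability that a random positive outscores a random negative."""
    positives = [p for t, p in zip(y_true, y_pred) if t]
    negatives = [p for t, p in zip(y_true, y_pred) if not t]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Five of the six positive/negative pairs are ranked correctly.
auc = pairwise_auc([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6])
print(auc)
```

The nested loop is quadratic in the number of samples, so this form is only suitable for small illustrations.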

*method* SKLearnRegressor.squared_error

```
def squared_error(
self,
data: Iterable
)
```

```
Compute the model's squared error loss on the provided data.
This function is a shorthand that is equivalent to the following code:
> y_true = data[
```