Feyn is the main Python module for building and executing models that utilize a QLattice.
The QLattice stores and updates probabilistic information about the mathematical relationships (models) between observable quantities.
The workflow is typically:
# Instantiate a QLattice
>>> ql = feyn.QLattice()
# Sample models from the QLattice
>>> models = ql.sample_models(data.columns, "out")
# Fit the list of models to a dataset
>>> models = feyn.fit_models(models, data)
# Pick the best Model from the fitted models
>>> best = models[0]
# Update the QLattice with this model to explore similar models.
>>> ql.update(best)
# Or use the model to make predictions
>>> predicted_y = best.predict(new_data)
Fit a list of models on some data and return a list of fitted models. The return list will be sorted in ascending order by either the loss function or one of the criteria.
The n_samples parameter controls how many samples are used to train each model. The default behavior is to fit each model once with each sample in the dataset, unless the set is smaller than 10000, in which case the dataset will be upsampled to 10000 samples before fitting.
The samples are shuffled randomly before fitting to avoid issues with the Stochastic Gradient Descent algorithm.
Arguments:
models {List[feyn.Model]} -- A list of feyn models to be fitted.
data {[type]} -- Data used in fitting each model.
Keyword Arguments:
loss_function {Optional[str]} -- The loss function to optimize models for. Can be "squared_error", "absolute_error" or "binary_cross_entropy"
criterion {Optional[str]} -- Sort by information criterion rather than loss. Either "aic", "bic" or None (loss). (default: {None})
n_samples {Optional[int]} -- The number of samples to fit each model with. (default: {None})
sample_weights {Optional[Iterable[float]]} -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample. (default: {None})
threads {int} -- Number of concurrent threads to use for fitting. (default: {4})
immutable {bool} -- If True, create a copy of each model and fit those, leaving the originals untouched. This increases runtime. (default: {False})
Raises:
TypeError: if inputs don't match the correct type.
ValueError: if there are no samples.
ValueError: if data and sample_weights are not the same size.
ValueError: if the loss function is unknown.
ValueError: if inputs contain a mix of classifiers and regressors
Returns:
List[feyn.Model] -- A list of fitted feyn models.
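The default sampling behaviour described for n_samples above can be pictured with a small pure-Python sketch. It is an illustration under stated assumptions, not feyn's implementation: the helper name `training_order`, its `floor` and `seed` parameters, and the choice to upsample by cycling through the rows are all hypothetical.

```python
import random


def training_order(n_rows, n_samples=None, floor=10_000, seed=0):
    """Return a shuffled list of row indices to fit on (hypothetical helper)."""
    if n_rows <= 0:
        raise ValueError("there are no samples")
    if n_samples is None:
        # Default: one pass over the data, upsampled to at least `floor` samples.
        n_samples = max(n_rows, floor)
    # Assumption: upsampling repeats rows cyclically until n_samples is reached.
    order = [i % n_rows for i in range(n_samples)]
    # Shuffle so that stochastic gradient descent does not see rows in file order.
    random.Random(seed).shuffle(order)
    return order


small = training_order(n_rows=3)        # tiny data set, upsampled
large = training_order(n_rows=20_000)   # large data set, one pass
print(len(small), len(large))
```

With three rows the sketch yields 10000 indices that only take the values 0, 1 and 2; with 20000 rows each row appears exactly once.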
Select at most n best performing models from a collection, such that they are sufficiently diverse in their lineage.
Arguments:
models {List[feyn.Model]} -- The list of models to find the best ones in.
Keyword Arguments:
n {int} -- The maximum number of best models to identify. (default: {10})
Raises:
TypeError: if inputs don't match the correct type.
Returns:
List[feyn.Model] -- The best sufficiently diverse models under distance_func.
Prune a list of models to remove redundant and poorly performing ones.
Arguments:
models {List[feyn.Model]} -- The list of models to prune.
Keyword Arguments:
keep_n {Optional[int]} -- At most this many models will be returned. If None, models are left to be pruned by other redundancies. (default: {None})
Raises:
TypeError: if inputs don't match the correct type.
Returns:
List[feyn.Model] -- The list of pruned models.
Updates the display in a python notebook with the graph representation of a model
Arguments:
model {Model} -- The model to display.
Keyword Arguments:
label {Optional[str]} -- A label to add to the rendering of the model (default is None).
update_display {bool} -- Clear output and rerender figure (defaults to False).
filename {Optional[str]} -- The filename to use for saving the plot as html (defaults to None).
show_sources {bool} -- Whether to show the ordering of the sources in the model - for debug purposes (defaults to False).
Validates a pandas dataframe for known data issues.
Arguments:
data {pd.DataFrame} -- The data to validate
kind {str} -- The kind of output - classification or regression
output_name {str} -- The name of the output
Keyword Arguments:
stypes {Optional[Dict[str, str]]} -- The stypes you want to assign to your inputs (default: {None})
Raises:
ValueError: When output values do not match output type
ValueError: When categorical stypes are not defined for categorical inputs
ValueError: When nan values exist for numerical inputs
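The three checks listed under Raises can be sketched on a plain dict of columns. This is a hedged illustration only: the helper `validate` is hypothetical, and treating "c" and "cat" as the categorical stype markers is an assumption rather than something stated above.

```python
import math


def validate(data, kind, output_name, stypes=None):
    """Raise ValueError on the issues listed above (hypothetical helper)."""
    stypes = stypes or {}
    if kind == "classification":
        # A classifier's output values must be boolean-like.
        if any(v not in (0, 1, True, False) for v in data[output_name]):
            raise ValueError("output values do not match output type")
    for name, values in data.items():
        if name == output_name:
            continue
        if any(isinstance(v, str) for v in values):
            # Assumption: "c" and "cat" mark an input as categorical.
            if stypes.get(name) not in ("c", "cat"):
                raise ValueError(f"categorical stype not defined for {name!r}")
        elif any(isinstance(v, float) and math.isnan(v) for v in values):
            raise ValueError(f"nan values in numerical input {name!r}")
    return True


data = {"age": [30.0, 41.0], "sex": ["m", "f"], "out": [0, 1]}
print(validate(data, "classification", "out", stypes={"sex": "c"}))
```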
A convenience function for running the QLattice simulator for many epochs. This process can be interrupted with a KeyboardInterrupt, and you will get back the best models that have been found thus far. Roughly equivalent to the following:
>>> priors = feyn.tools.estimate_priors(data, output_name)
>>> ql.update_priors(priors)
>>> models = []
>>> for i in range(n_epochs):
...     models += ql.sample_models(data, output_name, kind, stypes, max_complexity, query_string, function_names)
...     models = feyn.fit_models(models, data, loss_function, criterion, None, sample_weights)
...     models = feyn.prune_models(models)
...     ql.update(models)
>>> best = feyn.get_diverse_models(models, n=10)
Arguments:
data {Iterable} -- The data to train models on. Input names are inferred from the columns (pd.DataFrame) or keys (dict) of this variable.
output_name {str} -- The name of the output.
Keyword Arguments:
kind {str} -- Specify the kind of models that are sampled. One of ["classification", "regression"]. (default: {"regression"})
stypes {Optional[Dict[str, str]]} -- An optional map from input names to semantic types. (default: {None})
n_epochs {int} -- Number of training epochs. (default: {10})
threads {Union[int, str]} -- Number of concurrent threads to use for fitting. If a number, that many threads are used. If "auto", set to your CPU count - 1. (default: {"auto"})
max_complexity {int} -- The maximum complexity for sampled models. (default: {10})
query_string {Optional[str]} -- An optional query string for specifying specific model structures. (default: {None})
loss_function {Optional[Union[str, Callable]]} -- The loss function to optimize models for. If None (default), 'MSE' is chosen for regression problems and 'binary_cross_entropy' for classification problems. (default: {None})
criterion {Optional[str]} -- Sort by information criterion rather than loss. Either "aic", "bic" or None (loss). (default: {"bic"})
sample_weights {Optional[Iterable[float]]} -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample. (default: {None})
function_names {Optional[List[str]]} -- A list of function names to use in the QLattice simulation. Defaults to all available functions being used. (default: {None})
starting_models {Optional[List[feyn.Model]]} -- A list of preexisting feyn models you would like to start finding better models from. The inputs and output of these models should match the other arguments to this function. (default: {None})
Raises:
TypeError: if inputs don't match the correct type.
Returns:
List[feyn.Model] -- The best models found during this run.
method QLattice.reset
def reset(
self,
random_seed=-1)
Deprecated. Create a new QLattice with the constructor instead.
Clear all learnings in this QLattice.
Keyword Arguments:
random_seed {int} -- If not -1, seed the qlattice and feyn random number generator to get reproducible results. (default: {-1})
Sample models from the QLattice simulator. The QLattice has a probability density for generating different models, and this function samples from that density.
Arguments:
input_names {List[str]} -- The names of the inputs.
output_name {str} -- The name of the output.
Keyword Arguments:
kind {str} -- Specify the kind of models that are sampled. One of ["classification", "regression"]. (default: {"regression"})
stypes {Optional[Dict[str, str]]} -- An optional map from input names to semantic types. (default: {None})
max_complexity {int} -- The maximum complexity for sampled models. Currently the maximum number of edges that the graph representation of the models has. (default: {10})
query_string {Optional[str]} -- An optional query string for specifying specific model structures. (default: {None})
function_names {Optional[List[str]]} -- A list of function names to use in the QLattice simulation. Defaults to all available functions being used. (default: {None})
Raises:
TypeError: if inputs don't match the correct type.
ValueError: if input_names contains duplicates.
ValueError: if max_complexity is negative.
ValueError: if kind is not a regressor or classifier.
ValueError: if function_names is not recognised.
ValueError: if query_string is invalid.
Returns:
List[Model] -- The list of sampled models.
Update QLattice with learnings from a list of models. When updated, the QLattice learns to produce models that are similar to what is included in the update. Without updating, the QLattice will keep generating models with a random structure.
Arguments:
models {Union[Model, Iterable[Model]]} -- The models to use in a QLattice update.
Raises:
TypeError: if inputs don't match the correct type.
Update input priors for the QLattice
Keyword Arguments:
priors {dict} -- A dictionary of prior probabilities of each input to impact the output.
reset {bool} -- Whether to reset the current priors, or merge with the existing priors.
class Model
def __init__(
program,
fnames,
params=None)-> Model
A Model represents a single mathematical equation which can be used for predicting.
The constructor is for internal use.
property Model.depth
depth
Get the depth of the graph representation of the model. In general, it is better to evaluate the complexity of models using the edge_count (or max_complexity) properties.
property Model.edge_count
edge_count
Get the total number of edges in the graph representation of this model.
property Model.features
features
Get the name of the input features of the model. Does the same as 'inputs'
property Model.inputs
inputs
Get the name of the input features of the model.
property Model.kind
kind
None
property Model.output
output
Get the name of the output node.
property Model.target
target
Get the name of the output node. Does the same as 'output'
Load a `Model` from a file.
Usually used together with `Model.save`.
Arguments:
file -- A file-like object or a path to load the `Model` from.
Returns:
Model -- The loaded `Model`-object.
Compute the model's accuracy score on a data set.
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Formally it is defined as
(number of correct predictions) / (total number of predictions)
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
accuracy score for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
Compute the accuracy score of predictions with optimal threshold
The accuracy score is useful to evaluate classification models. It is the fraction of the predictions that are correct. Accuracy is normally calculated under the assumption that the threshold that separates true from false is 0.5. However, this assumption may not hold when the model was trained on data with a different class composition than the data it is evaluated on.
This function first computes the threshold separating the true and false classes that optimises the accuracy. It then returns this threshold along with the accuracy obtained using it.
Arguments:
data {DataFrame} -- Data set on which to evaluate the accuracy and the accuracy threshold.
Returns a tuple with:
threshold that maximizes accuracy
accuracy score obtained with this threshold
Raises:
TypeError -- if inputs don't match the correct type.
TypeError -- if model is not a classification model.
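A minimal sketch of the idea, assuming that a score greater than or equal to the threshold counts as a positive prediction and that only the predicted scores themselves are tried as thresholds. The function names are illustrative and this is not feyn's implementation.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that are correct at the given threshold."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def accuracy_threshold(y_true, y_prob):
    """Return (threshold, accuracy) for the threshold that maximises accuracy."""
    # Accuracy only changes when the threshold crosses a predicted score,
    # so the distinct scores are sufficient candidates.
    candidates = sorted(set(y_prob))
    best = max(candidates, key=lambda th: accuracy(y_true, y_prob, th))
    return best, accuracy(y_true, y_prob, best)


y_true = [0, 0, 1, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.40, 0.90]
print(accuracy(y_true, y_prob))            # 0.6 at the usual 0.5 threshold
print(accuracy_threshold(y_true, y_prob))  # (0.3, 1.0)
```

In this toy data the scores are systematically low, so the default threshold of 0.5 misclassifies two positives while a threshold of 0.3 separates the classes perfectly.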
Fit this specific `Model` with the given data set.
Arguments:
data -- Training data including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
loss_function -- Name of the loss function or the function itself. This is the loss function to use for fitting. Can either be a string or one of the functions provided in `feyn.losses`.
sample_weights -- An optional numpy array of weights for each sample. If present, the array must have the same size as the data set, i.e. one weight for each sample
method Model.get_parameters
def get_parameters(
self,
name: str)
Given a model and the name of one of its input or output nodes,
get a pandas.DataFrame with the associated parameters. If the node
is categorical, the function returns the weight associated with each categorical
value. If the node is numerical, the function returns the scale, weight and
bias.
Arguments:
name {str} -- Name of the input or output of interest.
Returns:
pd.DataFrame -- DataFrame with the parameters.
method Model.mae
def mae(
self,
data: pandas.core.frame.DataFrame
)
Compute the model's mean absolute error on a data set.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MAE for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
method Model.mse
def mse(
self,
data: pandas.core.frame.DataFrame
)
Compute the model's mean squared error on a data set.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
MSE for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
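For reference, the two error metrics above follow the standard definitions. The sketch below computes them on plain lists of true and predicted values; it only illustrates the formulas and is not how feyn evaluates a Model.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |truth - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: the average of (truth - prediction) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
print(mae(y_true, y_pred))  # (0 + 1 + 2) / 3
print(mse(y_true, y_pred))  # (0 + 1 + 4) / 3
```

Because the errors are squared, MSE penalises the single large miss (an error of 2) more heavily than MAE does.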
Plot the model's summary metrics and some useful plots for its kind.
This is a shorthand for calling feyn.plots.plot_model_summary.
Arguments:
data {DataFrame} -- Data set including both input and expected values.
Keyword Arguments:
compare_data {Optional[Union[DataFrame, List[DataFrame]]]} -- Additional data set(s) including both input and expected values. (default: {None})
labels {Optional[Iterable[str]]} -- A list of labels to use instead of the default labels. Must be size 2 if using comparison dataset, else 1.
filename {Optional[str]} -- The filename to use for saving the plot as html.
Raises:
TypeError: if inputs don't match the correct type.
ValueError: If columns needed for the model are not present in the data.
Returns:
HTML -- HTML report of the model summary.
Compute and plot a Confusion Matrix.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Boundary of True and False predictions, default 0.5
labels -- List of labels to index the matrix
title -- Title of the plot.
color_map -- Color map from matplotlib to use for the matrix
ax -- matplotlib axes object to draw to, default None
figsize -- Size of created figure, default None
filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None
Raises:
TypeError -- if inputs don't match the correct type.
TypeError -- if model is not a classification model.
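The counts behind a confusion matrix can be sketched without any plotting. The helper below is illustrative; it assumes that a score greater than or equal to the threshold is a positive prediction and lays the matrix out with rows for the truth and columns for the prediction.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return [[TN, FP], [FN, TP]]: rows are the truth, columns the prediction."""
    matrix = [[0, 0], [0, 0]]
    for truth, prob in zip(y_true, y_prob):
        predicted = int(prob >= threshold)
        matrix[int(bool(truth))][predicted] += 1
    return matrix


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.4, 0.8, 0.9]
print(confusion_matrix(y_true, y_prob))                 # [[1, 1], [1, 2]]
print(confusion_matrix(y_true, y_prob, threshold=0.3))  # [[1, 1], [0, 3]]
```

Lowering the threshold from 0.5 to 0.3 turns the one false negative into a true positive, which is the trade-off the threshold argument controls.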
Plots the flow of activations through the model, for the provided sample. Uses the provided data as background information for visualization.
Arguments:
data {DataFrame} -- Data set including both input and expected values.
sample {Union[DataFrame, Series]} -- A single data sample to plot the activations for.
filename {Optional[str]} -- The filename to use for saving the plot as svg.
Raises:
TypeError: if inputs don't match the correct type.
ValueError: If columns needed for the model are not present in the data.
Returns:
SVG -- SVG object containing the SVG of the model activation flow.
Plot the model's precision-recall curve.
This is a shorthand for calling feyn.plots.plot_pr_curve.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the PR curve of the precision and recall at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
figsize -- Size of the figure when ax is None, default None
filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
**kwargs -- additional keyword arguments to pass to Axes.plot function
Raises:
TypeError -- if inputs don't match the correct type.
TypeError -- if model is not a classification model.
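The point that the threshold argument marks on the curve is a single (recall, precision) pair. A minimal sketch of that computation follows, with assumptions stated in the comments: a score greater than or equal to the threshold is a positive prediction, and precision is reported as 1.0 when nothing is predicted positive. The function name is illustrative.

```python
def precision_recall(y_true, y_prob, threshold):
    """Return (precision, recall) at one threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    # Assumption: with no positive predictions, precision is taken to be 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.4, 0.8, 0.9]
print(precision_recall(y_true, y_prob, 0.5))
print(precision_recall(y_true, y_prob, 0.75))
```

Raising the threshold from 0.5 to 0.75 removes the false positive, so precision rises to 1.0 while recall stays at two thirds.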
Plots the histogram of probability scores in binary
classification problems, highlighting the negative and
positive classes. Order of truth and prediction matters.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Keyword Arguments:
nbins {int} -- number of bins (default: {10})
title {str} -- plot title (default: {''})
legend {List[str]} -- legend to use on the plot for the positive and negative class (default: ["Positive Class", "Negative Class"])
legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
ax {Axes} -- axes object (default: {None})
figsize {tuple} -- size of figure (default: {None})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
kwargs {dict} -- histogram kwargs (default: {None})
Raises:
TypeError -- if model is not a classification model.
TypeError -- if inputs don't match the correct type.
ValueError: if y_true is not bool-like (boolean or 0/1).
ValueError: if y_pred is not bool-like (boolean or 0/1).
ValueError: if y_pred and y_true are not same size.
ValueError: If fewer than two labels are supplied for the legend.
This plots the true values on the x-axis and the predicted values on the y-axis.
On top of the plot is the line of equality y=x.
The closer the scattered values are to the line the better the predictions.
The line of best fit between y_true and y_pred is also calculated and plotted. This line should be close to the line y=x
Arguments:
data {DataFrame} -- The dataset to determine regression quality. It contains input names and output name of the model as columns
Keyword Arguments:
title {str} -- (default: {"Actuals vs Predictions"})
ax {AxesSubplot} -- (default: {None})
figsize {tuple} -- Size of figure (default: {None})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
Raises:
TypeError -- if inputs don't match the correct type.
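The line of best fit mentioned above can be obtained with ordinary least squares. The sketch below assumes that the predictions are regressed on the true values; the helper name is illustrative and the plot itself is not reproduced.

```python
def best_fit_line(y_true, y_pred):
    """Least-squares slope and intercept of y_pred as a function of y_true."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var = sum((t - mean_t) ** 2 for t in y_true)
    slope = cov / var
    return slope, mean_p - slope * mean_t


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
slope, intercept = best_fit_line(y_true, y_pred)
print(slope, intercept)
```

Here the slope is 1 but the intercept is 0.5, so the fitted line runs parallel to y=x: the model captures the trend and overestimates every value by a constant offset.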
This plots the predicted values against the residuals (y_true - y_pred).
Arguments:
data {DataFrame} -- The dataset containing the samples to determine the residuals of.
Keyword Arguments:
title {str} -- (default: {"Residual plot"})
ax {Axes} -- (default: {None})
figsize {tuple} -- Size of figure (default: {None})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
Raises:
TypeError -- if inputs don't match the correct type.
Plot the response of a model to a single input given by `by`.
The remaining model inputs are fixed by default as the middle
quantile (median). Additional quantiles are added if the model has
a maximum of 3 inputs. You can change this behavior by specifying
`input_constraints` yourself. Any number of model inputs can be added to it.
Arguments:
data {DataFrame} -- The dataset to plot on.
by {str} -- Model input to plot model response by.
Keyword Arguments:
input_constraints {Optional[dict]} -- Input values to be fixed (default: {None}).
ax {Optional[matplotlib.axes]} -- matplotlib axes object to draw to (default: {None}).
figsize {tuple} -- size of created figure (default: {(8,8)})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
Raises:
TypeError: if function parameters don't match the correct type.
ValueError: if by is not in the columns of data or inputs to the model.
ValueError: if by is also in input_constraints.
ValueError: if input_constraints contains an input that is not in data.
ValueError: if model.output is not in data.
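To make the fixing of inputs concrete, the sketch below builds the rows a model would be evaluated on: `by` sweeps its observed range while every other input is held at its median unless a constraint overrides it. The helper `response_grid` and its `steps` parameter are hypothetical, only numerical inputs are handled, and the extra quantiles mentioned above are left out.

```python
import statistics


def response_grid(data, by, input_constraints=None, steps=5):
    """Rows that sweep `by` and hold the other inputs fixed (hypothetical helper)."""
    input_constraints = input_constraints or {}
    if by not in data:
        raise ValueError(f"{by!r} is not a column of data")
    if by in input_constraints:
        raise ValueError(f"{by!r} cannot also be in input_constraints")
    if set(input_constraints) - set(data):
        raise ValueError("input_constraints contains an input that is not in data")
    # Default: fix every other input at its median (the middle quantile).
    fixed = {name: statistics.median(col) for name, col in data.items() if name != by}
    # User-supplied constraints override the medians.
    fixed.update(input_constraints)
    lo, hi = min(data[by]), max(data[by])
    sweep = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [{**fixed, by: value} for value in sweep]


data = {"age": [20, 40, 60], "bmi": [18.0, 25.0, 31.0]}
rows = response_grid(data, "age", steps=3)
print(rows)
```

Passing `input_constraints={"bmi": 30.0}` to the same call would hold bmi at 30.0 instead of its median of 25.0.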
Visualize the response of a model to numerical inputs. Works for both classification and regression problems. The plot comes in two parts:
1. A colored background indicating the response of the model in a 2D space given the fixed values. A lighter color corresponds to a bigger output from the model.
2. Scatter-plotted data on top of the background. In a classification scenario, green corresponds to positive class, and pink corresponds to the negative class. For regression, the color gradient shows the true distribution of the output value. Two sizes are used in the scatterplot, the larger dots correspond to the data that matches the values in fixed and the smaller ones have data different from the values in fixed.
Arguments:
model {feyn.Model} -- The feyn Model we want a partial plot of.
data {DataFrame} -- The data that will be scattered in the model.
Keyword Arguments:
fixed {Optional[Dict[str, Any]]} -- Dictionary with values we fix in the model. The key is an input name in the model and the value is a number that the input is fixed to. (default: {None})
ax {Optional[plt.Axes.axes]} -- Optional matplotlib axes in which to make the partial plot. (default: {None})
resolution {int} -- The resolution at which we sample the 2D input space for the background. (default: {1000})
figsize {Optional[tuple]} -- Size of created figure if no matplotlib axes is passed in ax. (default: {None})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
Raises:
TypeError: if function parameters don't match the correct type.
ValueError: if the model input names minus the fixed value names are more than two, meaning that you need to fix more values to reduce the dimensionality and make a 2D plot possible.
ValueError: if fixed contains an input not in the model inputs.
ValueError: If columns needed for the model are not present in the data.
Automatically visualize the response of a model to numerical inputs.
This function attempts to automatically determine the most interesting inputs to display and fixes the rest to the median if numeric or mode if categorical.
It also automatically decides whether to plot a 1d or 2d response plot.
Uses the functions `plot_model_response_1d` or `plot_model_response_2d` internally depending on number of inputs in the model.
For the 2D plot, the following applies:
1. A colored background indicating the response of the model in a 2D space given the fixed values. A lighter color corresponds to a bigger output from the model.
2. Scatter-plotted data on top of the background. In a classification scenario, green corresponds to positive class, and pink corresponds to the negative class. For regression, the color gradient shows the true distribution of the output value. Two sizes are used in the scatterplot, the larger dots correspond to the data that matches the values in fixed and the smaller ones have data different from the values in fixed.
Arguments:
model {feyn.Model} -- The feyn Model we want a partial plot of.
data {DataFrame} -- The data that will be scattered in the model.
Keyword Arguments:
fixed {Optional[Dict[str, Any]]} -- Dictionary with values we fix in the model. The key is an input name in the model and the value is a number that the input is fixed to. (default: {None})
ax {Optional[plt.Axes.axes]} -- Optional matplotlib axes in which to make the partial plot. (default: {None})
resolution {int} -- The resolution at which we sample the 2D input space for the background. (default: {1000})
figsize {Optional[tuple]} -- Size of created figure if no matplotlib axes is passed in ax. (default: {None})
filename {str} -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used (default: {None})
Raises:
TypeError: if inputs don't match the correct type.
Plot the model's ROC curve.
This is a shorthand for calling feyn.plots.plot_roc_curve.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
threshold -- Plots a point on the ROC curve of the true positive rate and false positive rate at the given threshold. Default is None
title -- Title of the plot.
ax -- matplotlib axes object to draw to, default None
figsize -- Size of the figure when ax is None, default None
filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default is None
**kwargs -- additional keyword arguments to pass to Axes.plot function
Raises:
TypeError -- if inputs don't match the correct type.
TypeError -- if model is not a classification model.
method Model.plot_segmented_loss
def plot_segmented_loss(
self,
data: pandas.core.frame.DataFrame,
by: Optional[str]=None,
loss_function:str='squared_error',
title:str='Segmented Loss',
legend: List[str]=['Samples in bin','Mean loss for bin'],
legend_loc: Optional[str]='lower right',
ax: Optional[matplotlib.axes._axes.Axes]=None,
figsize: Optional[tuple]=None,
filename: Optional[str]=None)->None
Plot the loss by segment of a dataset.
This plot is useful to evaluate how a model performs on different subsets of the data.
Example:
>>> models = qlattice.sample_models(["age","smoker","heartrate"], "heartrate")
>>> models = feyn.fit_models(models, data)
>>> best = models[0]
>>> feyn.plots.plot_segmented_loss(best, data, by="smoker")
This will plot a histogram of the model loss for smokers and non-smokers separately, which can help evaluate whether the model has better performance for either of the smoker sub-populations.
You can use any column in the dataset as the `by` parameter. If you use a numerical column, the data will be binned automatically.
Arguments:
data {DataFrame} -- The dataset to measure the loss on.
Keyword Arguments:
by -- The column in the dataset to segment by.
loss_function -- The loss function to compute for each segment.
title -- Title of the plot.
legend {List[str]} -- legend to use on the plot for bins and loss line (default: ["Samples in bin", "Mean loss for bin"])
legend_loc {str} -- the location (mpl style) to use for the label. If None, legend is hidden
ax -- matplotlib axes object to draw to
figsize -- Size of created figure, default None
filename -- Path to save plot. If axes is passed then only plot is saved. If no extension is given then .png is used, default None
Raises:
TypeError -- if inputs don't match the correct type.
ValueError: if by is not in data.
ValueError: If columns needed for the model are not present in the data.
ValueError: If fewer than two labels are supplied for the legend.
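The quantities behind the plot are the number of samples and the mean loss in each segment. A small sketch on plain lists, using squared error and a categorical `by` column, follows. The function name is illustrative, and the automatic binning of numerical columns is not shown.

```python
from collections import defaultdict


def segmented_loss(by_values, y_true, y_pred):
    """Map each segment to (samples in segment, mean squared error)."""
    errors = defaultdict(list)
    for segment, truth, pred in zip(by_values, y_true, y_pred):
        errors[segment].append((truth - pred) ** 2)
    return {seg: (len(errs), sum(errs) / len(errs)) for seg, errs in errors.items()}


smoker = ["yes", "no", "yes", "no"]
y_true = [80.0, 60.0, 90.0, 65.0]
y_pred = [70.0, 61.0, 80.0, 64.0]
losses = segmented_loss(smoker, y_true, y_pred)
print(losses)  # {'yes': (2, 100.0), 'no': (2, 1.0)}
```

In this toy data the model is far less accurate for smokers than for non-smokers, which an overall loss value would hide.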
Plot a model displaying the signal path for the provided feyn.Model and DataFrame.
Arguments:
dataframe {DataFrame} -- A Pandas DataFrame for showing metrics.
Keyword Arguments:
corr_func {Optional[str]} -- A name for the correlation function to use as the node signal, either 'mutual_information', 'pearson' or 'spearman' are available. (default: {None} defaults to 'pearson')
filename {Optional[str]} -- The filename to use for saving the plot as svg.
Raises:
TypeError: if function parameters don't match the correct type.
ValueError: if the name of the correlation function is not understood.
ValueError: if invalid dataframes are passed.
ValueError: If columns needed for the model are not present in the data.
Returns:
SVG -- SVG of the model signal.
Calculate predictions based on input values. Note that for classification tasks the output are probabilities.
>>> model.predict({ "age": [34, 78], "sex": ["male", "female"] })
[0.85, 0.21]
Arguments:
X {DataFrame} -- The input values as a pandas.DataFrame.
Returns:
np.ndarray -- The calculated predictions.
Compute the model's r2 score on a data set
The r2 score for a regression model is defined as
1 - rss/tss
Where rss is the residual sum of squares for the predictions, and tss is the total sum of squares.
Intuitively, the tss is the sum of squared residuals of a so-called "worst" model that always predicts the mean. Therefore, the r2 score expresses how much better the predictions are than such a model.
A result of 0 means that the model is no better than a model that always predicts the mean value.
A result of 1 means that the model perfectly predicts the true value.
It is possible to get r2 scores below 0 if the predictions are even worse than the mean model.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
r2 score for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
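The definition 1 - rss/tss can be checked by hand on a few values. The sketch below works on plain lists and only illustrates the formula; it is not feyn's implementation.

```python
def r2_score(y_true, y_pred):
    """Compute 1 - rss/tss for lists of true and predicted values."""
    mean = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - rss / tss


y_true = [1.0, 2.0, 3.0]
print(r2_score(y_true, [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r2_score(y_true, [2.0, 2.0, 2.0]))  # 0.0: always predicts the mean
print(r2_score(y_true, [3.0, 2.0, 1.0]))  # -3.0: worse than the mean model
```

The three calls reproduce the three cases described above: a perfect model, the mean model, and a model that does worse than the mean.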
Compute the model's root mean squared error on a data set.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
RMSE for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
Calculate the Area Under Curve (AUC) of the ROC curve.
A ROC curve depicts the ability of a binary classifier with varying threshold.
The area under the curve (AUC) is the probability that said classifier will
attach a higher score to a random positive instance in comparison to a random
negative instance.
Arguments:
data {DataFrame} -- Data set including both input and expected values. Can be either a dict mapping register names to value arrays, or a pandas.DataFrame.
Returns:
AUC score for the predictions
Raises:
TypeError -- if inputs don't match the correct type.
TypeError -- if model is not a classification model.
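The probabilistic reading of the AUC given above translates directly into code: compare every positive sample with every negative sample and count how often the positive one receives the higher score. The sketch below does exactly that, counting ties as half a win. It is quadratic in the number of samples, so it is meant as an illustration rather than an efficient implementation, and the function name is illustrative.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive outranks a negative."""
    positives = [p for t, p in zip(y_true, y_prob) if t]
    negatives = [p for t, p in zip(y_true, y_prob) if not t]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_prob))  # 0.75
```

Of the four positive-negative pairs, only the pair (0.35, 0.4) is ranked the wrong way round, giving 3/4.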
Save the `Model` to a file-like object.
The file can later be used to recreate the `Model` with `Model.load`.
Arguments:
file -- A file-like object or path to save the model to.
method Model.savefig
def savefig(
self,
filename:str)->str
Save model as an svg file.
Arguments:
filename {str} -- The filename of the file to save. Includes the filepath and file extension.
Updates the display in a python notebook with the graph representation of a model
Keyword Arguments:
label {Optional[str]} -- A label to add to the rendering of the model (default is None).
update_display {bool} -- Clear output and rerender figure (defaults to False).
filename {Optional[str]} -- The filename to use for saving the plot as html (defaults to None).
Convert the model to a sympy expression.
This function requires sympy to be installed.
Arguments:
signif -- the number of significant digits in the parameters of the model
symbolic_lr -- express logistic regression wrapper as part of the expression
Returns:
expression -- a sympy expression
method Model.to_query_string
def to_query_string(
self
)
Returns the query string representation for the given model.