by: Chris Cave
(Feyn version 2.0 or newer)
There are many different metrics for evaluating a classification model: accuracy, precision, recall, F1 score, etc. Each of these metrics requires fixing a decision boundary: above this boundary a sample is classified as True, and below it as False. Typically this boundary is called a threshold and is set at 0.5.
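As a minimal sketch of what this means in practice (using made-up predicted probabilities rather than output from an actual model, and scikit-learn's metric functions), applying a 0.5 threshold looks like this:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up predicted probabilities and ground-truth labels
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])
y_true = np.array([0, 0, 1, 1, 0, 1])

# Fix the decision boundary: probabilities above the threshold become class 1
threshold = 0.5
y_pred = (y_prob >= threshold).astype(int)

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
```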
However, one can imagine situations in which you want a classifier that does not predict the positive class unless it is very sure. Here you would increase the threshold from 0.5 to something much higher, say 0.8. The classifier would then only predict positives when it was nearly certain, but this comes at the cost of missing some positives.
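You can see this trade-off directly by scoring the same (again made-up) predictions at two thresholds; with these numbers, raising the threshold from 0.5 to 0.8 increases precision but lowers recall:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.7, 0.55])
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])

# A higher threshold makes the classifier more conservative:
# fewer false positives (higher precision), more missed positives (lower recall)
for threshold in [0.5, 0.8]:
    y_pred = (y_prob >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.2f}, "
          f"recall={recall_score(y_true, y_pred):.2f}")
```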
The receiver operating characteristic (ROC) curve captures how the classifier behaves at different thresholds. This gives a good indication of where to make the best trade-offs.
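Under the hood, the ROC curve is just the true positive rate plotted against the false positive rate as the threshold sweeps from high to low. A small sketch with scikit-learn's roc_curve (made-up probabilities again) makes the per-threshold points explicit:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])

# Each threshold yields one (false positive rate, true positive rate) point
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}: FPR={f:.2f}, TPR={t:.2f}")
```

With that in mind, let's train a model on the breast cancer dataset: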
```python
import feyn
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load into a pandas dataframe
breast_cancer = load_breast_cancer(as_frame=True)
data = breast_cancer.frame

# Train/test split
train, test = train_test_split(data, test_size=0.4, stratify=data['target'], random_state=666)

ql = feyn.connect_qlattice()

models = ql.auto_run(
    data=train,
    output_name='target',
    kind='classification'
)
best = models[0]
```
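Since auto_run only saw the training split, a quick sanity check (not part of the original example, and a sketch that assumes the model's predict output can be treated as probabilities of the positive class) is to score the best model on the held-out test set:

```python
from sklearn.metrics import roc_auc_score

# Evaluate the best model on the held-out test split
auc = roc_auc_score(test['target'], best.predict(test))
print(f"Test ROC AUC: {auc:.3f}")
```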
ROC curve with threshold
We can plot a particular threshold on the curve and find out the metrics of the classifier (accuracy, F1 score, precision and recall) at this threshold.
This function can also be found in feyn.plots:
```python
from feyn.plots import plot_roc_curve

y_true = train['target']
y_pred = best.predict(train)

plot_roc_curve(y_true, y_pred)
```
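To mark a particular threshold on the curve, a sketch that assumes plot_roc_curve accepts a threshold keyword (check the Feyn documentation for your version):

```python
# Mark a specific threshold on the ROC curve; the plot then also reports
# the classifier's metrics at that threshold
plot_roc_curve(y_true, y_pred, threshold=0.5)
```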