ROC curve
by: Chris Cave
(Feyn version 3.0 or newer)
There are many different metrics for evaluating a classification model: accuracy, precision, recall, F1 score and so on. Each of these metrics requires fixing a decision boundary. Samples scored above this boundary are classified as True, and samples scored below it are classified as False. This boundary is called the threshold, and it is typically set at 0.5.
However, you can imagine situations in which you want a classifier that does not predict the positive class unless it is very sure. In that case you would raise the threshold from 0.5 to something much higher, say 0.8. The classifier would then only predict positives when it is nearly certain, but at the cost of missing some positives.
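To make the trade-off concrete, here is a minimal pure-Python sketch that computes accuracy, precision, recall and F1 score at a fixed threshold. The labels, scores and helper names (`confusion_counts`, `threshold_metrics`) are hypothetical and only for illustration; this is not how Feyn computes its metrics internally. It also assumes a score exactly equal to the threshold counts as positive (`>=`), which is a convention choice.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN, predicting True when the score is >= threshold (assumed convention)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 score at a single threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: true labels and predicted probabilities of the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.6, 0.55, 0.7, 0.4, 0.3, 0.1]

for t in (0.5, 0.8):
    print(t, threshold_metrics(y_true, y_prob, t))
```

On this toy data, raising the threshold from 0.5 to 0.8 lifts precision from 0.8 to 1.0 while recall drops from 1.0 to 0.5: the classifier stops making false positives but misses half of the true positives.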
The receiver operating characteristic (ROC) curve captures how the classifier behaves across all thresholds: it plots the true positive rate against the false positive rate as the threshold varies. This gives a good indication of where to make the best trade-offs.
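The following is a minimal pure-Python sketch of how the points of an ROC curve can be computed: sweep the threshold over every distinct score, record the false positive rate and true positive rate at each, and estimate the area under the curve (AUC) with the trapezoidal rule. The helper names `roc_points` and `auc` and the toy data are hypothetical; this illustrates the technique and is not Feyn's implementation.

```python
def roc_points(y_true, y_prob):
    """Return (fpr, tpr, threshold) triples, sweeping the threshold over every distinct score."""
    positives = sum(1 for y in y_true if y)
    negatives = len(y_true) - positives
    # Above the highest score nothing is predicted positive, so the curve starts at (0, 0).
    points = [(0.0, 0.0, float("inf"))]
    for t in sorted(set(y_prob), reverse=True):
        # Scores >= t are predicted positive (assumed convention).
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and not y)
        points.append((fp / negatives, tp / positives, t))
    # At the lowest score everything is predicted positive, so the curve ends at (1, 1).
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: true labels and predicted probabilities of the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.85, 0.6, 0.55, 0.7, 0.4, 0.3, 0.1]

pts = roc_points(y_true, y_prob)
for fpr, tpr, t in pts:
    print(f"threshold={t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
print("AUC:", auc(pts))
```

Sweeping over distinct scores, rather than a fixed grid of thresholds, means tied scores move the curve diagonally in a single step, which is the usual way ties are handled. A perfect classifier would reach the top-left corner (FPR 0, TPR 1) and have an AUC of 1.0.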
import feyn
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load into a pandas dataframe
breast_cancer = load_breast_cancer(as_frame=True)
data = breast_cancer.frame
# Train/test split
train, test = train_test_split(data, test_size=0.4, stratify=data['target'], random_state=666)
ql = feyn.QLattice()
models = ql.auto_run(
    data=train,
    output_name='target',
    kind='classification'
)
best = models[0]
ROC curve
best.plot_roc_curve(train)
ROC curve with threshold
We can plot a particular threshold on the curve to see the classifier's metrics (accuracy, F1 score, precision and recall) at that threshold.
best.plot_roc_curve(train, threshold=0.5)
Saving the plot
You can save the plot using the filename parameter. The plot is saved in the current working directory unless another path is specified.
best.plot_roc_curve(data=train, filename="feyn-plot")
If no file extension is specified, the plot is saved as a PNG file.
Location in Feyn
This function can also be found in the feyn.plots module.
from feyn.plots import plot_roc_curve
y_true = train['target']
y_pred = best.predict(train)
plot_roc_curve(y_true, y_pred)