Wine Quality
by: Meera Machado & Chris Cave
Feyn version: 2.1.+
Last updated: 23/09/2021
import pandas as pd
import feyn
Importing the dataset
We will analyse the Wine Quality dataset from the UCI Machine Learning Repository and try to predict alcohol levels from the other features.
data = pd.read_csv('../data/wine_quality.csv')
data.head()
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.7 | 0.29 | 0.29 | 4.8 | 0.060 | 27.0 | 156.0 | 0.99572 | 3.49 | 0.59 | 10.3 | 6 | white |
| 1 | 6.2 | 0.47 | 0.19 | 8.3 | 0.029 | 24.0 | 142.0 | 0.99200 | 3.22 | 0.45 | 12.3 | 6 | white |
| 2 | 10.3 | 0.27 | 0.24 | 2.1 | 0.072 | 15.0 | 33.0 | 0.99560 | 3.22 | 0.66 | 12.8 | 6 | red |
| 3 | 6.3 | 0.37 | 0.28 | 6.3 | 0.034 | 45.0 | 152.0 | 0.99210 | 3.29 | 0.46 | 11.6 | 7 | white |
| 4 | 8.0 | 0.13 | 0.25 | 1.1 | 0.033 | 15.0 | 86.0 | 0.99044 | 2.98 | 0.39 | 11.2 | 8 | white |
Training session
random_seed = 42
# Train/test split (ratio 2:1)
train, test = feyn.tools.split(data, ratio=[2, 1], random_state=random_seed)
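To make the effect of ratio=[2, 1] and random_state concrete, here is a minimal standard-library sketch of a seeded proportional split. The helper name split_rows is hypothetical, and this is not feyn's implementation; it only illustrates the idea of shuffling once with a fixed seed and cutting the rows in proportion to the ratio.

```python
import random

def split_rows(rows, ratio=(2, 1), random_state=None):
    """Shuffle rows with a seed and cut them into parts proportional to ratio.

    Hypothetical helper for illustration only; not part of feyn.
    """
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    total = sum(ratio)
    parts, start, cumulative = [], 0, 0
    for share in ratio[:-1]:
        cumulative += share
        end = round(len(shuffled) * cumulative / total)
        parts.append(shuffled[start:end])
        start = end
    # The last part takes whatever rows remain
    parts.append(shuffled[start:])
    return parts

rows = list(range(9))
train_rows, test_rows = split_rows(rows, ratio=(2, 1), random_state=42)
print(len(train_rows), len(test_rows))  # 6 3
```

Using the same random_state reproduces the same partition, which is why the seed is fixed once at the top of the session.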
Connecting to QLattice
# Connect to QLattice
ql = feyn.connect_qlattice()
# Reset and set a seed
ql.reset(random_seed=random_seed)
Sample and fit models using the primitive operations
Here the auto_run method is broken down into its primitive operations, which allows for a more customizable workflow.
# Setting semantic types
stypes = {'color': 'c'}
# Set number of epochs
n_epochs = 20
# Initialize the list of models
models = []
# Sample and fit
for epoch in range(n_epochs):
    # Sample models (no data here yet)
    models += ql.sample_models(
        input_names=train.columns,
        output_name='alcohol',
        kind='regression',
        stypes=stypes,
        max_complexity=10
    )
    # Fit the models with train data
    models = feyn.fit_models(models, train, loss_function='squared_error')
    # Remove redundant and worst performing models
    models = feyn.prune_models(models)
    # Display best model of each epoch
    feyn.show_model(models[0], label=f"Epoch: {epoch}", update_display=True)
    # Update QLattice with the models sorted by loss
    ql.update(models)
# Find the 10 best diverse models
best_models = feyn.get_diverse_models(models, n=10)
best_model = best_models[0]
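The overall shape of the loop above (sample candidates, score them, prune, then feed the best back to the sampler) can be illustrated with a toy sketch in plain Python. This is only an analogy under strong assumptions: each "model" is a single slope for y = slope * x, the scoring step merely evaluates the loss rather than fitting parameters, and the names toy_search and mean_squared_error are hypothetical. It does not describe how the QLattice works internally.

```python
import random

def mean_squared_error(slope, data):
    """Loss of the one-parameter model y = slope * x on (x, y) pairs."""
    return sum((y - slope * x) ** 2 for x, y in data) / len(data)

def toy_search(data, n_epochs=20, n_samples=20, keep=5, seed=42):
    """Toy sample/score/prune/update loop (illustrative analogy only)."""
    rng = random.Random(seed)
    center = 0.0   # current focus of the sampler
    models = []    # candidate slopes carried over between epochs
    for _ in range(n_epochs):
        # Sample: draw new candidates around the current focus
        models += [rng.gauss(center, 1.0) for _ in range(n_samples)]
        # Score: rank all candidates by loss on the data, best first
        models.sort(key=lambda slope: mean_squared_error(slope, data))
        # Prune: keep only the best few candidates
        models = models[:keep]
        # Update: steer future sampling toward the best candidate
        center = models[0]
    return models

toy_data = [(x, 3.0 * x) for x in range(1, 11)]  # true slope is 3
best_slope = toy_search(toy_data)[0]
print(round(best_slope, 2))
```

Because the surviving candidates are carried into the next epoch, the best loss never gets worse from one epoch to the next, which mirrors why the list of models is accumulated with += rather than replaced.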
Model inspection
Here we evaluate model performance with feyn.Model.plot.
# Summary plot of the best model
best_model.plot(train, test)
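When comparing train and test performance of a regression model, two commonly used metrics are RMSE and R². The sketch below computes both from scratch so the definitions are explicit. It assumes these are the metrics of interest, the function names are illustrative, and the predictions are made-up values; only the observed alcohol values are taken from the five rows shown earlier.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Observed alcohol values from the first five rows of the dataset
y_true = [10.3, 12.3, 12.8, 11.6, 11.2]
# Hypothetical predictions, invented purely for illustration
y_pred = [10.0, 12.0, 13.0, 11.5, 11.5]

print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"R2:   {r2_score(y_true, y_pred):.3f}")
```

A large gap between the train and test values of either metric is the usual sign of overfitting, which is why the plot is given both datasets.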