Feyn Documentation

Splitting a dataset

by: Kevin Broløs
(Feyn version 3.1.0 or newer)


The split function in feyn.tools splits your data into random subsets before you train a Model. Holding out part of the data helps you validate your results, because you can evaluate the Model on samples it has not been trained on.

We support splitting into as many subsets as the dataset supports, as well as stratification according to one or more columns.

The data is always shuffled prior to splitting, regardless of whether you stratify or not.

Example

from feyn.tools import split
from pandas import DataFrame


data = DataFrame(
    {
        "A": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
        "B": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    }
)

train, test = split(data, ratio=[0.8, 0.2], stratify=['A'], random_state=42)

In the example above, our dataset has 10 samples, and the stratification column A contains 5 samples with the value 1 and 5 samples with the value 0.

After splitting, train will contain 8 samples, evenly distributed over column A with 4 samples of each value. test will contain 2 samples, also evenly distributed, with one sample where A is 0 and one where A is 1.

We provide a random state to ensure we get the exact same split every time we run the function.
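To illustrate why a fixed random state gives a repeatable split, here is a minimal sketch using only Python's standard library. It is not Feyn's implementation; the helper name shuffled_indices is hypothetical and only shows the general idea of a seeded shuffle.

```python
import random


def shuffled_indices(n_rows, random_state=None):
    """Return the row indices 0..n_rows-1 in shuffled order (hypothetical helper)."""
    # A private generator keeps the seed from affecting global random state.
    rng = random.Random(random_state)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices


first = shuffled_indices(10, random_state=42)
second = shuffled_indices(10, random_state=42)
same_order = first == second  # the same seed reproduces the same order
```

With random_state left as None, the generator is seeded from the system, so the order will generally differ between runs.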

Usage

You can choose how many subsets you would like, as well as their relative sizes, using the ratio parameter. The ratio list is normalised before splitting, so [1., 1.] results in a 50/50 split, [1., 1., 1.] in an equal 3-way split, etc.

For readability, you'll often want to choose sensible splits that sum to 1 (or 100), such as [0.75, 0.25] or [75, 25].
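The normalisation described above can be sketched in plain Python. This is an illustration under assumptions, not Feyn's actual code: the function name subset_sizes and the rounding strategy are hypothetical, and Feyn may round subset boundaries differently.

```python
def subset_sizes(n_rows, ratio):
    """Turn a ratio list into integer subset sizes that sum to n_rows (hypothetical helper)."""
    total = sum(ratio)
    if total <= 0:
        raise ValueError("ratio must sum to a positive number")

    # Normalise, then round the cumulative boundaries so no rows are lost.
    boundaries = [0]
    cumulative = 0.0
    for part in ratio:
        cumulative += part / total
        boundaries.append(round(cumulative * n_rows))
    boundaries[-1] = n_rows  # guard against floating point drift

    return [end - start for start, end in zip(boundaries, boundaries[1:])]


sizes_fractions = subset_sizes(100, [0.75, 0.25])
sizes_percent = subset_sizes(100, [75, 25])
sizes_three_way = subset_sizes(12, [1.0, 1.0, 1.0])
```

Because only the proportions matter, [0.75, 0.25] and [75, 25] describe the same split in this sketch.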

Stratification

Using the stratify parameter, you can stratify the splits according to one or more columns. This helps ensure that the target classes in a classification problem are represented in the same proportions in your train and test sets, or helps balance columns that you know are not evenly represented.

As a result, each subset has (as close as possible to) the same relative ratios of the distinct values in those columns.

The function returns an error if you try to stratify in a way that leaves too few samples to go into each subset requested by the ratio parameter.

The stratification takes care to also proportionally distribute NaN values among splits. This is particularly important for categorical NaN values when training with the QLattice, as they get assigned a weight and bucket just like any other category.

Note: Since it works using distinct values, this parameter works best for ordinal and categorical values, but makes little sense for continuous variables.
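The idea behind stratification can be sketched as follows: group the rows by their value in the stratification column, shuffle and split each group by the same ratio, then combine the pieces. This is a simplified illustration with a single column, not Feyn's implementation; the function stratified_split, the list-of-dicts data layout, and the use of None for missing values are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, ratio, random_state=None):
    """Split a list of dict rows into len(ratio) subsets, stratified by row[key]."""
    rng = random.Random(random_state)
    total = sum(ratio)

    # Group the rows by their value in the stratification column.
    # A missing value (None in this sketch) forms a group of its own.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    subsets = [[] for _ in ratio]
    for group in groups.values():
        rng.shuffle(group)
        start, cumulative = 0, 0.0
        for i, part in enumerate(ratio):
            cumulative += part / total
            is_last = i == len(ratio) - 1
            end = len(group) if is_last else round(cumulative * len(group))
            subsets[i].extend(group[start:end])
            start = end

    # Shuffle once more so rows are not ordered by group within a subset.
    for subset in subsets:
        rng.shuffle(subset)
    return subsets


# Same values as the DataFrame in the example above, as plain dicts.
rows = [{"A": a, "B": b} for a, b in zip([0, 1] * 5, [1, 1, 1, 0, 0, 0, 1, 1, 0, 0])]
train, test = stratified_split(rows, key="A", ratio=[0.8, 0.2], random_state=42)
```

Each value of A has 5 rows, so every group contributes 4 rows to train and 1 to test. Unlike feyn.tools.split, this sketch does not raise an error when a group is too small to reach every subset; it simply leaves some subsets without rows from that group.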


Parameters of feyn.tools.split

data

The data to split.

ratio

Default: [0.75, 0.25]

The size ratio of the resulting subsets.

stratify

Default: None.

The names of columns to stratify by.

random_state

Default: None.

The random state of the split (an integer).


Copyright © 2024 Abzu.ai - Feyn license: CC BY-NC-ND 4.0
Feyn®, QGraph®, and the QLattice® are registered trademarks of Abzu®