Computes prior probabilities for each input based on mutual information.
The prior probability of an input expresses the initial belief in its importance for predicting the output, before any model is fitted.
The higher the prior probability, the more important the corresponding feature is believed to be.
Arguments:
df {DataFrame} -- The dataframe to calculate priors for.
output_name {str} -- The output to measure against.
Keyword Arguments:
floor {float} -- The minimum value for the priors (default: {0.1}).
Returns:
dict -- a dictionary of feature names and their computed priors.
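The mutual-information scoring described above can be sketched in plain Python. This is a hypothetical illustration, not feyn's implementation: the helper names `mutual_information` and `estimate_priors`, and the normalisation of scores to the range [floor, 1], are assumptions.

```python
import math
from collections import Counter

def mutual_information(x, y):
    # Discrete (plug-in) estimate of I(X; Y) in nats.
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((px[xv] / n) * (py[yv] / n)))
    return mi

def estimate_priors(columns, output, floor=0.1):
    # Score each input column against the output, then normalise so the
    # most informative input gets prior 1.0; clamp the rest at `floor`.
    scores = {name: mutual_information(values, output)
              for name, values in columns.items()}
    top = max(scores.values()) or 1.0
    return {name: max(floor, score / top) for name, score in scores.items()}
```

An input identical to the output scores maximal mutual information and receives prior 1.0; an independent input scores 0 and is clamped to the floor.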
Given a model and the name of one of its input or output nodes,
get a pandas.DataFrame with the associated parameters. If the node
is categorical, the function returns the weight associated with each categorical
value. If the node is numerical, the function returns the scale, weight and
bias.
Arguments:
model {feyn.Model} -- feyn Model of interest.
name {str} -- Name of the input or output of interest.
Returns:
pd.DataFrame -- DataFrame with the parameters.
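The two return shapes described above can be illustrated with small helpers. These are hypothetical sketches of the DataFrame layouts only; the function names and column labels are assumptions, not feyn's API.

```python
import pandas as pd

def categorical_params_frame(category_weights):
    # One row per category value, holding the weight learned for that value.
    return pd.DataFrame(category_weights,
                        columns=["category", "weight"]).set_index("category")

def numerical_params_frame(scale, weight, bias):
    # A single-row frame with the scale, weight and bias of a numerical node.
    return pd.DataFrame([{"scale": scale, "weight": weight, "bias": bias}])
```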
Gives a label for use with feyn.show_model, based on the current epoch, the total number of epochs, the time elapsed and the model count.
Arguments:
epoch {int} -- The current epoch
epochs {int} -- Total number of epochs
elapsed_seconds {Optional[float]} -- seconds elapsed so far
model_count {Optional[int]} -- Models investigated so far
Returns:
str -- A string label displaying progress
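A label like the one described might be assembled as follows. This is a hedged sketch: the exact wording and the function name `format_progress_label` are assumptions.

```python
def format_progress_label(epoch, epochs, elapsed_seconds=None, model_count=None):
    # Hypothetical sketch: include only the pieces that were provided.
    label = f"Epoch no. {epoch}/{epochs}"
    if model_count is not None:
        label += f" - Tried {model_count} models"
    if elapsed_seconds is not None:
        minutes, seconds = divmod(int(elapsed_seconds), 60)
        label += f" - Elapsed: {minutes}m {seconds}s"
    return label
```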
Generates a value substitution dictionary that can be used to evaluate a sympy expression based on a feyn.Model.
Takes as arguments the model to generate for, and a single sample from a pd.DataFrame that contains the values to substitute.
Especially useful for models with categorical inputs, which are otherwise cumbersome to evaluate: information is lost when using a symbolic categorical representation, and an expanded representation is tedious to populate by hand.
Example: Evaluation of the sympy expression using the `subs` or `evalf` functions with the dictionary as input.
>>> expr = model.sympify()
>>> subs_dict = get_sympy_substitutions(model, data.iloc[0])
>>> expr.evalf(subs=subs_dict)
Arguments:
model {Model} -- The feyn Model the sympy expression was created from
sample {Union[Series, DataFrame]} -- The sample to use in evaluation.
Keyword Arguments:
symbolic_cat {bool} -- Whether the sympy expression uses a symbolic categorical input, or the categories as expanded values (default: {True})
Returns:
Dict[str, Any] -- A dictionary containing the symbol names and values to substitute
function infer_available_threads
def infer_available_threads() -> int
Attempts to infer the number of threads available, always leaving one free for system use.
Returns:
int -- thread count
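A minimal sketch of the described behaviour, assuming the standard-library `os.cpu_count` as the source of the thread count:

```python
import os

def infer_available_threads() -> int:
    # Use the reported CPU count, leaving one thread free for system use,
    # but never return fewer than one thread.
    return max(1, (os.cpu_count() or 1) - 1)
```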
Infer the stypes of a dataframe based on the data itself.
Arguments:
df {DataFrame} -- The DataFrame to infer types for.
output_name {str} -- The name of the output used for training.
Keyword Arguments:
capture_warnings {bool} -- Whether to log warnings directly (False) or return them as a list (True) (default: {False})
Returns:
Union[Dict[str, str], Tuple[Dict[str, str], List[ColumnTypeMessage]]] -- The dictionary of stypes, optionally together with a list of warning messages if capture_warnings is True.
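The inference described above can be sketched with pandas dtype checks. The stype strings ("f" for numerical, "c" for categorical) and the fallback rules are assumptions; feyn's actual heuristics, including the warning machinery, are more involved.

```python
import pandas as pd

def infer_stypes(df, output_name):
    # Sketch: numeric dtypes become "f" (floating point), everything else
    # becomes "c" (categorical). The output column is typed like any other.
    stypes = {}
    for col in df.columns:
        stypes[col] = "f" if pd.api.types.is_numeric_dtype(df[col]) else "c"
    return stypes
```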
function kind_to_output_stype
def kind_to_output_stype(kind: str) -> str
Parse model kind string (like "regression" or "classification") into an output spec for the QLattice.
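A sketch of the mapping, assuming "b" (binary) and "f" (floating point) as the output spec strings; the accepted aliases are assumptions.

```python
def kind_to_output_stype(kind: str) -> str:
    # Assumed mapping: classification -> binary output ("b"),
    # regression -> floating-point output ("f").
    if kind in ("classification", "classifier"):
        return "b"
    if kind in ("regression", "regressor"):
        return "f"
    raise ValueError(f"Unknown model kind: {kind!r}")
```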
Split datasets into randomized subsets.
This function is used to split a dataset into random subsets - typically training and test data.
The input dataset should be either a pandas DataFrame or a dictionary of numpy arrays. The ratio parameter controls how the data is split and how many subsets it is split into. The ratio list is normalised before splitting, so [1., 1.] results in a 50/50 split, [1., 1., 1.] in an equal 3-way split, and so on.
By providing a list of column names to the stratify parameter, you can also choose to stratify the splits according to one or more columns.
Example: Split data in the ratio 2:1 into train and test data
>>> train, test = feyn.tools.split(data, [2,1])
Example: Split data into training, validation and holdout data: 80% training data, and 10% each for validation and holdout
>>> train, validation, holdout = feyn.tools.split(data, [.8, .1, .1])
Arguments:
data {DataFrame} -- The data to split.
Keyword Arguments:
ratio {List[float]} -- The size ratio of the resulting subsets. (default: {[0.75, 0.25]})
stratify {List[str]} -- The names of columns to stratify by. (default: {None})
random_state {int} -- The random state of the split (integer) (default: {None})
Returns:
List[DataFrame] -- A list of the subsets of the dataset.
Raises:
ValueError -- If the data, or a stratum of the data, does not contain enough samples to split into the required number of subsets.
Warnings:
Warns if the resulting subsets do not reasonably match the provided ratios, indicating a dataset that is too small or strata that have too few samples.
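Ignoring stratification, the ratio-normalised splitting can be sketched with numpy index shuffling. `split_indices` is a hypothetical helper returning index arrays rather than DataFrames:

```python
import numpy as np

def split_indices(n_samples, ratio, random_state=None):
    # Shuffle all row indices, then cut at the normalised ratio boundaries.
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(n_samples)
    fractions = np.asarray(ratio, dtype=float)
    fractions /= fractions.sum()
    cut_points = (np.cumsum(fractions)[:-1] * n_samples).astype(int)
    return np.split(idx, cut_points)
```

With `ratio=[2, 1]` and 100 rows, this yields subsets of 66 and 34 rows, mirroring the normalisation behaviour described above.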
function sympify_model
def sympify_model(
    m: feyn._model.Model,
    signif: int = 6,
    symbolic_lr: bool = False,
    symbolic_cat: bool = True,
    include_weights: bool = True) -> Any
Convert a feyn Model to a sympy expression.
Arguments:
m {feyn.Model} -- the Model to convert
Keyword Arguments:
signif {int} -- The number of significant digits to use for weights and biases in the converted expression (default: {6})
symbolic_lr {bool} -- Whether to replace the logistic function with a symbol (logreg) (default: {False})
symbolic_cat {bool} -- Whether to use symbols to represent categorical inputs, or expand their values into the expression (default: {True})
include_weights {bool} -- Whether to include weights and biases in the expression, or return just the symbolic form (default: {True})
Returns:
Any -- A sympy expression for the Model.