Computes prior probabilities for each input based on mutual information.
The prior probability of an input expresses the initial belief in its importance for predicting the output, before any model is fitted.
The higher the prior probability, the more important the corresponding feature is believed to be.
Arguments:
df {DataFrame} -- The dataframe to calculate priors for.
output_name {str} -- The output to measure against.
Keyword Arguments:
floor {float} -- The minimum value for the priors (default: {0.1}).
Returns:
dict -- a dictionary of feature names and their computed priors.
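The mutual-information scoring described above can be sketched in plain Python. This is a hypothetical illustration, not feyn's implementation: the helper names `mutual_information` and `estimate_priors`, and the normalisation of scores to the range [floor, 1], are assumptions.

```python
import math
from collections import Counter

def mutual_information(x, y):
    # Discrete (plug-in) estimate of I(X; Y) in nats.
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((px[xv] / n) * (py[yv] / n)))
    return mi

def estimate_priors(columns, output, floor=0.1):
    # Score each input column against the output, then normalise so the
    # most informative input gets prior 1.0; clamp the rest at `floor`.
    scores = {name: mutual_information(values, output)
              for name, values in columns.items()}
    top = max(scores.values()) or 1.0
    return {name: max(floor, score / top) for name, score in scores.items()}
```

An input identical to the output scores maximal mutual information and receives prior 1.0; an independent input scores 0 and is clamped to the floor.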
Given a model and the name of one of its input or output nodes,
get a pandas.DataFrame with the associated parameters. If the node
is categorical, the function returns the weight associated with each categorical
value. If the node is numerical, the function returns the scale, weight and
bias.
Arguments:
model {feyn.Model} -- feyn Model of interest.
name {str} -- Name of the input or output of interest.
Returns:
pd.DataFrame -- DataFrame with the parameters.
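The two return shapes described above can be illustrated with small helpers. These are hypothetical sketches of the DataFrame layouts only; the function names and column labels are assumptions, not feyn's API.

```python
import pandas as pd

def categorical_params_frame(category_weights):
    # One row per category value, holding the weight learned for that value.
    return pd.DataFrame(category_weights,
                        columns=["category", "weight"]).set_index("category")

def numerical_params_frame(scale, weight, bias):
    # A single-row frame with the scale, weight and bias of a numerical node.
    return pd.DataFrame([{"scale": scale, "weight": weight, "bias": bias}])
```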
Gives a label for use with feyn.show_model, based on the current epoch, the total number of epochs, the time elapsed and the model count.
Arguments:
epoch {int} -- The current epoch
epochs {int} -- Total number of epochs
elapsed_seconds {Optional[float]} -- seconds elapsed so far
model_count {Optional[int]} -- Models investigated so far
Returns:
str -- A string label displaying progress
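A label like the one described might be assembled as follows. This is a hedged sketch: the exact wording and the function name `format_progress_label` are assumptions.

```python
def format_progress_label(epoch, epochs, elapsed_seconds=None, model_count=None):
    # Hypothetical sketch: include only the pieces that were provided.
    label = f"Epoch no. {epoch}/{epochs}"
    if model_count is not None:
        label += f" - Tried {model_count} models"
    if elapsed_seconds is not None:
        minutes, seconds = divmod(int(elapsed_seconds), 60)
        label += f" - Elapsed: {minutes}m {seconds}s"
    return label
```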
Generates a value substitution dictionary that can be used to evaluate a sympy expression based on a feyn.Model.
Takes as arguments the model to generate for, and a single sample from a pd.DataFrame that contains the values to substitute.
Especially useful for models with categorical inputs, which are otherwise cumbersome to evaluate: information is lost when using a symbolic categorical representation, and an expanded representation is tedious to populate by hand.
Example: Evaluation of the sympy expression using the `subs` or `evalf` functions with the dictionary as input.
>>> expr = model.sympify()
>>> subs_dict = get_sympy_substitutions(model, data.iloc[0])
>>> expr.evalf(subs=subs_dict)
Arguments:
model {Model} -- The feyn Model the sympy expression was created from
sample {Union[Series, DataFrame]} -- The sample to use in evaluation.
Keyword Arguments:
symbolic_cat {bool} -- Whether the sympy expression uses a symbolic categorical input, or the categories as expanded values (default: {True})
Returns:
Dict[str, Any] -- A dictionary containing the symbol names and values to substitute
function infer_available_threads
def infer_available_threads() -> int
Attempts to infer the number of threads available, always leaving one free for system use.
Returns:
int -- thread count
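A minimal sketch of the described behaviour, assuming the standard-library `os.cpu_count` as the source of the thread count:

```python
import os

def infer_available_threads() -> int:
    # Use the reported CPU count, leaving one thread free for system use,
    # but never return fewer than one thread.
    return max(1, (os.cpu_count() or 1) - 1)
```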
Infer the stypes of a dataframe based on the data itself.
Arguments:
df {DataFrame} -- The DataFrame to infer types for.
output_name {str} -- The name of the output used for training.
Keyword Arguments:
capture_warnings {bool} -- Whether to log warnings directly (False) or return them as a list (True) (default: {False})
Returns:
Union[Dict[str, str], Tuple[Dict[str, str], List[ColumnTypeMessage]]] -- The dictionary of stypes, optionally together with a list of warning messages if capture_warnings is True.
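The inference described above can be sketched with pandas dtype checks. The stype strings ("f" for numerical, "c" for categorical) and the fallback rules are assumptions; feyn's actual heuristics, including the warning machinery, are more involved.

```python
import pandas as pd

def infer_stypes(df, output_name):
    # Sketch: numeric dtypes become "f" (floating point), everything else
    # becomes "c" (categorical). The output column is typed like any other.
    stypes = {}
    for col in df.columns:
        stypes[col] = "f" if pd.api.types.is_numeric_dtype(df[col]) else "c"
    return stypes
```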
function kind_to_output_stype
def kind_to_output_stype(kind: str) -> str
Parse model kind string (like "regression" or "classification") into an output spec for the QLattice.
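A sketch of the mapping, assuming "b" (binary) and "f" (floating point) as the output spec strings; the accepted aliases are assumptions.

```python
def kind_to_output_stype(kind: str) -> str:
    # Assumed mapping: classification -> binary output ("b"),
    # regression -> floating-point output ("f").
    if kind in ("classification", "classifier"):
        return "b"
    if kind in ("regression", "regressor"):
        return "f"
    raise ValueError(f"Unknown model kind: {kind!r}")
```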
Split datasets into randomized subsets.
This function is used to split a dataset into random subsets - typically training and test data.
The input dataset should be either a pandas DataFrame or a dictionary of numpy arrays. The ratio parameter controls how the data is split and how many subsets it is split into. The ratio list is normalised before splitting, so [1., 1.] results in a 50/50 split, [1., 1., 1.] in an equal 3-way split, and so on.
By providing a list of column names to the stratify parameter, you can also choose to stratify the splits according to one or more columns.
Example: Split data in the ratio 2:1 into train and test data
>>> train, test = feyn.tools.split(data, [2,1])
Example: Split data into training, validation and holdout data: 80% training data, and 10% each for validation and holdout
>>> train, validation, holdout = feyn.tools.split(data, [.8, .1, .1])
Arguments:
data {DataFrame} -- The data to split.
Keyword Arguments:
ratio {List[float]} -- The size ratio of the resulting subsets. (default: {[0.75, 0.25]})
stratify {List[str]} -- The names of columns to stratify by. (default: {None})
random_state {int} -- The random state of the split (integer) (default: {None})
Returns:
List[DataFrame] -- A list of the subsets of the dataset.
Raises:
ValueError -- If the data, or a stratum of the data, does not contain enough samples to split into the required number of subsets.
Warnings:
Warns if the resulting subsets do not reasonably match the provided ratios, indicating a dataset that is too small or strata that have too few samples.
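Ignoring stratification, the ratio-normalised splitting can be sketched with numpy index shuffling. `split_indices` is a hypothetical helper returning index arrays rather than DataFrames:

```python
import numpy as np

def split_indices(n_samples, ratio, random_state=None):
    # Shuffle all row indices, then cut at the normalised ratio boundaries.
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(n_samples)
    fractions = np.asarray(ratio, dtype=float)
    fractions /= fractions.sum()
    cut_points = (np.cumsum(fractions)[:-1] * n_samples).astype(int)
    return np.split(idx, cut_points)
```

With `ratio=[2, 1]` and 100 rows, this yields subsets of 66 and 34 rows, mirroring the normalisation behaviour described above.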
function sympify_model
def sympify_model(
    m: feyn._model.Model,
    signif: int = 6,
    symbolic_lr: bool = False,
    symbolic_cat: bool = True,
    include_weights: bool = True) -> Any
Convert a feyn Model to a sympy expression.
Arguments:
m {feyn.Model} -- the Model to convert
Keyword Arguments:
signif {int} -- The number of significant digits to use for weights and biases in the converted expression (default: {6})
symbolic_lr {bool} -- Whether to replace the logistic function with a symbol (logreg) (default: {False})
symbolic_cat {bool} -- Whether to use symbols to represent categorical inputs, or expand their values into the expression (default: {True})
include_weights {bool} -- Whether to include weights and biases in the expression, or return just the symbolic form (default: {True})
Returns:
Any -- A sympy expression for the Model.