Using the query language · Feyn Documentation

by: Jaan Kasak
(Feyn version 3.0 or newer)

When using the QLattice to extract learnings from your data, the process works best when you have a specific question in mind. This question can be as simple as "Which other feature does feature a best interact with linearly?" or as complex and specific as "I know that when both features a and b have specific values, they contribute strongly to the output, but are there any other signal paths that interact with this specific joint state of a and b?".

The QLattice will find all statistical correlations in your data, but asking questions with the QLattice truly shines when there is a cause/effect relationship between your target and feature variables. Other ML methods rely on feature engineering and on separating data sets to support this type of workflow. Because the QLattice produces mathematical equations, we can offer a neat query language for requesting models with a specific underlying equation.

Queries are passed through the query_string parameter of the methods QLattice.sample_models and QLattice.auto_run. As a spoiler, the two questions above translate to the two queries in the example that follows.

Example

import feyn

ql = feyn.QLattice()

feature_names = ["a", "b", "c", "d", "e"]
models = ql.sample_models(feature_names, "y", query_string="'a' + ?")
for m in models[:3]:
    feyn.show_model(m)

models = ql.sample_models(feature_names, "y", query_string="func(gaussian('a', 'b'), _)")
for m in models[:3]:
    feyn.show_model(m)

Passing these queries to QLattice.auto_run instead also lets you find out which of the query-matching models perform best on a given data set.
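
As a rough sketch of such a comparison, the helper below runs auto_run once per query and keeps the first model returned by each run. The name best_model_per_query and its parameters are illustrative and not part of Feyn. The sketch assumes that ql is a feyn.QLattice, that data is a pandas DataFrame containing the output column, and that auto_run returns its models ordered best first.

```python
def best_model_per_query(ql, data, output_name, queries, n_epochs=10):
    """Run auto_run once per query string and keep the first model of each run.

    Hypothetical helper (not part of Feyn). Assumes `ql` behaves like a
    feyn.QLattice and that auto_run returns its models ordered best first.
    """
    best = {}
    for query in queries:
        models = ql.auto_run(data, output_name, n_epochs=n_epochs, query_string=query)
        if models:  # skip queries for which the run returned nothing
            best[query] = models[0]
    return best


# Example usage (needs feyn installed and a DataFrame `data` with columns a..e and y):
# import feyn
# ql = feyn.QLattice()
# queries = ["'a' + ?", "func(gaussian('a', 'b'), _)"]
# best = best_model_per_query(ql, data, "y", queries)
# for query, model in best.items():
#     print(query, "->", model.to_query_string())
```

Each run trains from scratch, so the number of epochs is worth keeping small while you are still exploring which questions to ask.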

Complete feature list for the query language

The intent is to keep writing a query string as simple and as close to writing a mathematical function as possible. To this end, we provide the following abilities:

Input features are given as quoted strings, i.e. 'a' or "a".
The standard addition and multiplication operators + and * are supported; they obey operator precedence, commutativity and associativity.
You can be specific with named functions. If you want the logarithm of 'a', just type log('a'). A gaussian can be either univariate or bivariate; in the query language its arity is determined by the number of arguments. Writing gaussian('a') matches a univariate gaussian of the input 'a', and writing gaussian('a', 'b') matches a bivariate gaussian of 'a' and 'b'. The complete list of named functions is:
1. log
2. exp
3. sqrt
4. squared
5. inverse
6. linear
7. tanh
8. gaussian
9. add
10. multiply
A function wildcard func that matches any function, where the arity is deduced from the number of arguments given: func('a') matches any unary function of 'a' and func('a', 'b') matches any binary function of 'a' and 'b'.
An input feature wildcard ? that will match any input feature.
Feature exclusion ! to explicitly avoid some input features. !'a' will match any input feature except 'a'.
A subtree wildcard _ that will match any subtree. This can be extended, as in _['a', 2], to further constrain the subtree so that it must contain the feature 'a' and have a complexity of at most 2. It is also possible to exclude features using !. Arguments in the square brackets are separated by commas.
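
Since a query is an ordinary Python string, the pieces listed above can be assembled programmatically. The helpers below are a minimal sketch of that idea; their names (feature, exclude, func, subtree) are hypothetical and not part of Feyn. They assume feature names contain no single quotes, and that constraints inside the square brackets can be combined freely.

```python
def feature(name):
    """Quote an input feature name: a -> 'a'."""
    return f"'{name}'"


def exclude(name):
    """Feature exclusion: b -> !'b'."""
    return f"!'{name}'"


def func(*args):
    """Function wildcard; the arity is the number of arguments (1 or 2)."""
    if len(args) not in (1, 2):
        raise ValueError("func takes one or two arguments")
    return f"func({', '.join(args)})"


def subtree(contains=(), excludes=(), max_complexity=None):
    """Subtree wildcard with optional constraints, e.g. _['c', 3]."""
    parts = [feature(n) for n in contains] + [exclude(n) for n in excludes]
    if max_complexity is not None:
        parts.append(str(max_complexity))
    return f"_[{', '.join(parts)}]" if parts else "_"


print(f"{feature('a')} + {subtree(excludes=['b'])}")                    # 'a' + _[!'b']
print(f"{feature('a')} * {subtree(contains=['c'], max_complexity=3)}")  # 'a' * _['c', 3]
print(func(feature('a'), exclude('b')))                                 # func('a', !'b')
```

The resulting strings can be passed as query_string in the same way as hand-written queries.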

Comprehensive examples for all listed features

Working through examples is the best way to learn, so here are some for every point on the feature list.

Single feature queries

This matches a very basic model that tells you only the linear correlation of a single feature with the output.

models = ql.sample_models(feature_names, "y", query_string="'a'")
models[0]

Additions of specific features

This matches models where 'a' and 'b' are added.

models = ql.sample_models(feature_names, "y", query_string="'a' + 'b'")
models[0]

Multiplication of a feature, with an addition of another

This matches models where 'a' is added to a multiplication of 'b' and any other feature. Note how operator precedence is respected, with the multiplication occurring earlier in the model than the addition.

models = ql.sample_models(feature_names, "y", query_string="'a' + 'b' * ?")
models[0]

Addition of two features, multiplied by any other feature

Matches models where 'a' is added to 'b', then multiplied by any other feature. This time we explicitly perform the addition first.

models = ql.sample_models(feature_names, "y", query_string="('a' + 'b') * ?")
models[0]
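
Python's own arithmetic uses the same precedence for + and *, so its parser can serve as an analogy for the two queries above. The sketch below uses Python's ast module, not Feyn's query parser, and the plain names a, b and c stand in for the quoted features and the ? wildcard. It reports which operator ends up outermost, i.e. which one is applied last.

```python
import ast


def root_operator(expression):
    """Return the name of the outermost binary operator of an expression."""
    tree = ast.parse(expression, mode="eval")
    return type(tree.body.op).__name__


print(root_operator("a + b * c"))    # Add: the multiplication is nested inside the addition
print(root_operator("(a + b) * c"))  # Mult: the parentheses force the addition to happen first
```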

A model with a squared function

Match models that take the square of 'a'.

models = ql.sample_models(feature_names, "y", query_string="squared('a')")
models[0]

Univariate Gaussian

Match models that take the univariate gaussian of 'a'.

models = ql.sample_models(feature_names, "y", query_string="gaussian('a')")
models[0]

Bivariate Gaussian

Match models that take the bivariate gaussian of 'a' and 'b'.

models = ql.sample_models(feature_names, "y", query_string="gaussian('a', 'b')")
models[0]

Binary interaction with nested addition

Matches models where there is any binary interaction of 'a' with the addition of 'b' and 'c'.

models = ql.sample_models(feature_names, "y", query_string="func('a', 'b' + 'c')")
models[0]

Unary functions of a feature

Matches models of any unary function of 'a'.

models = ql.sample_models(feature_names, "y", query_string="func('a')")
models[0]

Binary functions of a named feature and any other

Matches models that combine 'a' with any other input feature.

models = ql.sample_models(feature_names, "y", query_string="func('a', ?)")
models[0]

Binary functions with a named feature, excluding a specific other feature

Matches models that combine 'a' with any other input feature except 'b'.

models = ql.sample_models(feature_names, "y", query_string="func('a', !'b')")
models[0]

Models with a feature and any subtree of interactions

Match models that add 'a' to any subtree.

models = ql.sample_models(feature_names, "y", query_string="'a' + _")
models[0]

Models with a subtree excluding specific feature

Match models that add 'a' to any subtree that does not contain the input feature 'b'.

models = ql.sample_models(feature_names, "y", query_string="'a' + _[!'b']")
models[0]

Subtrees of certain complexity and content

Match models that multiply 'a' by any subtree that contains the input feature 'c' and has a complexity of at most 3 edges.

models = ql.sample_models(feature_names, "y", query_string="'a' * _['c', 3]")
models[0]

From model to query string

Given a Model,

models = ql.sample_models(feature_names, "y")
models[0].show()

it is possible to get its query_string by calling feyn.Model.to_query_string:

print(models[0].to_query_string())

'gaussian("c", "b")'
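
One possible use of this is to sample further models with the same structure as one you already like. The helper below is a sketch under the assumption that the string returned by to_query_string is accepted unchanged as a query_string; the name resample_like is hypothetical and not part of Feyn.

```python
def resample_like(ql, model, feature_names, output_name):
    """Sample new models that match the structure of an existing model.

    Hypothetical helper (not part of Feyn). Assumes `ql` behaves like a
    feyn.QLattice and `model` like a feyn.Model.
    """
    query = model.to_query_string()
    return ql.sample_models(feature_names, output_name, query_string=query)


# Example usage, continuing from the snippet above:
# similar = resample_like(ql, models[0], feature_names, "y")
# similar[0].show()
```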