Using the query language
by: Jaan Kasak
(Feyn version 3.0 or newer)
When using the QLattice to extract learnings from your data, the process works best when you have a specific question in mind. This question can be as simple as "Which other feature does feature a best interact with linearly?" or as complex and specific as "I know that when both features a and b have specific values, they contribute strongly to the output, but do any other signal paths interact with this specific joint state of a and b?".
The QLattice will find all statistical correlations in your data, but asking questions with the QLattice truly shines when there is a cause/effect relationship between your target and feature variables. Other ML methods rely on feature engineering and separating data sets to support this type of workflow, but since we provide mathematical equations, we can offer a neat query language to ask for models with a specific underlying equation.
Queries are written in the query_string argument of the methods QLattice.sample_models and QLattice.auto_run. As a spoiler, the two questions above translate to these queries.
Example
import feyn
ql = feyn.QLattice()
feature_names = ["a", "b", "c", "d", "e"]
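# "Which other feature does feature a best interact with linearly?"
models = ql.sample_models(feature_names, "y", query_string="'a' + ?")
for m in models[:3]:
    feyn.show_model(m)

# "Do other signal paths interact with the joint state of a and b?"
models = ql.sample_models(feature_names, "y", query_string="func(gaussian('a', 'b'), _)")
for m in models[:3]:
    feyn.show_model(m)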
Using these queries in QLattice.auto_run would also let you find out which of these query-matching models perform best on some data set.
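To compare several queries on the same data, one option is to call auto_run once per query and keep the top model of each run. The sketch below assumes that auto_run accepts data, output_name and query_string as keyword arguments and returns its models best-first. best_model_per_query is a hypothetical helper, not part of Feyn, and ql can be any object exposing such an auto_run method.

```python
def best_model_per_query(ql, data, output_name, queries):
    """Run one auto_run per query string and keep the first model of each run.

    Assumption: ql.auto_run(data=..., output_name=..., query_string=...)
    returns a list of models ordered best-first.
    """
    best = {}
    for query in queries:
        models = ql.auto_run(data=data, output_name=output_name, query_string=query)
        # An empty result is recorded as None so that no query silently disappears.
        best[query] = models[0] if models else None
    return best
```

With a real QLattice this might be called as best_model_per_query(feyn.QLattice(), data, "y", ["'a' + ?", "func(gaussian('a', 'b'), _)"]), where data is a hypothetical data set holding the columns a to e and y.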
Complete feature list for the query language
The intent is to keep writing a query string as simple and as close to writing a mathematical function as possible. To this end, we provide the following abilities:
- Input features are given as quoted strings, i.e. 'a' or "a".
- Standard addition and multiplication operators + and * are supported; they obey operator precedence, commutativity and associativity.
- You can be specific with named functions. If you want a logarithm of 'a', just type log('a'). A gaussian can be both uni- and bivariate; in the query language its arity is specified by the number of arguments. Writing gaussian('a') matches a univariate gaussian of the input 'a', and writing gaussian('a', 'b') matches a bivariate gaussian of 'a' and 'b'. The complete list of named functions is:
  - log
  - exp
  - sqrt
  - squared
  - inverse
  - linear
  - tanh
  - gaussian
  - add
  - multiply
- A function wildcard func for any function, where the arity of the function is deduced from the number of arguments given: func('a') for any unary function of 'a' and func('a', 'b') for any binary function of 'a' and 'b'.
- An input feature wildcard ? that will match any input feature.
- Feature exclusion ! to explicitly avoid some input features. !'a' will match any input feature except 'a'.
- A subtree wildcard _ that will match any subtree. This can be extended as in _['a', 2] to further constrain the structure of the subtree such that it must contain the feature 'a' and have a complexity of at most 2. It is also possible to exclude features using !. Arguments in the square brackets are separated by commas.
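Because a query is an ordinary string, it can also be assembled programmatically. The helpers below are a minimal sketch of that idea; feature, exclude, call, subtree and ANY_FEATURE are hypothetical names, not part of Feyn, and combining several constraints inside one pair of square brackets in this order is an assumption based on the _['a', 2] example above.

```python
def feature(name):
    """Quote an input feature name, e.g. a -> 'a'."""
    return f"'{name}'"

def exclude(name):
    """Feature exclusion, e.g. b -> !'b'."""
    return "!" + feature(name)

def call(function_name, *args):
    """Named function or the func wildcard; arity follows the argument count."""
    return f"{function_name}({', '.join(args)})"

def subtree(*contains, excluded=(), max_complexity=None):
    """Subtree wildcard with optional comma-separated constraints in brackets."""
    parts = [feature(n) for n in contains] + [exclude(n) for n in excluded]
    if max_complexity is not None:
        parts.append(str(max_complexity))
    return "_[" + ", ".join(parts) + "]" if parts else "_"

ANY_FEATURE = "?"  # input feature wildcard

queries = [
    f"{feature('a')} + {ANY_FEATURE}",
    call("func", call("gaussian", feature("a"), feature("b")), subtree()),
    f"{feature('a')} * {subtree('c', max_complexity=3)}",
    f"{feature('a')} + {subtree(excluded=['b'])}",
]
for q in queries:
    print(q)
```

Each printed string has the same form as a query used elsewhere on this page and could be passed as query_string.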
Comprehensive examples for all listed features
Working through examples is the best way to learn, so here are some for every point on the feature list.
Single feature queries
This matches a very basic model that can only tell you the linear correlation of a single feature with the output.
models = ql.sample_models(feature_names, "y", query_string="'a'")
models[0]
Additions of specific features
This matches models where 'a' and 'b' are added.
models = ql.sample_models(feature_names, "y", query_string="'a' + 'b'")
models[0]
Multiplication of a feature, with an addition of another
This matches models where 'a' is added to a multiplication of 'b' and any other feature. Note how operator precedence is satisfied, with the multiplication occurring earlier in the model than the addition.
models = ql.sample_models(feature_names, "y", query_string="'a' + 'b' * ?")
models[0]
Addition of two features, multiplied by any other feature
Matches models where 'a' is added to 'b', then multiplied by any other feature. This time we explicitly perform the addition first.
models = ql.sample_models(feature_names, "y", query_string="('a' + 'b') * ?")
models[0]
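The difference between the last two queries is the shape of the expression tree. Python's own parser follows the same precedence convention, so it can serve as an analogy: the sketch below parses two Python expressions and reports which operator sits at the root. The wildcard ? is replaced by 'c' because ? is not valid Python, outermost_operator is a hypothetical helper, and nothing here calls the QLattice.

```python
import ast

def outermost_operator(expression):
    """Return the name of the operator at the root of a Python expression."""
    root = ast.parse(expression, mode="eval").body
    return type(root.op).__name__

# Multiplication binds tighter, so the addition ends up at the root.
print(outermost_operator("'a' + 'b' * 'c'"))    # Add
# Parentheses force the addition first, so the multiplication is at the root.
print(outermost_operator("('a' + 'b') * 'c'"))  # Mult
```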
A model with a squared function
Match models that take the square of 'a'.
models = ql.sample_models(feature_names, "y", query_string="squared('a')")
models[0]
Univariate Gaussian
Match models that take the univariate gaussian of 'a'.
models = ql.sample_models(feature_names, "y", query_string="gaussian('a')")
models[0]
Bivariate Gaussian
Match models that take the bivariate gaussian of 'a' and 'b'.
models = ql.sample_models(feature_names, "y", query_string="gaussian('a', 'b')")
models[0]
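The bivariate gaussian is what makes the second question from the introduction expressible: it responds strongly only when both inputs are close to one joint state. The sketch below uses a generic unnormalized bell curve to show this. The center and width parameters are hypothetical, and the exact parameterization Feyn fits may differ.

```python
import math

def gaussian1(x, center=0.0, width=1.0):
    """Generic univariate bell curve (assumed form, peak value 1 at the center)."""
    return math.exp(-((x - center) ** 2) / width)

def gaussian2(x0, x1, center0=0.0, center1=0.0, width0=1.0, width1=1.0):
    """Generic bivariate bell curve: high only when both inputs are near their centers."""
    return math.exp(-((x0 - center0) ** 2) / width0 - ((x1 - center1) ** 2) / width1)

print(gaussian2(0.0, 0.0))  # both inputs at the joint state: the peak
print(gaussian2(3.0, 0.0))  # only one input at its center: close to zero
print(gaussian1(3.0))       # the univariate curve ignores the second input
```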
Binary interaction with nested addition
Matches models where there is any binary interaction of "a" with the addition of "b" and "c".
models = ql.sample_models(feature_names, "y", query_string="func('a', 'b' + 'c')")
models[0]
Unary functions of a feature
Matches models of any unary function of 'a'.
models = ql.sample_models(feature_names, "y", query_string="func('a')")
models[0]
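func('a') leaves the choice of unary function open. As a rough guide to the candidates, the sketch below reads six of the named functions by their usual mathematical meaning. This is an assumption about naming only: any weights, biases or scaling that Feyn fits around these functions are ignored, and linear and gaussian are left out because their output depends on such fitted parameters.

```python
import math

# Assumed plain mathematical readings of the unary function names.
UNARY_READINGS = {
    "log": math.log,
    "exp": math.exp,
    "sqrt": math.sqrt,
    "squared": lambda x: x ** 2,
    "inverse": lambda x: 1.0 / x,
    "tanh": math.tanh,
}

x = 4.0
for name, function in UNARY_READINGS.items():
    print(f"{name}({x}) = {function(x):.4f}")
```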
Binary functions of a named feature and any other
Matches models that combine 'a' with any other input feature.
models = ql.sample_models(feature_names, "y", query_string="func('a', ?)")
models[0]
Binary functions with a named feature, excluding a specific other feature
Matches models that combine 'a' with any other input feature except 'b'.
models = ql.sample_models(feature_names, "y", query_string="func('a', !'b')")
models[0]
Models with a feature and any subtree of interactions
Match models that add 'a' to any subtree.
models = ql.sample_models(feature_names, "y", query_string="'a' + _")
models[0]
Models with a subtree excluding specific feature
Match models that add 'a' to any subtree that does not contain the input feature 'b'.
models = ql.sample_models(feature_names, "y", query_string="'a' + _[!'b']")
models[0]
Subtrees of certain complexity and content
Match models that multiply 'a' by any subtree that contains the input feature 'c' and has a complexity of at most 3 edges.
models = ql.sample_models(feature_names, "y", query_string="'a' * _['c', 3]")
models[0]
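To make the bracket constraints concrete, the sketch below checks a toy expression tree against the three kinds of constraint: required features, excluded features and a maximum number of edges. The tuple representation and the helpers edges, features and satisfies are hypothetical. Counting only the edges inside the subtree is also an assumption, since this page does not spell out how Feyn measures subtree complexity.

```python
def edges(tree):
    """Count edges in a toy expression tree.

    A leaf is a feature name (str); an inner node is a tuple
    (function_name, child, child, ...). Each child contributes one edge.
    """
    if isinstance(tree, str):
        return 0
    return sum(1 + edges(child) for child in tree[1:])

def features(tree):
    """Collect the set of feature names used in the tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(features(child) for child in tree[1:]))

def satisfies(tree, contains=(), excluded=(), max_edges=None):
    """Check the three constraint kinds a bracketed subtree wildcard can carry."""
    used = features(tree)
    if not set(contains) <= used:
        return False
    if used & set(excluded):
        return False
    return max_edges is None or edges(tree) <= max_edges

candidate = ("add", "c", ("exp", "d"))  # c + exp(d), which has 3 edges

print(satisfies(candidate, contains=["c"], max_edges=3))  # True
print(satisfies(candidate, contains=["c"], max_edges=2))  # False
print(satisfies(candidate, excluded=["d"]))               # False
```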
From model to query string
Given a Model,
models = ql.sample_models(feature_names, "y")
models[0].show()
it is possible to get its query string by calling feyn.Model.to_query_string:
print(models[0].to_query_string())
'gaussian("c", "b")'
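Since the result is a plain string, it can be inspected with standard tools. As a small sketch, the hypothetical helper below (not part of Feyn) pulls the named and excluded features out of a query string with a regular expression. It assumes feature names contain no quote characters.

```python
import re

# A quoted feature name, optionally preceded by the exclusion marker "!".
FEATURE_PATTERN = re.compile(r"""(!?)\s*(['"])(.*?)\2""")

def features_in_query(query_string):
    """Return (named, excluded) feature names in order of first appearance."""
    named, excluded = [], []
    for bang, _quote, name in FEATURE_PATTERN.findall(query_string):
        target = excluded if bang else named
        if name not in target:
            target.append(name)
    return named, excluded

print(features_in_query('gaussian("c", "b")'))  # (['c', 'b'], [])
print(features_in_query("func('a', !'b')"))     # (['a'], ['b'])
```

A string returned by to_query_string can presumably also be passed back as query_string to sample more models of the same shape, though that round trip is not shown on this page.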