# Using the query language

by: Jaan Kasak

(Feyn version 3.0 or newer)

When using the `QLattice` to extract learnings from your data, the process works best when you have a specific question in mind. This question can be as simple as "Which other feature does feature `a` best interact with linearly?" or as complex and specific as "I know that when both features `a` and `b` have specific values, they contribute strongly to the output, but do any other signal paths interact with this specific joint state of `a` and `b`?".

The `QLattice` will find all statistical correlations in your data, but asking questions with the `QLattice` truly shines when there is a cause/effect relationship between your target and feature variables. Other ML methods rely on feature engineering and separating data sets to support this type of workflow, but since we provide mathematical equations, we can offer a neat **query language** to ask for models with some specific underlying equation.

Queries are passed through the `query_string` argument of the methods `QLattice.sample_models` and `QLattice.auto_run`. As a spoiler, the two questions above translate to the following queries.

## Example

```
import feyn

ql = feyn.QLattice()
feature_names = ["a", "b", "c", "d", "e"]

# "Which other feature does feature `a` best interact with linearly?"
models = ql.sample_models(feature_names, "y", query_string="'a' + ?")
for m in models[:3]:
    feyn.show_model(m)
```

```
# "Do any other signal paths interact with the joint state of `a` and `b`?"
models = ql.sample_models(feature_names, "y", query_string="func(gaussian('a', 'b'), _)")
for m in models[:3]:
    feyn.show_model(m)
```

Using these queries in `QLattice.auto_run` would also let you find out which of the query-matching models perform best on a given data set.

## Complete feature list for the query language

The intent is to keep writing a query string as simple and as close to writing a mathematical function as possible. To this end, we provide the following abilities:

- Input features are given as quoted strings, i.e. `'a'` or `"a"`.
- Standard addition and multiplication operators `+` and `*` are supported; they obey operator precedence, commutativity and associativity.
- You can be specific with named functions. If you want a logarithm of `'a'`, just type `log('a')`. A gaussian can be either uni- or bivariate; in the query language its arity is specified by the number of arguments. Writing `gaussian('a')` matches a univariate gaussian of the input `'a'`, and writing `gaussian('a', 'b')` matches a bivariate gaussian of `'a'` and `'b'`. The complete list of named functions is: `log`, `exp`, `sqrt`, `squared`, `inverse`, `linear`, `tanh`, `gaussian`, `add`, `multiply`.
- A function wildcard `func` for any function, where the arity of the function is deduced from the number of arguments given: `func('a')` for any unary function of `'a'` and `func('a', 'b')` for any binary function of `'a'` and `'b'`.
- An input feature wildcard `?` that will match any input feature.
- Feature exclusion `!` to explicitly avoid some input features. `!'a'` will match any input feature except `'a'`.
- A subtree wildcard `_` that will match any subtree. This can be extended as in `_['a', 2]` to further constrain the structure of the subtree such that it must contain the feature `'a'` and have a complexity of at most 2. It is also possible to exclude features using `!`. Arguments in the square brackets are separated by commas.
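Since a query is just a string, it can be convenient to assemble it from small pieces rather than typing the quotes and brackets by hand. The following is a minimal sketch of that idea; the helpers `feature`, `exclude`, `call` and `subtree` are hypothetical names introduced here for illustration and are not part of `feyn`.

```python
# Hypothetical helpers for composing query strings; none of these are feyn APIs.

def feature(name):
    """Quote an input feature name, e.g. a -> 'a'."""
    return f"'{name}'"

def exclude(name):
    """Match any input feature except the named one, e.g. !'a'."""
    return f"!{feature(name)}"

def call(function_name, *args):
    """Apply a named function (or the func wildcard) to its arguments."""
    return f"{function_name}({', '.join(args)})"

def subtree(include=(), exclude_names=(), max_complexity=None):
    """Build a subtree wildcard, optionally constrained in square brackets."""
    constraints = [feature(n) for n in include]
    constraints += [exclude(n) for n in exclude_names]
    if max_complexity is not None:
        constraints.append(str(max_complexity))
    return f"_[{', '.join(constraints)}]" if constraints else "_"

queries = [
    f"{feature('a')} + ?",
    call("func", call("gaussian", feature("a"), feature("b")), subtree()),
    call("func", feature("a"), exclude("b")),
    f"{feature('a')} * {subtree(include=['c'], max_complexity=3)}",
]
for q in queries:
    print(q)
```

Each resulting string can be passed as `query_string` in the same way as a handwritten query.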

## Comprehensive examples for all listed features

Working through examples is the best way to learn, so here are some for every point on the feature list.

### Single feature queries

A very basic query: the matching model can tell you only the linear correlation of a single feature with the output.

```
models = ql.sample_models(feature_names, "y", query_string="'a'")
models[0]
```

### Additions of specific features

This matches models where `'a'` and `'b'` are added.

```
models = ql.sample_models(feature_names, "y", query_string="'a' + 'b'")
models[0]
```

### Multiplication of a feature, with an addition of another

This matches models where `'a'` is added to a multiplication of `'b'` and any other feature. Note how operator precedence is satisfied, with the multiplication occurring earlier in the model than the addition.

```
models = ql.sample_models(feature_names, "y", query_string="'a' + 'b' * ?")
models[0]
```

### Addition of two features, multiplied by any other feature

Matches models where `'a'` is added to `'b'`, then multiplied by any other feature. This time the parentheses force the addition to be performed first.

```
models = ql.sample_models(feature_names, "y", query_string="('a' + 'b') * ?")
models[0]
```
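To make the difference between the last two queries concrete, the sketch below parses the `+`, `*` and parenthesis subset of the query syntax into nested tuples, so the grouping becomes visible. This is a toy recursive-descent parser written for illustration only; it is not the parser the `QLattice` uses, and the names `tokenize` and `parse` are assumptions of this example.

```python
import re

# Tokens: quoted feature names, the ? wildcard, + and *, and parentheses.
TOKEN = re.compile(r"""\s*('[^']*'|"[^"]*"|[?+*()])""")

def tokenize(query):
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(query):
    """Parse a query into nested tuples; * binds tighter than +."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_sum():
        node = parse_product()
        while peek() == "+":
            advance()
            node = ("add", node, parse_product())
        return node

    def parse_product():
        node = parse_atom()
        while peek() == "*":
            advance()
            node = ("multiply", node, parse_atom())
        return node

    def parse_atom():
        token = peek()
        if token is None or token in {"+", "*", ")"}:
            raise ValueError(f"expected a feature or '(' but found {token!r}")
        advance()
        if token == "(":
            node = parse_sum()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        return token.strip("'\"")  # a feature name, or the ? wildcard

    tree = parse_sum()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("'a' + 'b' * ?"))
print(parse("('a' + 'b') * ?"))
```

In the first tree the `multiply` node is nested inside the `add` node, and in the second the nesting is reversed, which mirrors where each operation sits in the matching models.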

### A model with a squared function

Match models that take the square of `'a'`.

```
models = ql.sample_models(feature_names, "y", query_string="squared('a')")
models[0]
```

### Univariate Gaussian

Match models that take the univariate gaussian of `'a'`.

```
models = ql.sample_models(feature_names, "y", query_string="gaussian('a')")
models[0]
```

### Bivariate Gaussian

Match models that take the bivariate gaussian of `'a'` and `'b'`.

```
models = ql.sample_models(feature_names, "y", query_string="gaussian('a', 'b')")
models[0]
```

### Binary interaction with nested addition

Matches models where there is any binary interaction of `'a'` with the addition of `'b'` and `'c'`.

```
models = ql.sample_models(feature_names, "y", query_string="func('a', 'b' + 'c')")
models[0]
```

### Unary functions of a feature

Matches models of any unary function of `'a'`.

```
models = ql.sample_models(feature_names, "y", query_string="func('a')")
models[0]
```

### Binary functions of a named feature and any other

Matches models that combine `'a'` with any other input feature.

```
models = ql.sample_models(feature_names, "y", query_string="func('a', ?)")
models[0]
```

### Binary functions with a named feature, excluding a specific other feature

Matches models that combine `'a'` with any other input feature except `'b'`.

```
models = ql.sample_models(feature_names, "y", query_string="func('a', !'b')")
models[0]
```

### Models with a feature and any subtree of interactions

Match models that add `'a'` to any subtree.

```
models = ql.sample_models(feature_names, "y", query_string="'a' + _")
models[0]
```

### Models with a subtree excluding specific feature

Match models that add `'a'` to any subtree that does not contain the input feature `'b'`.

```
models = ql.sample_models(feature_names, "y", query_string="'a' + _[!'b']")
models[0]
```

### Subtrees of certain complexity and content

Match models that multiply `'a'` by any subtree that contains the input feature `'c'` and has a complexity of at most 3 edges.

```
models = ql.sample_models(feature_names, "y", query_string="'a' * _['c', 3]")
models[0]
```

## From model to query string

Given a `Model`,

```
models = ql.sample_models(feature_names, "y")
models[0].show()
```

it is possible to get its `query_string` by calling `feyn.Model.to_query_string`:

```
print(models[0].to_query_string())
```

```
'gaussian("c", "b")'
```
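A query string, whether handwritten or obtained from a model, refers to input features by name, so it can be worth checking those names against your feature list before sampling. Below is a minimal sketch using plain string processing; `features_in_query` and `unknown_features` are hypothetical helpers for this example, not `feyn` functions.

```python
import re

# Matches an optional ! followed by a single- or double-quoted feature name.
FEATURE = re.compile(r"""(!?)(?:'([^']*)'|"([^"]*)")""")

def features_in_query(query_string):
    """Return (mentioned, excluded) feature names in order of appearance."""
    mentioned, excluded = [], []
    for bang, single, double in FEATURE.findall(query_string):
        name = single or double
        target = excluded if bang else mentioned
        if name not in target:
            target.append(name)
    return mentioned, excluded

def unknown_features(query_string, feature_names):
    """List names used in the query that are missing from feature_names."""
    mentioned, excluded = features_in_query(query_string)
    return [n for n in mentioned + excluded if n not in feature_names]

feature_names = ["a", "b", "c", "d", "e"]
print(features_in_query('gaussian("c", "b")'))
print(features_in_query("func('a', !'b')"))
print(unknown_features("'a' + 'z'", feature_names))
```

An empty list from `unknown_features` means every name in the query appears in `feature_names`.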