What is the QLattice?
by: Kevin Broløs
(Feyn version 3.0 or newer)
Feyn and The QLattice
Feyn is a Python module for supervised machine learning. It interfaces with the QLattice, an algorithm for performing symbolic regression developed by Abzu.
The QLattice is partially inspired by Richard Feynman's path integral formulation. That's why we've named our Python library Feyn.
The QLattice works by searching for the mathematical models that best explain your dataset. It does this by composing mathematical operators and functions around the inputs and your chosen output. It then applies training, repeated reinforcement and evolution to learn which explanations are the best fit for your dataset.
When added complexity brings no significant improvement, simple solutions are given priority over complex ones, which makes it more likely to find explanations that human beings can understand.
Overall, symbolic regression approaches tend to keep a high performance while still maintaining generalisability, which separates them from other popular models such as random forests. Our own benchmarks show similar results.
How it all works
In simple terms, you can think of a QLattice as a probability distribution from which models are sampled.
Every time a batch of models has been trained, we update the distribution with the best structures and decay the underperformers, so there's space for a new batch of more likely competitors.
Going through this process helps a QLattice converge and shapes the distribution towards better solutions.
Feyn makes this process of sampling, training, filtering and updating models easy.
The process step by step
Using Feyn, you can solve either classification or regression problems.
The basic process looks like this:
- Sample a batch of models from the QLattice (typically a few thousand).
- Fit them all using a version of backpropagation, and evaluate them on some criteria (such as a variety of loss functions and information criteria).
- Discard the worst models that don't improve through repeated training.
- Update the QLattice with the structures of the best models.
- Start over from the first step, adding a new batch of models to evaluate and compete with the ones you kept from the previous loop.
Every step in this process happens locally on your machine.
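As a rough illustration of the loop above, here is a toy sketch in plain Python. It does not use Feyn or the QLattice. The three candidate structures, the `fit` and `run` helpers, the decay factor and the reinforcement rule are all hypothetical, and closed-form least squares stands in for the backpropagation-style fitting described above.

```python
import math
import random

# Hypothetical candidate structures: each model is y ~ w * g(x) + b for one transform g.
TRANSFORMS = {
    "linear": lambda x: x,
    "squared": lambda x: x * x,
    "tanh": math.tanh,
}


def fit(name, xs, ys):
    """Fit w and b by ordinary least squares and return the mean squared error."""
    g = [TRANSFORMS[name](x) for x in xs]
    n = len(xs)
    mean_g = sum(g) / n
    mean_y = sum(ys) / n
    var_g = sum((v - mean_g) ** 2 for v in g)
    cov = sum((v - mean_g) * (y - mean_y) for v, y in zip(g, ys))
    w = cov / var_g
    b = mean_y - w * mean_g
    return sum((w * v + b - y) ** 2 for v, y in zip(g, ys)) / n


def run(xs, ys, epochs=10, batch_size=20, keep=5, decay=0.9, seed=0):
    rng = random.Random(seed)
    weights = {name: 1.0 for name in TRANSFORMS}  # unnormalised sampling distribution
    kept = []  # (mse, name) pairs that survived earlier epochs
    names = list(weights)
    for _ in range(epochs):
        # 1. Sample a batch of structures from the current distribution.
        batch = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
        # 2. Fit every sampled model and score it alongside the survivors.
        scored = kept + [(fit(name, xs, ys), name) for name in batch]
        # 3. Discard the worst models and keep only the best few.
        kept = sorted(scored)[:keep]
        # 4. Decay every weight, then reinforce the structures of the survivors.
        for n in names:
            weights[n] *= decay
        for _, name in kept:
            weights[name] += 1.0
    return weights, kept


# Toy data generated from a quadratic relationship.
xs = [i / 10 for i in range(-20, 21)]
ys = [3 * x * x + 1 for x in xs]

weights, kept = run(xs, ys)
best_mse, best_name = kept[0]
print(best_name, weights)
```

In this sketch the weight of the structure that matches the data grows from epoch to epoch while the others decay, which is the convergence behaviour the steps above describe.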
Find what you're looking for
In addition, you can define criteria in Feyn that these models must meet when sampling.
This gives a flexible workflow, where you can restrict your search to specific features you wish to learn about, specific functions to include, the allowed complexity of the models and other similar constraints.
Feyn also includes a suite of different tools and plots to help you understand and inspect the results you get.
Understanding the models
In Feyn, the models are visualised as easy-to-read graphs that translate the mathematical equation into a step-by-step input-to-output transformation. More specifically, these graphs are directed acyclic graphs.
This allows you to understand the relationships between the features used in each model, and how they influence each other to arrive at the outcome.
Feyn also contains a suite of plots and tools to help you dig deeper into the models you get. These let you compare models, understand the relationships better, and evaluate the trade-offs of describing a relationship with one feature over another, as well as the biases and support levels present in your model.
This makes Feyn and the QLattice especially well suited to cases where you want insights and intend to investigate relationships between your features.
Which functions are available as interactions in a model?
The functions available to the QLattice range from elementary ones such as addition, multiplication and squaring, to more complex ones such as natural logarithm, exponential and tanh. When composed in a model, these are called interactions.
Interactions are the basic computation units of each model. They take in data, transform it and then emit it to be used in the next interaction.
Here is a list of the current possible interactions:
Name | Function |
---|---|
Addition | a + b |
Multiply | a * b |
Squared | a^2 |
Linear | a * weight + bias |
Tanh | tanh(a) |
Single-legged Gaussian | e^(-a^2) |
Double-legged Gaussian | e^(-(a^2 + b^2)) |
Exponential | exp(a) |
Logarithmic | log(a) |
Inverse | 1 / a |
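To make the idea of composing interactions concrete, here is a minimal sketch in plain Python. It is not how Feyn represents models. The `INTERACTIONS` registry, the nested-tuple model format, the `evaluate` helper and the feature names are all assumptions made for illustration.

```python
import math

# Hypothetical registry of interactions: each takes one or two inputs and emits one output.
INTERACTIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "squared": lambda a: a * a,
    "tanh": math.tanh,
    "exp": math.exp,
    "log": math.log,
    "inverse": lambda a: 1.0 / a,
}


def evaluate(node, row):
    """Evaluate a model written as nested tuples.

    A node is either a feature name (a string) or a tuple of the form
    (interaction_name, child, ...). Values flow from the inputs towards the output.
    """
    if isinstance(node, str):
        return row[node]
    name, *children = node
    args = [evaluate(child, row) for child in children]
    return INTERACTIONS[name](*args)


# Model: tanh(age * dose) + weight^2, using made-up feature names.
model = ("add", ("tanh", ("multiply", "age", "dose")), ("squared", "weight"))
row = {"age": 0.5, "dose": 2.0, "weight": 3.0}

result = evaluate(model, row)
print(result)
```

Each interaction receives the outputs of the nodes below it, so the whole model reads as a step-by-step transformation from the input features to a single output.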
Why is the QLattice algorithm necessary?
The QLattice is an algorithm developed by Abzu to be used in supervised machine learning for symbolic regression.
The QLattice is the workhorse of the equation, providing an evolutionary environment from which to sample models.
It provides a selection of interactions -- functions that transform the inputs to a new output -- and uses those in combination with your input features to learn which connections are the strongest through repeated reinforcement, sampling and evolution.
As you sample and fit models, the best solutions will be reinforced and variations will evolve. Alongside them, new potential solutions will be discovered over time. When you carry on this repeated reinforcement, islands will form in the QLattice environment, each with its own independent evolution.
This narrows the search space so that it converges to the best solution fast, while also giving rise to many separate evolutionary spaces.
A benefit of this process is that the user helps decide which models are useful and which paths will be reinforced. The user also decides how to constrain the decision space, which gives them full control over the shapes the models will take.
At the end of training, the user is presented with a selection of different possible explanations for their dataset, rather than a single ground truth. This allows the user to critically investigate novel relationships that help them understand their problem better and learn new dynamics that can be applied to solve it.
Altogether, this approach has some benefits, such as:
- There are far fewer nodes and connections than in a typical neural network.
- There are functions you wouldn't normally see in a neural network.
- The models are more inspectable, simpler, and less prone to overfitting.
- The models are mathematical formulas, allowing you to reason about the consequences of your hypothesis.
- The models that have been tried are diverse and you can trust that nothing has been overlooked during training.
If there's a signal, the QLattice will find it, so you can tell whether your problem is best solved with a complex non-linear mathematical equation or a simple linear model.
Why not just brute-force?
The space of all possible models is potentially infinite, which makes brute-forcing the solution intractable for all but the simplest datasets. In addition, the more dimensions you have in your dataset, the more likely spurious correlations are to occur.
This is why you update the QLattice with the best model structures and use criteria during training that limit model complexity and discourage bringing in new variables unless they yield a statistically significant improvement.
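One common criterion of this kind is the Bayesian information criterion (BIC), which adds a penalty for every extra parameter. The sketch below uses the textbook form for Gaussian errors, up to an additive constant. The `bic` helper, the parameter counts and the error values are made up for illustration, and Feyn's own criteria may be computed differently.

```python
import math


def bic(mse, n_samples, n_params):
    """BIC for a model with Gaussian errors, up to an additive constant (lower is better)."""
    return n_samples * math.log(mse) + n_params * math.log(n_samples)


n = 200

# A simple model and a more complex one that fits only marginally better.
simple = bic(mse=0.250, n_samples=n, n_params=3)
slightly_better = bic(mse=0.245, n_samples=n, n_params=9)

# A complex model whose improvement is large enough to justify the extra parameters.
much_better = bic(mse=0.100, n_samples=n, n_params=9)

prefer_simple = simple < slightly_better
prefer_complex = much_better < simple
print(prefer_simple, prefer_complex)
```

With these numbers the small drop in error does not pay for six extra parameters, so the simple model scores better. The complex model is preferred only once the improvement is substantial.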
You can also narrow the search space by being specific about which relationships to investigate and by restricting the types of models the QLattice will produce.
What about privacy?
Every step of the process when using a QLattice with Feyn happens locally on your machine. You can even run this without an internet connection.
In particular, this means that your data is never exchanged and never leaves your machine.
A tiny bit of history
The QLattice is a high-performance simulation algorithm written in C. It originally ran in Abzu's data center for performance reasons. Feyn was made as a high-level interface to keep a familiar local data science workflow, while offloading the sampling to the QLattice running on a remote server.
The QLattice has since been optimised to run on modern consumer hardware, and the need for running it in the cloud has disappeared.
That's why the Feyn Python library now comes bundled with the QLattice algorithm, and why you see us talking about both Feyn -- our Python library for conducting data science work -- and the QLattice -- the algorithm we invented to find the best symbolic models for a given dataset.
Back to the future
Since the release of Feyn 3.0, everything runs entirely locally on the user's machine, including the optimised version of the QLattice.
The old versions of Feyn before 3.0 are no longer maintained and there are no remote servers to connect to anymore.