The QLattice is a supervised machine learning tool for symbolic regression, developed by Abzu and inspired by Richard Feynman's path integral formulation. That's why we've named our Python module Feyn.
The QLattice composes functions together to build mathematical models between the inputs and output in your dataset. The functions vary from elementary ones such as addition, multiplication and squaring, to more complex ones such as natural logarithm, exponential and tanh.
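As a minimal sketch of what "composing functions" means, the snippet below builds one small model by hand from the functions named above. The function table, the column names `x1`, `x2`, `x3` and the formula itself are illustrative assumptions, not something Feyn produced.

```python
import math

# Elementary building blocks, named after the functions mentioned above.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "squared": lambda a: a * a,
    "log": math.log,
    "exp": math.exp,
    "tanh": math.tanh,
}

def model(row):
    """Hypothetical model: output = tanh(x1 * x2 + x3 ** 2)."""
    product = FUNCTIONS["multiply"](row["x1"], row["x2"])
    square = FUNCTIONS["squared"](row["x3"])
    return FUNCTIONS["tanh"](FUNCTIONS["add"](product, square))

# Evaluate the composed model on a single (made-up) row of data.
prediction = model({"x1": 0.5, "x2": 2.0, "x3": 0.0})
```

A real model also carries fitted weights on its inputs and output; they are left out here to keep the composition itself visible.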
Overall, symbolic regression approaches tend to achieve high performance while still generalising well, which separates them from other popular models such as random forests. Our own benchmarks show similar results.
Feyn is the Python module for interfacing with the QLattice, and for training models on your local machine.
When sampling models from the QLattice, you define criteria in Feyn that these models must meet, such as whether it is a classification or regression problem, which features you want to learn about, which functions you want to include, how complex the models may be, and other such constraints.
The fitting process
A typical process looks like this:
- You sample a few thousand models at a time.
- You fit them all using a version of backpropagation, and evaluate them on some criteria (such as a variety of loss functions and information criteria).
- You discard the worst models based on a handful of options such as dropout and decay - or even your own.
- You update the QLattice with the structures of the best models - typically the top ten that are dissimilar enough to ensure good learning.
- You start over from the first step, adding a new handful of samples to your list of models to evaluate and compete with the ones you kept from the previous loop.
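The evaluation step mentions information criteria. One common choice is the Bayesian information criterion (BIC), which rewards a good fit and penalises extra parameters. The sketch below uses the textbook form for Gaussian residuals, up to an additive constant; the numbers are made up, and this is not necessarily how Feyn computes it.

```python
import math

def bic(mse, n_samples, n_parameters):
    """BIC for a model with Gaussian residuals, up to an additive constant."""
    return n_samples * math.log(mse) + n_parameters * math.log(n_samples)

# Hypothetical numbers: the larger model fits slightly better on 100 rows...
small = bic(mse=0.50, n_samples=100, n_parameters=3)
large = bic(mse=0.49, n_samples=100, n_parameters=9)

# ...but lower BIC is better, and the parameter penalty outweighs the small gain.
preferred = "small" if small < large else "large"
```

Ranking by a criterion like this, rather than by loss alone, is one way to keep simpler models competitive during the loop.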
Each model is sampled from the QLattice, which you can think of as a probability distribution that is tuned over time. Initially, this distribution is uniform. Going through this process helps the QLattice converge and shapes the distribution towards better solutions.
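The loop can be sketched as a toy in plain Python. Everything here is a stand-in and none of it is Feyn's API: the "QLattice" is just a dictionary of sampling weights, a model is a single function with one weight `w`, and fitting uses closed-form least squares instead of backpropagation.

```python
import math
import random

# Candidate structures of the form y = w * f(x); a stand-in for model structures.
STRUCTURES = {
    "linear": lambda x: x,
    "squared": lambda x: x * x,
    "exp": math.exp,
    "tanh": math.tanh,
}

def fit(name, xs, ys):
    """Fit the single weight w by least squares; return (loss, name, w)."""
    fx = [STRUCTURES[name](x) for x in xs]
    w = sum(a * y for a, y in zip(fx, ys)) / sum(a * a for a in fx)
    loss = sum((w * a - y) ** 2 for a, y in zip(fx, ys)) / len(xs)
    return loss, name, w

def search(xs, ys, epochs=10, n_samples=20, n_keep=2, seed=0):
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STRUCTURES}  # uniform to begin with
    kept = []
    for _ in range(epochs):
        names = list(weights)
        # 1. Sample structures according to the current distribution.
        sampled = rng.choices(names, weights=[weights[n] for n in names], k=n_samples)
        # 2. Fit each distinct sampled structure.
        fitted = [fit(name, xs, ys) for name in set(sampled)]
        # 3. Let new models compete with the survivors and discard the worst.
        pool = {model[1]: model for model in kept + fitted}
        kept = sorted(pool.values())[:n_keep]
        # 4. Reinforce the structure of the best model so far.
        weights[kept[0][1]] += 1.0
    return kept, weights

xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x * x for x in xs]  # made-up data with a known relationship
kept, weights = search(xs, ys)
best_loss, best_name, best_w = kept[0]
```

After a few rounds the weight of the best structure grows, so it is sampled more often: the distribution starts uniform and is shaped towards better solutions.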
The models are fitted locally on your machine using Feyn, a Python module for interfacing with the QLattice.
Why not just brute-force?
The space of all possible models is potentially infinite, which makes brute-forcing the solution intractable for all but the simplest datasets. This is why you update the QLattice with the best model structures so far. You can also narrow the search space by being specific about which relationships to investigate, and by restricting the types of models the QLattice will produce.
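To get a feel for how quickly the space grows, the sketch below counts expression trees by the number of functions they contain. The setup is an assumption chosen for illustration: 5 input columns, 4 one-input functions and 2 two-input functions, with weights ignored and `a + b` counted separately from `b + a`.

```python
def count_structures(max_functions, n_inputs=5, n_unary=4, n_binary=2):
    """Return counts[k] = number of expression trees with exactly k functions."""
    counts = [n_inputs]  # zero functions: the model is just one input column
    for k in range(1, max_functions + 1):
        # The root is a one-input function applied to a tree with k - 1 functions...
        unary = n_unary * counts[k - 1]
        # ...or a two-input function whose subtrees share the other k - 1 functions.
        binary = n_binary * sum(counts[i] * counts[k - 1 - i] for i in range(k))
        counts.append(unary + binary)
    return counts

counts = count_structures(3)  # [5, 70, 1680, 50120]
```

Even in this small setup each extra function multiplies the number of candidates by more than ten, and that is before any weights are fitted.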
What about privacy?
The information sent to the QLattice only consists of:
- The token and QLattice identifiers when you connect.
- The names of your columns (text strings).
- The model structures you want to reinforce, containing column names, function names (interactions), along with the unweighted edges in the graph that connect inputs to functions, functions to functions and functions to outputs.
None of your data is exchanged at any point; it never leaves your machine.
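To make the distinction concrete, here is a sketch of reducing a fitted model to its structure. The dictionary layout, the column names and the `structure_only` helper are all hypothetical; this is not Feyn's actual wire format, only an illustration of what "names and unweighted edges" leaves out.

```python
def structure_only(model):
    """Keep node names, kinds and edges; drop every fitted parameter."""
    return {
        "nodes": [{"name": n["name"], "kind": n["kind"]} for n in model["nodes"]],
        "edges": list(model["edges"]),
    }

# A made-up fitted model: two inputs multiplied together feed the output.
fitted_model = {
    "nodes": [
        {"name": "age", "kind": "input", "scale": 0.02, "bias": -1.0},
        {"name": "bmi", "kind": "input", "scale": 0.05, "bias": -1.3},
        {"name": "multiply", "kind": "function"},
        {"name": "risk", "kind": "output", "weight": 1.7, "bias": 0.1},
    ],
    "edges": [(0, 2), (1, 2), (2, 3)],
}

# Only column names, function names and unweighted edges remain.
payload = structure_only(fitted_model)
```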
Understanding the models
The resulting models are represented by unidirectional, acyclic graphs that cleanly visualize what happens in the mathematical equation for everyone to understand. On top of this, we have a suite of plots and tools to help you dig deeper into the models you get, and help you understand not only the relationships better, but also the tradeoffs, biases and support levels present in your model.
This makes the QLattice especially great for when you want insights and intend to investigate relationships between your features.
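The link between a graph and its equation can be sketched in a few lines. The node layout below (a name plus the indices of the nodes feeding into it) is an assumption made for this example, not Feyn's internal representation.

```python
# Each node is (name, indices of the nodes that feed into it).
# Nodes only point backwards in the list, so the graph cannot contain a cycle.
GRAPH = [
    ("x1", ()),            # 0: input column
    ("x2", ()),            # 1: input column
    ("multiply", (0, 1)),  # 2: function of both inputs
    ("tanh", (2,)),        # 3: final function, feeding the output
]

def to_formula(graph, index):
    """Write the sub-graph ending at `index` as a nested formula string."""
    name, sources = graph[index]
    if not sources:
        return name  # an input is written as its column name
    arguments = ", ".join(to_formula(graph, source) for source in sources)
    return f"{name}({arguments})"

formula = to_formula(GRAPH, len(GRAPH) - 1)  # "tanh(multiply(x1, x2))"
```

Reading the graph from the inputs towards the output and reading the formula from the inside out describe the same computation.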
The QLattice in a nutshell
The QLattice is an environment to simulate discrete paths from multiple inputs to an output. It does this in a finite multi-dimensional lattice-space. This is where the inspiration from Feynman's path integral comes in.
The QLattice simulates inputs taking a path through the lattice space before emerging as an output. If you do this until a solid path has been shaped, you'll eventually converge to the path most likely to explain the problem you're trying to model. Along the path, it randomly samples from a selection of interactions: functions that transform their inputs into a new output.
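Interactions are the basic computation units of each model. They take in data, transform it and then emit it to be used in the next interaction. The possible interactions include the functions named earlier, from addition, multiplication and squaring to natural logarithm, exponential and tanh.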
We determine the interactions based on probabilities, guided by repeated reinforcement of the best solutions provided by the QLattice as you fit the hundreds of thousands of models that are discovered. During repeated reinforcement, islands will form in the QLattice space, each with its own independent evolution. This narrows the search space and gives way to many separate evolutionary spaces. A benefit of this process is that the user helps decide which models are useful and which paths will be reinforced. The user also decides how to constrain the decision space, giving the user full control over the shapes the models will take.
Altogether, compared with a neural network, this approach has some benefits, such as:
- there are far fewer nodes and connections.
- there are functions you wouldn't normally see in a neural network.
- the models are more inspectable, simpler and less prone to overfitting.
- the models are mathematical formulas, allowing you to reason about the consequences of your hypothesis.
- the models that have been tried are diverse and you can trust that nothing has been overlooked during training.
If there's a signal, the QLattice will find it - so you can tell with confidence whether your problem is best solved with a complex non-linear mathematical equation or a simple linear model.