Frequently Asked Questions
by: Kevin Broløs and Valdemar Stentoft-Hansen
Support & Sign-up
Where can I get support?
We're always available on our Discord Server! Alternatively, you can shoot us an email.
How do I sign up?
How long until I get approved for a QLattice?
We approve people for QLattices on an ongoing basis. We prioritize people who work on exciting projects they're willing to tell us about, so make sure to reach out to a friendly Abzoid if you think that's you!
How do I get my URL/token?
Do you support using the QLattice behind a proxy or firewall?
Yes! The QLattice uses default proxy settings and should just work if your network is set up correctly. See more here.
I'm getting an exception and I don't know what it means!
We're happy to help! Hit us up on Discord or shoot off an email! Take a look at this diagnostics tool and see if you can reproduce the error, so we have some information to go on when helping you.
Some common issues:
- Wrong semantic types assigned
- Numeric names for feature columns
- NaN or infinite values in either the input or output features
Also ensure that:
- Your API token and URL are correctly defined and used
- You're connected to the internet to communicate with the QLattice (and your network is set up correctly)
- You have enough memory available for your dataset, and you don't copy it unnecessarily
Do you have a technological whitepaper?
Not yet! But we're patent pending, and as soon as that's approved, we'll be able to start publishing more details about what makes us tick. If you're still curious, we're happy to explain further on our Discord if you have any specific questions!
Features and Data
Can I use numeric names for my input features?
Ehhh, kinda. We require them to be encoded as strings, so call str() on them first, and everything will work as expected.
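For example, with pandas (a minimal sketch; the DataFrame contents are just illustrative):

import pandas as pd

df = pd.DataFrame({0: [1.0, 2.0], 1: [3.0, 4.0], 'target': [0, 1]})
# Convert the numeric column names to strings before handing the data to feyn
df.columns = [str(c) for c in df.columns]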
If I have binary data points, should they be numerical or categorical?
If there is no apparent ordering of your binary variable - "1 being higher than 0" - then go with the categorical semantic type. Truth be told though, there is little difference in how you treat them due to how we do automatic scaling and learning of weights.
General Usage Questions
Can I do a regression with multiple targets?
Unfortunately, that is not possible in the current framework. We instead suggest you build multiple models, one for each target variable.
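A minimal sketch of that approach, assuming a DataFrame called data with hypothetical target columns 'target_a' and 'target_b', and using the fitting loop shown later in this FAQ:

import feyn

ql = feyn.QLattice()
targets = ['target_a', 'target_b']  # hypothetical target columns
best = {}

for target in targets:
    # Drop the other targets so they can't leak into this model
    subset = data.drop(columns=[t for t in targets if t != target])
    qgraph = ql.get_regressor(subset, target)
    for _ in range(10):
        qgraph.fit(subset)
        ql.update(qgraph.best())
    best[target] = qgraph[0]
    ql.reset()  # wipe the learnings before tackling the next target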
Can I classify on multiple classes?
Kind of. For simple cases, you can train it as a regression problem and round the predictions. For more complicated cases, you'll have to train a classifier for each class, aggregate the predictions, and take a softmax or argmax over the results. You can either select the class with the highest probability, or select a threshold and have your model communicate that it's uncertain (i.e. when none of the classes are above a satisfactory probability score).
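Here is a one-vs-rest sketch of the latter approach, assuming a DataFrame called data with a multi-class 'target' column, that ql.get_classifier mirrors the get_regressor call shown later in this FAQ, and that graphs expose a predict() method:

import numpy as np
import feyn

ql = feyn.QLattice()
classes = data['target'].unique()
best = {}

for cls in classes:
    binary = data.copy()
    binary['target'] = (binary['target'] == cls).astype(int)  # one-vs-rest target
    qgraph = ql.get_classifier(binary, 'target')
    for _ in range(10):
        qgraph.fit(binary)
        ql.update(qgraph.best())
    best[cls] = qgraph[0]
    ql.reset()  # fresh state for the next class

# Aggregate the per-class probabilities and pick the argmax
probs = np.column_stack([best[cls].predict(data) for cls in classes])
predictions = classes[probs.argmax(axis=1)]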
How do I get the loss of my graph?
You can get the loss of any fitted graph by looking at the property graph.loss_value. You can also recompute the loss of the graph using either a metric from the feyn.metrics library of functions, or a function directly on the graph itself, such as:
graph.absolute_error(data)
graph.squared_error(data)
Can I use a different loss function?
As of right now, you can choose the loss function applied in the fitting process with:
loss_function = feyn.losses.squared_error
You can choose among squared_error, absolute_error and binary_cross_entropy.
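For example (a sketch, assuming fit() accepts the loss_function keyword shown above):

qgraph.fit(data, loss_function=feyn.losses.binary_cross_entropy)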
Can I guide my fitting towards a different metric, such as the F1 score, recall, etc.?
For the actual fitting process you won't be able to apply, for example, recall directly, since we are using gradient descent and recall has no gradient. You have two other options though, sketched in the example after this list:
- You can apply sampling weights to your problem, putting more weight on specific instances (for the recall example, putting more weight on "True" values would force the QLattice to find a model that to a larger extent solves for these cases). Sample weights can be applied as an input (array of weights) to the fit function.
- You can iterate through your top graphs (the QGraph is a sorted list of graphs, so qgraph[0] is your best graph at all times w.r.t. the loss function applied), calculate your metric of choice, and sort by that. The best graph on your sorted list can be pushed to the QLattice with the update function, skewing the QLattice towards graphs that "solve" your metric.
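A sketch of both options for the recall example (the sample_weights keyword name and the predict() method are assumptions here):

import numpy as np
from sklearn.metrics import recall_score

# Option 1: upweight the positive cases during fitting
weights = np.where(data['target'] == 1, 5.0, 1.0)
qgraph.fit(data, sample_weights=weights)

# Option 2: re-rank the top graphs by recall instead of loss
top = [qgraph[i] for i in range(10)]
top.sort(key=lambda g: recall_score(data['target'], g.predict(data) > 0.5), reverse=True)
ql.update(top[0])  # skew the QLattice towards high-recall graphs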
Can I stop fitting when the improvement in my best loss falls below a threshold value?
You can find the loss_value property on the best graph after fitting:
best = qgraph[0]
best.loss_value
We don't have an explicit stop functionality, but we recommend putting a break in the fitting loop that assesses the loss_value against your threshold, as in the sketch below. Note that other metrics are attached to the graphs as well, such as r2_score, rmse, AUC, etc.
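A minimal sketch, using only the calls shown elsewhere in this FAQ (the threshold value is just an example):

prev_loss = float('inf')
threshold = 1e-4  # stop when the improvement drops below this

for _ in range(100):
    qgraph.fit(data)
    ql.update(qgraph.best())
    best = qgraph[0]
    if prev_loss - best.loss_value < threshold:
        break  # the best loss barely improved; stop fitting
    prev_loss = best.loss_value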
When I run qgraph.best(), how many graphs should I expect?
qgraph.best() usually gives you at most three differentiated graphs. What you will experience when running a lot of fitting loops is that many of the top graphs will be variations of the same ideas. The qgraph.best() function only considers graphs in different locations of the QLattice, ensuring they have evolved independently, which maximizes the potential for differentiated, good graphs.
How do I get the top five graphs?
You can either do a qgraph.head(5) to get them sorted by loss, or run qgraph.sort(data, loss_function), choosing which dataset and metric to sort the graphs by. You can sort by the same, or a different, dataset/loss function than you trained on. This can be useful for validation sets.
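For example, to rank the graphs by their performance on a hold-out set (validation_data is a hypothetical DataFrame):

# Re-sort the graphs by their loss on the validation set
qgraph.sort(validation_data, feyn.losses.absolute_error)
top_five = qgraph.head(5)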
When I use plot_summary() on a graph, what does signal capture mean?
By default, this is the mutual information criterion value (https://en.wikipedia.org/wiki/Mutual_information) of that specific place in the graph. You can choose to display the Pearson correlation instead with corr_func = "pearson". It is a way of capturing where in the graph the signal "comes from".
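For example (a sketch; passing your dataset as the first argument is an assumption here):

graph.plot_summary(data, corr_func="pearson")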
Can I change the max depth during the fitting loop?
You can assign a new max_depth and other filters to an existing QGraph using qgraph.filter(), however you have to wait for the .fit() call to complete before you can refilter and run .fit() again.
So yes, in a fitting loop that looks like this, you can change any filter dynamically, however you like:
import feyn

ql = feyn.QLattice()
qgraph = ql.get_regressor(data, 'target')

for _ in range(42):
    qgraph.fit(data)
    ql.update(qgraph.best())
    # Swap in new filters between fit calls
    qgraph = qgraph.filter([feyn.filters.Contains('my_feature'), feyn.filters.MaxDepth(3)])
However, for most use cases, unless you know exactly what you're trying to accomplish, we recommend changing your qgraph outside of your fitting loop, to make sure you find a decent set of graphs before changing the incoming flow of graphs. Remember to ask questions of the QLattice about your dataset, to have it generate valuable hypotheses.
Can I fix the parameters for the interactions in my graphs?
It's not currently possible to fix the parameters for nodes; they're exclusively learnt by fitting the dataset. The options we currently provide for designing graphs are our filters (excluding/including specific cells, depths, edges, that sort of thing). Consider whether your needs would be met by these options, but feel free to send us a suggestion if you have a specific use case where this is important. We generally believe the QLattice should do most of the hard work of presenting the best solutions to the user, so we will mostly look towards more general solutions, but if there's a niche that would be better supported by more control, it's of course worth thinking more about.
What is the "linear" interaction?
The linear transformation assigns a new weight and bias to the incoming values of the interaction, in order to minimise the loss of the graph as a whole. This is done via backpropagation as you would know it from neural networks, so there is no "local regression" taking place in the cell; rather, it's a graph-wide optimisation.
When a categorical binary value is passed into a tanh, isn't that redundant?
Locally in the graph, yes: a 0/1 categorical input fed directly into a tanh cell is redundant, as the split is "already made". However, consider a graph where the categorical input is used in both the tanh and another transformation. In such a case the tanh is not redundant. The tanh assigns new values to the split: the input might assign weights of, say, 0.5 and 0.6 to the two binary values, whereas the tanh would transform these weights into 0 and 1. This opens up possibilities for other transformations downstream in your graph.
Does sympify() print the equation that gives the prediction?
Yes, and no. sympify() converts the graph to a SymPy expression, which represents the underlying mathematical expression of the graph. This SymPy expression can then be printed as an equation, but you can also work directly with it. To supplement this, you can use Graph.fit() to fine-tune a specific graph (i.e. letting the stochastic gradient descent get all the way to the bottom of the minimum it is searching in). We do learning rate dampening to ensure that we find the minimum. In this way, once you have found your preferred graph, you can tweak its parameters for the optimal version of that graph.
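For example (a sketch, assuming sympify() is a method on the graph):

best = qgraph[0]
best.fit(data)  # fine-tune the parameters of this particular graph
expression = best.sympify()  # the underlying SymPy expression
print(expression)  # the equation behind the predictions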
How do I save my model once it has been trained?
You can find a guide on how to do that here!
How does the QLattice fare on many categorical and binary features?
“It depends”. It is not an issue even if all the variables are categorical, unless there’s a high uniqueness among values that could lead to overfitting through memorisation.
You should ask yourself: are they all mutually exclusive (like a one-hot encoding), or can multiple be set simultaneously? We use the categorical semantic type for categorical features, which allows us to fit models without one-hot encoding (and indeed, one-hot encoding will hurt performance).
- If they actually all hold individual (and combined) signal, you might have to be creative about the way you train in order to capture what's most important. We often use techniques such as mutual information, or run the QLattice with a low depth to find the features that contain the most signal, and reduce the dimensionality of the dataset.
- If the features are one-hot encoded (mutually exclusive), we recommend undoing that and training on the true categories instead, using the categorical semantic type (see the sketch below). That should improve your performance a lot, and simplify your interpretation of the resulting graph.
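A minimal sketch of undoing a one-hot encoding with pandas (the color_* column names are hypothetical):

# Collapse hypothetical one-hot columns back into a single categorical column
onehot_cols = ['color_red', 'color_green', 'color_blue']
data['color'] = data[onehot_cols].idxmax(axis=1).str.replace('color_', '', regex=False)
data = data.drop(columns=onehot_cols)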
When I train, I only get a few features in the graphs - how do I force it to use all of them?
The QGraph exhausts all the best combinations of features, so no forcing is necessary. If you feel the problem would be solved with more features, you might be interested in increasing max_depth to allow for more complex graphs. Alternatively, you can use our filter functionality to try out different fixed combinations of features: use qgraph.filter with feyn.filters.Contains and the names of the features you want to fix. You might need to run more fit iterations to allow the search space to get processed, but it can help you investigate relations, such as we describe in our general workflow.
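For instance, assuming qgraph.filter accepts a list of filters as in the fitting-loop example above (the feature names here are hypothetical):

# Only keep graphs that use both of these features
qgraph = qgraph.filter([
    feyn.filters.Contains('age'),
    feyn.filters.Contains('blood_pressure'),
])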
I really want more features in my graphs - what do I do?
The soft cap is probably around ten features - if you have a lot more, you'll see an enormous graph where you lose track of what is going on. We search for the simplest possible explanations, where the number of features is limited and somewhat comprehensible for the model builder. Having hundreds of signal-carrying features will not be a good fit for the QLattice without some good data preparation and asking the right, isolated questions.
If possible, try fitting more variation into fewer features, although we know that is probably easier said than done. For example, when dealing with RNA sequences you can get a lot of information out of applying categorical "windows" of SNPs and aggregate statistics of the sequences. The benefit is that the graph will also answer a more concrete question, which you can use to pose a new question or hypothesis about how things relate to each other.
What happens when I reset my QLattice?
When you have a fresh QLattice, its state is unaffected by any learnings. If you then update the QLattice with the learnings of a graph, its state will represent a very slight affinity towards the connections this graph represents.
If you reset the QLattice, its state is wiped clean of learnings, ready to tackle a different problem. If, before you reset the QLattice, you worry you might regret it, you can always take a snapshot first. Having a snapshot means you can always restore the previous state.
How do I backup and restore my QLattice?
On this page, you can see how to create snapshots of your QLattice. Once you've created a snapshot, you can later restore it to that point in time. When you create a snapshot, you also get to attach a little note to it.
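As a rough sketch of that workflow (the method names here are assumptions; see the guide linked above for the actual API):

# Hypothetical API sketch; consult the snapshots guide for the real calls
snapshot = ql.snapshots.capture("before experimenting")  # attach a note
# ...experiment freely...
ql.snapshots.restore(snapshot)  # roll back to the captured state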
If I run QLattice.reset(), does it delete my snapshots?
Nope! Your snapshots are there to stay until you remove them yourself. You can restore a snapshot as many times as you want, and reset the QLattice in between.