We’ve been thinking about how to parameterise/train/calibrate models, and also how to select an appropriate model to begin with. This raises all sorts of interesting questions on the details, but this post is just about setting out a big overview of how this sort of thing might work, in an ideal scenario.
We’ll go through the concepts with the help of the diagram I’ve drawn in Figure 1. We initially sketched this out for Kylie’s thesis (she’s the one doing all the work on this for the hERG channel models at the moment), but it didn’t make the final cut there, so here it is instead.
First off, if you have an overly simple model, sitting at the top of Figure 1, you probably won’t have much ambiguity in the parameter values it should take for the best fit to training data, but the fit to training data isn’t likely to be very good, and the prediction of new cases in validation is likely to be worse.
But it is always possible to get a better fit to training data by increasing the complexity of your model (adding free parameters, extra equations, etc.):
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk”
John von Neumann
In statistical models/machine learning approaches this is called “overfitting”, but the same applies to adding lots of complexity (postulated reactions or ion channel states etc.) into biophysical models. A Google image search on ‘overfitting’ will give you a good idea of how it happens, and why it might lead to really bad predictions later on.
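To make the overfitting point concrete, here is a minimal sketch using polynomials as a stand-in for "models of increasing complexity". Everything in it is an assumption for illustration: the sine curve playing the role of the real system, the noise level, and the names `truth`, `rmse`, `train_err` and `valid_err`. It is not one of the hERG models.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def truth(x):
    # Hypothetical "real system" that the candidate models try to capture
    return np.sin(2.0 * np.pi * x)

# A small noisy training set, and a separate validation set from the same system
x_train = np.linspace(0.0, 1.0, 12)
y_train = truth(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_valid = np.linspace(0.02, 0.98, 50)
y_valid = truth(x_valid) + rng.normal(0.0, 0.2, x_valid.size)

def rmse(y, y_hat):
    # Root-mean-square error between data and model output
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

degrees = list(range(0, 10))
train_err, valid_err = [], []
for d in degrees:
    # Least-squares polynomial fit: degree d means d + 1 free parameters
    model = np.polynomial.Polynomial.fit(x_train, y_train, d)
    train_err.append(rmse(y_train, model(x_train)))
    valid_err.append(rmse(y_valid, model(x_valid)))

for d, tr, va in zip(degrees, train_err, valid_err):
    print(f"degree {d}: training RMSE {tr:.3f}, validation RMSE {va:.3f}")
```

Because each polynomial contains all the lower-degree ones as special cases, the training error can only go down as the degree goes up. The validation error has no such guarantee, and typically starts to climb again once the extra parameters are fitting noise.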
So you can make your model overly complicated, but this is subtle: if you try to fit too many parameters to a dataset that isn’t “information rich” enough, you’ll also run into problems. I’ve shown an example of this in Figure 3. Here we’re at the bottom of Figure 1: in panel A the model is too complex for the training data to constrain, and so it gives rubbish predictions in panel B. Worse, we get a whole range of predictions depending at random on our initial guess at the parameters, which is a bad thing. But if we change the training data to a more ‘information-rich’ protocol that allows us to identify the parameters better, then we can make good predictions and we are back in the ‘optimal’ setting. So “Model Complexity” really means ‘model complexity relative to a given training protocol’.
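One rough way to see how the protocol, and not just the model, decides whether parameters can be pinned down is to look at how sensitive the model output is to each parameter under that protocol. The sketch below is only an illustration under stated assumptions: the saturating two-parameter `model`, the parameter values, and the two "protocols" are all made up, and the condition number of the sensitivity matrix is just one of several possible diagnostics.

```python
import numpy as np

def model(x, a, b):
    # Hypothetical two-parameter saturating model (not one of the hERG models)
    return a * x / (b + x)

def sensitivities(x, a, b):
    # Columns are d(model)/da and d(model)/db at each input in the protocol
    d_da = x / (b + x)
    d_db = -a * x / (b + x) ** 2
    return np.column_stack([d_da, d_db])

def identifiability_condition_number(x, a, b):
    # Scale each column by its parameter so we compare relative sensitivities
    J = sensitivities(x, a, b) * np.array([a, b])
    # A huge condition number of J^T J means some combination of parameters
    # barely changes the output, so this data cannot constrain it
    return float(np.linalg.cond(J.T @ J))

a_true, b_true = 2.0, 5.0
poor_protocol = np.linspace(0.01, 0.1, 20)  # inputs far below b: response is almost a straight line
rich_protocol = np.linspace(0.5, 50.0, 20)  # inputs spanning the saturating region

cond_poor = identifiability_condition_number(poor_protocol, a_true, b_true)
cond_rich = identifiability_condition_number(rich_protocol, a_true, b_true)
print(f"poor protocol: condition number {cond_poor:.3g}")
print(f"rich protocol: condition number {cond_rich:.3g}")
```

Under the poor protocol the output depends almost entirely on the ratio a/b, so many different (a, b) pairs fit equally well, and which one an optimiser lands on depends on where it started. The same model under the richer protocol has a far smaller condition number.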
So that was an example of a model with too many free parameters to constrain in a certain situation. It is quite possible, and easy, to write down a model that has too many free parameters to constrain with any of your available data. Or you could even write down a whole range of models of high complexity that fit the data perfectly, but will lead to different predictions. What should you do in this case? I am still a fan of simplifying the model down to one with identifiable parameters*, and testing out the simplest model that’s a decent fit with identifiable parameters. I make that sound easy, but in general it isn’t.
So which model should you use? Well, things like Information Criteria will try and let you guess from just the training data fit and model complexity. But you’ve no guarantee that they are going to accurately predict the profile of your validation error on the right of Figure 4, although that is what they’re effectively trying to do. So it is absolutely crucial to do the last step of making some validation predictions.
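As a sketch of what an information criterion actually computes, here are the standard least-squares forms of AIC and BIC. This assumes independent Gaussian noise of unknown variance, and drops constants that are the same for every model. The function name `aic_bic_least_squares` and the toy residuals are my own, purely for illustration.

```python
import numpy as np

def aic_bic_least_squares(residuals, n_params):
    # Assumes independent Gaussian noise with unknown variance, so that (up to a
    # constant shared by all models) -2 * log-likelihood = n * ln(RSS / n)
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    fit_term = n * np.log(rss / n)
    aic = fit_term + 2 * n_params          # fixed penalty per free parameter
    bic = fit_term + n_params * np.log(n)  # penalty grows with dataset size
    return float(aic), float(bic)

# Toy comparison: a complex model that fits slightly better than a simple one
simple_residuals = np.array([0.11, -0.09, 0.10, -0.12, 0.08, -0.10])
complex_residuals = np.array([0.10, -0.09, 0.10, -0.11, 0.08, -0.10])
print("simple  (2 params):", aic_bic_least_squares(simple_residuals, 2))
print("complex (5 params):", aic_bic_least_squares(complex_residuals, 5))
```

Lower is better for both criteria. Note that only the training residuals and the parameter count go in, which is exactly why the result is a guess at the validation error and not a measurement of it.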
Making validation predictions sounds simple enough, but opens up more questions: how do I choose which validation experiment to perform? If I choose one that is too simple/similar to the training/calibration experiment, then I could trick myself into thinking I’m doing really well (and kid myself that the green shape on the right is really the same shape as the red training triangle). Most of the literature will tell you to do something based on the ‘context of use’, that is, situations as similar as possible to where you want to use the model ‘for real’. For example, if I want to use a model to predict cardiac effects at 1 Hz, then I should test it out at just 1 Hz. This makes perfect sense for statistical/machine learning models.
With biophysical models, I don’t think it is that simple, and as usual it probably depends on what you are making the model for. If you want to predict some behaviour in future, then yes, do the ‘context of use’ simulations to see how well you are likely to do. But you might want to know whether your biophysical model is a good description of the biophysics. Perhaps the biophysics itself is what you are trying to learn about, or perhaps you can’t be sure what the ‘context of use’ is (voltage waveforms in arrhythmias?). The biophysics should hold not only outside the training data regime, but in any regime where the underlying assumptions you built in about physical processes hold. In this case, perhaps it makes sense to throw the wackiest validation protocol you can think of at the model, to see if it really does capture the underlying biophysics? So we’re trying that, and I’ll let you know how we get on!
You also have to avoid falling into the trap of using validation predictions to perform the model selection for you. It sounds like an obvious thing to do, but then you have turned your ‘validation data’ into ‘model selection training data’! This is easy to do by accident, and really if you do this you should have a third set of data, sometimes called ‘test data’, to see how well the calibrated and selected model performs. Often machine learning people do work with three datasets like this; see Chapter 7 of Hastie et al.’s Elements of Statistical Learning book for a good intro.
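Here is a minimal sketch of that three-dataset workflow, again with polynomials standing in for candidate models. The data, the 30/30/30 split, and names like `select_idx` and `best_degree` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical dataset: noisy observations of a smooth curve
x = rng.uniform(0.0, 1.0, 90)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

# Shuffle once, then carve out three disjoint sets
order = rng.permutation(x.size)
train_idx, select_idx, test_idx = order[:30], order[30:60], order[60:]

def rmse(idx, model):
    # Root-mean-square error of a fitted model on the chosen subset
    return float(np.sqrt(np.mean((y[idx] - model(x[idx])) ** 2)))

# 1. Calibrate every candidate model on the training set only
candidates = {d: np.polynomial.Polynomial.fit(x[train_idx], y[train_idx], d)
              for d in range(0, 9)}

# 2. Choose between the candidates using the second set
selection_err = {d: rmse(select_idx, m) for d, m in candidates.items()}
best_degree = min(selection_err, key=selection_err.get)

# 3. Report the chosen model's performance on data used for nothing else
test_err = rmse(test_idx, candidates[best_degree])
print(f"selected degree {best_degree}, "
      f"selection RMSE {selection_err[best_degree]:.3f}, test RMSE {test_err:.3f}")
```

The selection error of the winning model is biased optimistic, because we picked the winner precisely for having the lowest value of it. Only the test error comes from data that played no part in either calibration or selection.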
*As an aside, people often talk about a model being “identifiable” (i.e. “the parameters of the model are identifiable”). But we have to remember that this only makes sense as shorthand for the statement “the parameters of the model are identifiable, given this training/calibration data”. That immediately raises the question “What if we calibrate the parameters to the results of a different experiment?”, as the example in Figure 3 shows, which is getting me into the whole new field of optimal experimental design…