A vision of the future

Jonathan Cooper and I have been working on a system for describing the simulation you would like to perform with a model*, in a way that is flexible enough to recreate the majority of possible experiments and post-processing, and machine-readable so it can be automatically applied to any relevant model. If that sounds a bit dry and computer-sciencey, well, it is sometimes! But I’m quite excited about it, and this post is about why.

We’ve called it Functional Curation for now, but simply ‘curating the functions of models’, or, to put it another way, ‘seeing what models do in different situations’ (see the pilot paper on it), is just the first of many possibilities that the work opens up.

I’ve come up with the following vision of the future to illustrate what I mean:

Alice sat down at her desk and glanced at the computer clock, 9.00am Thursday 17th April 2025. Her task for the day was to make a new <pick an ion channel> model, because a novel phosphorylation state for this ion channel was discovered on Tuesday that forms in <pick a very specific cell type and disease circumstances>.

On Wednesday morning, the experimental team had looked up the existing mathematical model of <ion channel> for healthy <very specific cell type> and had access to all the protocols that were needed to fully describe the kinetics of this channel, at a range of temperatures, pHs, ion concentrations etc. They had downloaded the protocols, and run their new wet-lab experiments using the PatchClamper3000+. This time they had added a new protocol that varied the phosphorylation state as well. The results had been uploaded to the World Health Organisation central database on Wednesday evening, along with a computer-readable description of the corresponding protocols, experimental conditions and meta-data.

Alice had downloaded all this by 9.10am on Thursday, and was immediately able to simulate the predictions of the existing <ion channel> model using the machine-readable protocols. The system alerted her to a difference between the existing model’s predictions and the new recordings: a quick glance showed that this novel phosphorylation state made quite a difference.

With SuperFancyFitting algorithms and full access to the raw data and protocols, she selected a range of sensible objective functions and re-fitted the previous model’s parameter covariance distributions to describe the new phosphorylation state; unfortunately, the model couldn’t describe the data whatever re-fitting she tried.

Luckily, NovelSuperFancyFitting algorithms had been invented in 2020 that suggested a range of possible new equations for the kinetic states and transitions. She was able to select the minimal model that gave a great fit, and had evaluated its predictive power by 11am.

The scientific community had standard guidelines for the adoption of novel models, and she had run the automated checks by 11.30am; the model passed, and was deemed of sufficient predictive value to be widely useful. By lunchtime, the Virtual Physiological Human model (in use in every hospital in the world) had its <ion channel> module upgraded. The model was providing clinicians looking after <specific disease> patients with far more accurate predictions for effective pain relief, exercise, diet supplements, etc., and also showed them the risk of side effects with different possible treatments more accurately.

It’s perhaps a bit Utopian! But it is worth thinking about what is missing today: what’s actually stopping us doing this now? I would argue not that much: almost all the technology we need is there (apart from the VPH model itself, of course! But that might exist if models were made in this way…?). SuperFancyFitting algorithms do exist, although we might not generally use them, but that’s something for another blog post! I’m fairly sure NovelSuperFancyFitting algorithms don’t exist yet, but educated guesses could be made for new ion channel structures, so perhaps this step is slowing us down by about a fortnight at present. Similarly, the PatchClamper3000+ doesn’t exist either, so the experiments might take a month or two at present. So why does it feel as if we’re still a lot further away from being able to develop models like this?

I think what’s really missing today is the concrete link between training & validation data and mathematical models. And the main thing holding us up there is not a lack of space to store experimental results, database technologies to do so, or algorithms that could do it; it is simply that we haven’t got a way of describing experimental protocols. If we had, experiments could be automatically replicated virtually using any mathematical model, and therefore predictions could be automatically compared with experimental data. I’ve highlighted the bits of the story that rely on this in purple. These bits were crucial for getting the right experiments performed, and allowing the modeller to use the resulting data efficiently.
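To make the idea concrete, here is a minimal sketch in Python of how a machine-readable protocol could be applied to any model via standardised annotations. Every name in it (`run_protocol`, `simple_decay_model`, the `membrane:gate_open_fraction` annotation) is hypothetical and not part of Functional Curation or SED-ML; the point it illustrates is the one above: if each model maps its internal variables onto shared annotation terms, the same protocol can run unchanged against any model, and the resulting predictions can then be compared automatically with each other or with data.

```python
# Minimal sketch (all names hypothetical): one machine-readable protocol
# applied to two different models, linked only by standardised annotations.

def simple_decay_model(rate):
    """A toy 'model': exponential decay of a gating variable."""
    def step(state, dt):
        return state - rate * state * dt
    return {
        "step": step,
        # Maps a shared ontology term onto this model's internal variable.
        "annotations": {"membrane:gate_open_fraction": "state"},
    }

def run_protocol(model, protocol):
    """Run a protocol against any model carrying the required annotation."""
    if protocol["target"] not in model["annotations"]:
        raise ValueError("model does not expose the protocol's target variable")
    state = protocol["initial_value"]
    trace = []
    for _ in range(protocol["n_steps"]):
        state = model["step"](state, protocol["dt"])
        trace.append(state)
    return trace

# The protocol is expressed against the annotation term, not against
# any particular model's internal variable names.
protocol = {
    "target": "membrane:gate_open_fraction",
    "initial_value": 1.0,
    "n_steps": 100,
    "dt": 0.01,
}

# Two different 'models' run under the identical protocol, so their
# predictions become directly comparable.
trace_fast = run_protocol(simple_decay_model(rate=5.0), protocol)
trace_slow = run_protocol(simple_decay_model(rate=0.5), protocol)
```

The design choice doing the work here is the indirection through `annotations`: the protocol never names a model-internal variable, which is what lets it be re-run unchanged when the model is replaced or extended.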

How hard will it be to get ourselves something for doing this? There’s a lot to do, but hopefully what we’ve called ‘Functional Curation’ already goes some way to addressing it. For more info see Jonathan’s Call for Virtual Experiments pre-print: it features a version of my story above. And look out for another blog post on our prototype web portal for Functional Curation with cardiac action potential models soon.

Comments welcome!

*SED-ML (Simulation Experiment Description Markup Language) does some of this already – Jonathan’s an editor of SED-ML too. But we needed to make a few additions that we’re hoping to get into future versions of SED-ML. In particular, the bits that allow you to: apply the same protocol to different models through the use of standardised annotations; perform more advanced post-processing; and nest entire protocols inside other ones.

One Response to A vision of the future

  1. Hitesh says:

    Hi Gary, interesting topic. I’m curious about the NovelSuperFancyFitting algorithm, the one that is likely to select the best model structure such that it a) explains the past and b) is predictive of future experiments. I see a few problems here, the first being that even today there is no model selection method that says that structure A will perform better than structure B at predicting the future (out-of-sample outcomes). Information criteria such as AIC and BIC, which you would most likely use, do not correlate with how well a model is likely to perform on an out-of-sample problem. I guess what I am assuming here is that the validation data-set should be a new data-set every time you tweak the model. I personally don’t think it’s valid to use data that has been used in past validation exercises in new validation exercises, and in fact, from my own experience, many experimental scientists would probably go along with this, I think? If you do, are you not in danger of over-fitting? Again, this is difficult to prevent even with today’s statistics. You are also assuming there is only one model with which you start the process, which suggests (to me) that biology is an exact science, which it’s not, is it? I think this may interest you: http://arxiv.org/pdf/1101.0891.pdf — not read it cover to cover, as I never do with any article, but it may well be useful.
