[Vp-integration-subgroup] Another example of integration and reproduction of a model

Jacob Barhak jacob.barhak at gmail.com
Mon Jan 11 04:32:09 PST 2021


Thanks Will, Thanks Robin,

This communication between us took a few weeks and has now become public in
the integration subgroup and in the Model Reproducibility, Credibility &
Standardization subgroup.

For the sake of new readers, it shows the difficulties of integrating a
model into another model, even after publication. Interested readers
are welcome to read the correspondence in chronological order, meaning
starting from the end of this forwarded email and going up.

For those interested only in a summary, here are highlights of the effort.
I was looking for an infectiousness model to plug into my ensemble model.
Will and Robin created such a model and published it as a pre-print:
  https://www.medrxiv.org/content/10.1101/2020.11.20.20235754v1
and made code and data available:
 https://github.com/will-s-hart/COVID-19-Infectiousness-Profile

Despite the good work, there are still difficulties in integration that I
summarize below:

1. Licensing / legal issues - even with the good will and permission of the
authors to reuse results, it was still unclear how the code is allowed to
be reused. This problem is typical in many situations. Each jurisdiction
has its own regulations on how to release results and under what conditions,
and in many cases release conditions are unclear even to the creators of
the work who wish to make the results available for reuse. This problem is
deeper than suspected; this small example is far from representative, and
the issue should be discussed in the working group.

2. Code transfer - Even when code is available with results in matlab, it
was not easy to transfer the data/code to python. The reimplementation I
eventually chose was digitizing the plots by hand rather than using
the function or numerical form already computed - mostly to save time
on attempts to convert data from a system I do not own. This is a good
point for discussion since there are many systems constructed to help reuse
and it would be helpful if researchers were familiar with them.

3. Adaptation of scale. The authors published the infectiousness model as a
density function. I needed to convert it to a curve that shows a value of
1 at max infectiousness to integrate it with multiple other infectiousness
models I was integrating into the ensemble. Such adjustments of
scale are typical in any integration and should not be forgotten. Therefore
it is important that model units and outputs are made clear. In this
situation this was easy, yet it is an important element that researchers
should be attentive to when publishing and integrating models.
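
To make points 2 and 3 concrete, here is a minimal sketch of how hand
digitized points from a density plot could be turned into a relative
infectiousness curve with a value of 1 at the peak. This is not the code
used in the ensemble and the numbers are hypothetical placeholders, not
values from the pre-print. The function names are also made up for this
illustration, and linear interpolation between points is an assumption.

```python
import numpy as np

# Hypothetical digitized points: (days since infection, density value).
# These are NOT values from the pre-print; replace them with the points
# read off the published figure.
t_points = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
density_points = np.array([0.0, 0.05, 0.20, 0.15, 0.07, 0.03, 0.0])


def area_under_curve(t, y):
    """Trapezoidal area; should be close to 1 for a well digitized density."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def relative_infectiousness(t, t_points, density_points):
    """Linearly interpolate the density and rescale so the peak equals 1.

    Times outside the digitized range are assumed to have zero
    infectiousness (an assumption of this sketch).
    """
    density = np.interp(t, t_points, density_points, left=0.0, right=0.0)
    return density / density_points.max()


if __name__ == "__main__":
    # Sanity check on the digitization before rescaling.
    print("area:", area_under_curve(t_points, density_points))
    # Relative infectiousness on a daily grid, ready to be multiplied
    # by a transmission probability computed elsewhere.
    days = np.arange(0, 13)
    print(relative_infectiousness(days, t_points, density_points))
```

Checking the area before rescaling is a cheap way to catch digitization
mistakes, since the area information is lost once the curve is divided by
its maximum.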


This model was successfully integrated into the ensemble and initial
simulations show the model has influence on the ensemble. There are more
simulations needed to confirm this and work is under way towards
publication, yet for the sake of discussion it was important to show this
effort to the working group in the most authentic way possible. I thank
Will and Robin for allowing the release of our private communications.

This conversation adds information to another example with
Filippo Castiglione summarized here:
https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-January/000011.html

I call upon other members of the working group to continue the discussion
and raise other issues they had with reproducing models or integrating
them. If we collect enough examples, we will have a good picture of
difficulties and perhaps make some recommendations that can help us all in
the future.

             Jacob


On Mon, Jan 11, 2021 at 4:39 AM William Hart <william.hart at keble.ox.ac.uk>
wrote:

> Hi both,
>
> This is fine with me.
>
> Best,
> Will
> ------------------------------
> *From:* Robin Thompson <robin.thompson1988 at gmail.com>
> *Sent:* 11 January 2021 10:03
> *To:* Jacob Barhak <jacob.barhak at gmail.com>
> *Cc:* William Hart <william.hart at keble.ox.ac.uk>
> *Subject:*
>
> Hi Jacob,
>
> If it is ok with Will, please feel free to post the text below to the
> mailing list, if you think this is useful (including this email).
>
> I guess it might be worth highlighting in addition the fact that we were
> unsure, from the university’s perspective, whether there would be any
> issues regarding commercial use of IP.  None of the coauthors of the
> preprint had any *personal* issues with the results being used (instead we
> were grateful that someone else was interested in our research!)
>
> As you mentioned, please also feel free to summarise any other things we
> discussed in a summary paragraph, if this is helpful.
>
> Thanks, and best wishes,
> Robin
> ----------------------------------------------------------------------
> Dr Robin Thompson
> Assistant Professor of Mathematical Epidemiology
> Mathematics Institute
> University of Warwick, UK
> www.robin-thompson.co.uk
> ----------------------------------------------------------------------
>
> On 21 Dec 2020, at 00:45, Jacob Barhak <jacob.barhak at gmail.com> wrote:
>
> Hi Will, Hi Robin,
>
> Would you be ok if this entire private conversation we have becomes public
> on the integration working group mailing list?
>
> I am trying to start activities there and it seems some of the discussions
> here are relevant to the group activities. I already posted one, and hope
> to get a few more going, to be posted here:
> https://lists.simtk.org/pipermail/vp-integration-subgroup/
>
> I started looking at your code and saw that you extract the data for the
> plots from the results that are saved in matlab format.
> I do not have matlab so I tried to load your data into python and it seems
> I cannot load it. Here is what I did within python:
>
> >>> import scipy.io
> >>> a=scipy.io.loadmat('gen_tost_serial_varinf.mat')
> >>> a
> {'__function_workspace__': array([[ 0,  1, 73, ...,  0,  0,  0]],
> dtype=uint8), 'None': MatlabOpaque([('f_tost_varinf', 'MCOS', 'chebfun',
> array([[3707764736],
>        [         2],
>        [         1],
>        [         1],
>        [       351],
>        [         5]], dtype=uint32))],
>              dtype=[('s0', 'O'), ('s1', 'O'), ('s2', 'O'), ('arr', 'O')]),
> '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: MACI64,
> Created on: Wed Nov 25 15:32:08 2020', '__globals__': []}
> >>> b=a['__function_workspace__']
> >>> b.shape
> (1L, 328760L)
> >>> b[:,0:60]
> array([[  0,   1,  73,  77,   0,   0,   0,   0,  14,   0,   0,   0, 152,
>           3,   5,   0,   6,   0,   0,   0,   8,   0,   0,   0,   2,   0,
>           0,   0,   0,   0,   0,   0,   5,   0,   0,   0,   8,   0,   0,
>           0,   1,   0,   0,   0,   1,   0,   0,   0,   1,   0,   0,   0,
>           0,   0,   0,   0,   5,   0,   4,   0]], dtype=uint8)
>
> This data does not look like the same shape and size that you are plotting, so
> I assume I cannot read it from python directly. For the record, I also tried
> importing with mat4py and with h5py assuming you are using matlab of at
> least version 7.3. Yet all those attempts were unsuccessful so I am going
> to use other - more manual - techniques. Currently it seems that the fastest
> way for me to integrate your model into mine is by using hand digitization
> of your plot - not a very advanced way, yet more practical timewise.
>
> I am showing this example so people will be aware that when models are
> integrated they may arrive from different systems and bridging this gap is
> something the group will have to think about. This also relates to the
> reproducibility working group - so I would like your permission to post this
> example there as well if you allow.
>
> Please note that you did a lot by making the code available, yet we need
> to figure out ways to help integration.
>
> Hopefully you will allow me to post this exchange of words publicly for
> the working group to learn from.
>
>             Jacob
>
> On Thu, Dec 17, 2020 at 12:11 AM Jacob Barhak <jacob.barhak at gmail.com>
> wrote:
>
> Thanks Will,
>
> This is a great response.
>
> By the way, you forgot to put a license on Github. However, the plot there
> is good enough for me to use - I will figure out how.
>
> No need for you to do anything more about this for now - I think I have
> enough information to plug this into my model.
>
> And yes, this can surely be an activity of the working group since it has
> to do with model integration.
>
> Also the discussion we had about legalities is a discussion that we need
> to have within the working group - those are important issues so we can
> integrate models in the future - there are legal and reproducibility issues
> on top of technical issues - we will for sure visit those in the working
> group.
>
> And believe me, I understand having little time - you already contributed
> more than many others and I will make sure to mention this contribution and
> you and Robin will know when it gets published.
>
> Many thanks.
>
>                  Jacob
>
>
>
> On Wed, Dec 16, 2020 at 6:19 AM William Hart <william.hart at keble.ox.ac.uk>
> wrote:
>
> Dear Jacob,
>
> Please see replies inline below.
>
> Best,
> Will
>
> ------------------------------
> *From:* Jacob Barhak <jacob.barhak at gmail.com>
> *Sent:* 14 December 2020 06:15
> *To:* William Hart <william.hart at keble.ox.ac.uk>
> *Cc:* robin_thompson1988_gmail_com <robin.thompson1988 at gmail.com>
> *Subject:* Re: Interface article
>
> Ok William,
>
> Your paper has the elements I need to plug into my ensemble. In fact
> perhaps several of them. However, I cannot easily extract those since the
> paper is written from a theoretical point of view and I need actual
> numbers.
>
> What I need is the function you show in figure 2A. Yet I need it as a
> function f(t), where f is relative infectiousness and t is time from
> infection.
>
> This relative infectiousness will later be multiplied by a transmission
> probability that I calculate another way.
>
> My questions to you are:
> 1. Is the generation function you show in 2A a density function such that
> Int(f(t))=1? Since the title is density I assume the answer is yes.
>
> Yes
>
> 2. Are the functions shown in figure 2A the curves that best fit your data
> and do those optimize all parameters or did you make some assumptions on
> some of the parameters - basically I am asking if those curves are one
> example of a family of curves, or are those the optimal curve. I assume it
> is the optimal one.
>
> *This is slightly different for each different model, but in general 2 or
> 3 parameters are fitted in each case, whereas some others may be assumed
> (this is fully detailed in the methods section).*
>
> 3. Do you have the function form of the curves in 2A or are those
> numerically solved - I assume those are numerically solved -  I saw you use
> MCMC. Yet I may be mistaken.
>
> *Again, this differs between models, but in most cases there is no closed
> form for the generation time distribution, so some numerics are necessary. *
>
> 4. Is it possible to get the numbers that generated plot 2A in spreadsheet
> / table form rather than extracting them by manual digitization from the
> figure?
>
> *MATLAB code from the preprint is publicly available; in particular,
> please find the code to generate Fig. 2 here
> (https://github.com/will-s-hart/COVID-19-Infectiousness-Profile/tree/main/Plotting%20code).
> If you have no access to MATLAB (the code may work on Octave etc, but I'm
> not sure), I may be able to send you a spreadsheet of values.*
>
> 5. Can this information be used for commercial purposes and what terms of
> use are attached to this information? I need to ask since I am a sole
> proprietor - a company of one person. If I invest the time to hand
> digitize the functions in 2A and plug them into my ensemble, I need to know
> I will not have to remove them in the future because of some legal
> limitation or a scientist that misbehaves - this happened in the past and
> cost me a lot of work, and it is not worth it for me to use the model if it
> is somehow restricted. Basically I am asking you what the usage terms for
> your results are.
>
> *Please see Robin's reply (I have no personal issue with you using our
> results).*
>
>
> Also, are you aware of this publication:
> https://doi.org/10.1101/2020.09.25.20201772
>
> It has an infectiousness model and Alan Perelson talked about it here:
> https://www.youtube.com/watch?v=RXY8EoChWU4
>
> *Looks very interesting!*
>
>
> The video is part of the Viral pandemic working group. I saw that you are
> part of the integration working group - so if we reach an agreement I will
> be happy to add this as an official activity of the subgroup.
>
> *Up to you - unfortunately, I don't think I would have time to contribute
> to this if it became a subgroup activity, but I would be interested to see
> the results.*
>
>
>                   Jacob
>
> --
> Jacob Barhak Ph.D.
> https://sites.google.com/view/jacob-barhak/home
>
>
>
> On Thu, Dec 10, 2020 at 12:57 AM Jacob Barhak <jacob.barhak at gmail.com>
> wrote:
>
> Great William,
>
> You may be right, at first glance this seems like what I am looking for -
> yet I need to read the details and can do this only during the weekend - so
> I will come back with questions after I read this and see if I can plug
> your model into the ensemble.
>
> I look forward to more communications.
>
>              Jacob
>
> On Wed, Dec 9, 2020 at 9:44 AM William Hart <william.hart at keble.ox.ac.uk>
> wrote:
>
> Hi Jacob,
>
> Looks interesting!
>
> Yes, we have actually recently released a pre-print in which we do (I
> think) exactly what you're looking for (
> https://www.medrxiv.org/content/10.1101/2020.11.20.20235754v1). In
> particular, Figure 2A of the pre-print shows the (expected) infectiousness
> as a function of time since infection (we considered fitting several
> different models of infectiousness to data; the blue curve corresponds to
> our best model).
>
> Please do let me know if you have any questions about what we did.
>
> Best,
> Will
> High infectiousness immediately before COVID-19 symptom onset highlights
> the importance of contact tracing | medRxiv
> <https://www.medrxiv.org/content/10.1101/2020.11.20.20235754v1>
> Understanding changes in infectiousness during COVID-19 infections is
> critical to assess the effectiveness of public health measures such as
> contact tracing. Data from known source-recipient pairs can be used to
> estimate the average infectiousness profile of infected individuals, and to
> evaluate the proportion of presymptomatic transmissions.
> www.medrxiv.org
>
> ------------------------------
> *From:* Jacob Barhak <jacob.barhak at gmail.com>
> *Sent:* 09 December 2020 14:30
> *To:* robin_thompson1988_gmail_com <robin.thompson1988 at gmail.com>
> *Cc:* William Hart <william.hart at keble.ox.ac.uk>
> *Subject:* Re: Interface article
>
> Hi William,
>
> Robin communicated to me that you may have some answers regarding COVID-19
> infectiousness. I am modeling COVID-19 as you can see from this article:
>
>    - Barhak J, The Reference Model: An Initial Use Case for COVID-19. Cureus.
>      http://dx.doi.org/10.7759/cureus.9455 , Online:
>      https://www.cureus.com/articles/36677-the-reference-model-an-initial-use-case-for-covid-19 ,
>      PMCID: PMC7392354 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392354/) ,
>      PMID: 32760637 , Interactive Results:
>      https://jacob-barhak.netlify.app/thereferencemodel/results_covid19_2020_06_27/combinedplot
>
>
> I am expanding this work to include multiple infectiousness models and
> have preliminary results already, yet I am looking for more infectiousness
> models to plug into the model.
>
> Specifically I am looking for a function that will communicate how
> infectious an individual is as a function of time since their infection.
>
> Hopefully you have something like this and will be happy to communicate.
>
>                   Jacob
>
>
>
>
>
>
> On Wed, Dec 9, 2020 at 5:34 AM Robin Thompson <
> robin.thompson1988 at gmail.com> wrote:
>
> Hi Will,
>
> Jacob Barhak (cc’d) contacted me because he is interested in your
> within-host -> between-host manuscript. I thought you might be in a better
> position to answer any questions he might have, so this email is to connect
> him to you.
>
> Thanks, and best wishes,
> Robin
> --------------------------------------------------------------------------
> Dr Robin Thompson
> Junior Research Fellow in Mathematical Epidemiology
> Christ Church
> University of Oxford, UK
> www.robin-thompson.co.uk
> --------------------------------------------------------------------------
>
>
>