[Vp-integration-subgroup] Another case study of model integration

Robin Thompson robin.thompson1988 at gmail.com
Thu Apr 29 00:59:09 PDT 2021


Hi Jacob,

Thanks for this. All the generation time distributions (i.e. expected infectiousness as a function of the time since infection) that we have been considering are normalised so that they integrate to 1.  In fact, our manuscript in which we estimate the SARS-CoV-2 generation time distribution using a few different models was accepted by eLife earlier this week. The accepted manuscript (pre journal formatting) is available at:

https://elifesciences.org/articles/65534

Figure 2A (at the end of the manuscript) shows the inferred generation time distribution for four different models. The purple dotted one involves the standard assumption that the generation time and incubation period are independent.

If anyone in the WG has any comments on the manuscript, then please do get in touch :-)

Thanks, and best wishes,
Robin
----------------------------------------------------------------------
Dr Robin Thompson
Assistant Professor of Mathematical Epidemiology
Mathematics Institute
University of Warwick, UK
www.robin-thompson.co.uk
----------------------------------------------------------------------

> On 29 Apr 2021, at 07:48, Jacob Barhak <jacob.barhak at gmail.com> wrote:
> 
> Yes Lucas, 
> 
> You are right. 
> 
> All the infectiousness models are presented here as fractions of maximal infectiousness, not as density functions.
> 
> Each of the models had a different definition, and to compare them, a common definition is needed.
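To make the common definition concrete, here is a minimal Python sketch of turning a density (area 1) into the fraction-of-maximum form used in the ensemble. It assumes the shape 8, rate 1.25/day gamma profile discussed further down in this thread. The constant and function names are illustrative, not taken from any of the models. scipy's `gamma` takes a scale, so the rate is passed as `scale=1/rate`.

```python
import numpy as np
from scipy.stats import gamma

# Parameters from this thread: shape a = 8, rate b = 1.25 per day.
# scipy parameterizes the gamma distribution by shape and scale, so scale = 1/b.
A_SHAPE = 8
B_RATE = 1.25


def infectiousness_density(t_days):
    """Density over time since infection; integrates to 1 over [0, inf)."""
    return gamma.pdf(t_days, A_SHAPE, scale=1.0 / B_RATE)


def infectiousness_fraction_of_peak(t_days):
    """Same curve rescaled so that its maximum value is exactly 1."""
    mode = (A_SHAPE - 1) / B_RATE  # mode of a gamma with shape > 1, here 5.6 days
    return infectiousness_density(t_days) / infectiousness_density(mode)


days = np.arange(19)  # whole days 0..18 since infection
print(np.round(infectiousness_fraction_of_peak(days), 3))
```

Dividing by the value at the mode, (a - 1)/b = 5.6 days, makes the peak exactly 1. On a whole-day grid, the largest entry falls on day 6 and is slightly below 1.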
> 
> The definition I asked for also matches the definition used in my ensemble. I have not yet integrated it into the ensemble; I am about to do so today, although the simulations will take more time. And thank you for allowing me to release this under CC0.
> 
> Hopefully this documentation is clear enough and the comparison is useful. The fact that your model resembles another one, although I assume you started from different assumptions, may be important. I highly recommend you connect with the other modelers whose infectiousness models are used here and discuss the differences. It may lead to a better understanding.
> 
> I know Robin is on this mailing list. Hopefully he will choose to comment. 
> 
> 
>         Jacob
> 
> 
> 
> 
> 
> On Thu, Apr 29, 2021, 01:32 LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> Hi Jacob
> 
> Thanks for integrating our infectiousness model into your modeling framework! 
> 
> One point I am not so sure about is the normalization of these different distributions. 
> 
> If t is the time since infection, the distribution in our paper is normalized such that the integral over t from 0 to infinity is 1. I think that other distributions are normalized in a different way, or not normalized at all? For example, Figure 3G (Ke et al.) does not seem to be normalized in the same way as our PDF?
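To go the other way, a profile scaled to a peak of 1 can be turned into a density by dividing by its area. The following is a minimal sketch. It assumes the profile is non-negative and integrable on [0, ∞). `area_under`, `to_density` and `peak_scaled` are hypothetical names, and the example profile is the shape 8, rate 1.25/day gamma curve from later in the thread:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma


def area_under(profile):
    """Integral of a non-negative profile over time since infection, [0, inf)."""
    value, _abs_err = quad(profile, 0.0, np.inf)
    return value


def to_density(profile):
    """Rescale a profile (e.g. one scaled to a peak of 1) so it integrates to 1."""
    area = area_under(profile)
    return lambda t: profile(t) / area


# Example profile: the shape-8, rate-1.25/day gamma curve scaled to a peak of 1.
# scale = 1 / 1.25 = 0.8 days; the mode of this gamma is (8 - 1) / 1.25 = 5.6 days.
def peak_scaled(t):
    return gamma.pdf(t, 8, scale=0.8) / gamma.pdf(5.6, 8, scale=0.8)


density = to_density(peak_scaled)
print(area_under(peak_scaled))  # about 5.4, so not a density
print(area_under(density))      # about 1 after rescaling
```

The area of a peak-scaled curve is 1 divided by the peak density. As a result, two curves with the same shape can still differ by a constant factor, depending on which normalization each author used.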
> 
> Best
> 
> Lucas
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Apr 28, 2021 at 1:33 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> Hi Lucas,
> 
> You may wish to compare your infectiousness model to the one generated by Will Hart and Robin Thompson. Their models are the closest to yours among the ones I implemented and made available here:
> https://github.com/Jacob-Barhak/COVID19Models/tree/main/COVID19_Infectiousness_Multi
> 
> If you download the html file alone to your machine, you should be able to view all models. 
> 
> Please note that I had issues with the umlaut 'o' character in your name since I wanted to avoid Unicode issues, so I spelled your last name as Bottcher - please let me know if you want it changed - I am sure you see this problem a lot and may have a preference. 
> 
> Hopefully you like the comparison to other potential models.
> 
> I will proceed with integrating this model into my ensemble. 
> 
>            Jacob
> 
> 
> 
> On Tue, Mar 30, 2021 at 1:23 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> Greetings Integration sub-group,
> 
> Below you will find another attempt to integrate a few models created by Lucas Boettcher into a COVID-19 model.
> 
> Those interested in the details will find our correspondence in the thread below; it shows the difficulties in integrating models.
> 
> I will attempt to summarize for those with little time to follow the details.
> 
> Lucas had several models that we attempted to reuse:
> - Recovery model and incubation model based on Singapore data
> - Several mortality models - one based on CDC data
> - An infectiousness model based on a previous version of https://doi.org/10.1038/s41591-020-0869-5
> 
> So far, after roughly 2 weeks of correspondence we were able to:
> 1. Transmit the infectiousness profile, make sure I can implement it properly, and trace it back to data to make sure it is reusable. Note that in this case we were both using the same language, Python, and still transmitting the formula was not straightforward, since the function can be defined in different ways and that created ambiguity.
> 
> 2. Determine that the recovery/incubation models cannot be reused currently, since the data source that made the data available is not responding and did not specify usage terms. I asked for assistance from this mailing list to contact the entity responsible for the data in this message: https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-March/000043.html
> If you can help, please respond.
> 
> 3. The mortality model was not fully defined, and I will wait for publication of the preprint - hopefully Lucas will transmit it to this mailing list. However, I highly suggest people look at his paper that discusses mortality - it shows some important aspects of counting and how confusing something such as reported mortality numbers can be. You can find the paper here:
> https://doi.org/10.1088/1478-3975/ab9e59
> 
> For those interested in the fine details - please keep on reading the correspondence below in reverse chronological order.
> 
> Feedback from subgroup members will be appreciated.
> 
>             Jacob
> 
> 
> 
> 
> On Tue, Mar 30, 2021 at 8:21 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> Hi Jacob
> 
> Yes, please feel free to add our discussion to the mailing list.
> 
> Best
> 
> Lucas
> 
> On Tue, Mar 30, 2021 at 10:37 AM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> Thanks Lucas,
> 
> This is all good news. Since the recovery function is associated with the Singapore data, we can hold off on it until we authenticate the data.
> 
> The infectiousness curve you mentioned is based on an article whose data availability section states that there is no restriction on data access. Still, it would be nice to write a note to the authors about using their data - it is good scholarship - and I noticed that those authors are responsive (look at the correction to their paper). So it would be nice to write them an email indicating their data was useful. I think their correction does not involve a data change, so if you used their data, you should be fine - yet it is worth another check.
> 
> I will wait for your mortality preprint when it is available.
> 
> I think the discussion in this thread is now good enough to go public on the mailing list - so if you approve, I will add the integration mailing list to the recipient list and summarize the difficulties in integration we encountered. It is important that people can see with their own eyes the difficulties as they appear in practice. Hopefully these cases will help support methods that will improve things in the long run.
> 
> I hope you still approve of this going public. 
> 
>           Jacob
> 
> 
> 
> On Tue, Mar 30, 2021 at 2:16 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> Hi Jacob
> 
> Yes, I'll try to clarify some points below.
> 
> On Mon, Mar 29, 2021 at 9:31 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> Thanks Lucas,
> 
> You will have to bear with me. The amount of information you transmitted is actually non-trivial, and as much as you tried to communicate it clearly, it is just too condensed for one email. It seems I have already gotten confused.
> 
> Allow me to clarify with a few questions:
> 
> 1) The Singapore data and the Python program you sent were for recovery/incubation - correct?
> >> Yes, that's correct.
>  
> 2) The infectiousness curve we reconstructed is Eq. (7) in your mortality paper - what data did you fit it to? Is it also fitted to the Singapore data?
> >> We inferred this curve from the first (uncorrected) version of "He, X., Lau, E. H., Wu, P., Deng, X., Wang, J., Hao, X., ... & Leung, G. M. (2020). Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature medicine, 26(5), 672-675."
> 
> 3) What is the equation for mortality I can use to plug in with other mortality functions? I see Table 2 summarizing different formats to calculate mortality, yet I need a more formal equation I can use that is a function of parameters such as MortalityProbablityPDF( TimeSinceInfectionInDays, AgeInYears).
> 
> >> Our first mortality paper appeared when there was little knowledge about age and mortality characteristics. We proposed some functional forms, but I think that there are better estimates available now. We're about to finalize another manuscript with a more advanced temporal network model with age structure/age-dependent mortality and different communities. I will share the preprint with you as soon as possible. 
>  
> If you used CDC data such as https://www.cdc.gov/mmwr/volumes/69/wr/mm6912e2.htm?s_cid=mm6912e2_w then there are no restrictions on use, since US government data is considered public domain in most cases. There are rare cases where the government provides a license because the data was acquired from a third party, yet generally US government publications have no copyright. In fact, I think it is similar in some other countries - yet I am not a lawyer, so it is worth checking.
> >> Ok, good to know.
>  
> 
> I must admit that I already got confused by the amount of information about the infectiousness data and in my mind associated it with the Singapore data - hopefully it is not associated and can be reused.
> 
>                Jacob
> 
> On Mon, Mar 29, 2021 at 1:27 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> Hi Jacob
> 
> Yes, let's proceed. The mortality datasets are taken from various statistical offices and the CDC. 
> 
> If you're mainly interested in US mortality statistics, we just have to contact the CDC and ask about these licensing issues.
> 
> Best
> 
> Lucas
> 
> 
> On Sunday, March 28, 2021, Jacob Barhak <jacob.barhak at gmail.com> wrote:
> > Hi Lucas,
> > It seems that data from the Singapore web site cannot be verified - I sent an email to the mailing list in the hope that someone has a contact in Singapore who can help verify the data and its usage terms.
> > I suggest we wait a bit more, and if we still cannot move forward with that data, we can focus on other elements of your paper that I can reuse towards integration. I already have several infectiousness curves, so perhaps we can focus on mortality if it is not connected to the Singapore data.
> > I hope this makes sense to you and moves us forward.
> >              Jacob
> >
> >
> > On Wed, Mar 17, 2021 at 11:49 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> >>
> >> Thanks for your comments! I checked everything; responses are below.
> >>
> >> On Wed, Mar 17, 2021 at 12:51 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> >>>
> >>> Thanks Lucas,
> >>> This is a good discussion since it shows more aspects of integration difficulties. 
> >>> First, thanks for being specific about the use of the gamma distribution to calculate infectiousness. Yet even with your clarifications, it looks a bit confusing to me and I want to verify that I am not misusing it. Therefore, let me confirm with you that my reimplementation is correct by giving two values of x:
> >>> >>> import scipy
> >>> >>> from scipy.stats import gamma
> >>> >>> a=8
> >>> >>> b=1.25
> >>> >>> x=3
> >>> >>> b*gamma.pdf(b*x, a)
> >>> 0.060826670304049466
> >>> >>> x=4
> >>> >>> b*gamma.pdf(b*x, a)
> >>> 0.13055607869631744
> >>>
> >>> And please confirm that x in that example is time in days from infection.
> >>
> >>  
> >> >>>>>> Yes, I can confirm both. The numbers are correct and x is the time [days] from infection.
> >>  
> >>>
> >>> If this is correct, then for my own purposes I will need the probability of infection for each day from 0 to 18, so this should generate the following results:
> >>> >>> import numpy as np
> >>> >>> x= np.array(range(19))
> >>> >>> b*gamma.pdf(b*x, a)
> >>> array([0.00000000e+00, 3.38829695e-04, 1.24257706e-02, 6.08266703e-02,
> >>>        1.30556079e-01, 1.78360666e-01, 1.83104790e-01, 1.54333118e-01,
> >>>        1.12599032e-01, 7.35756677e-02, 4.40725870e-02, 2.46064656e-02,
> >>>        1.29628669e-02, 6.50380786e-03, 3.13034629e-03, 1.45367341e-03,
> >>>        6.54334483e-04, 2.86572345e-04, 1.22498638e-04])
> >>>
> >>> If this is a good enough approximation, then the question is what the numbers I generate mean. I assume this is an infectiousness density that sums to 1, since:
> >>> >>> sum(b*gamma.pdf(b*x, a))
> >>> 0.9999137765146388
> >>>
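Sampling the density at whole days gives values that add up to slightly less than 1, mostly because the days after day 18 are left out. An alternative discretization, which is not the one used in the thread, assigns each day the probability mass that falls inside it. This mass is taken from differences of the CDF, so the table sums exactly to the CDF at the last edge. A minimal sketch, assuming the same a and b:

```python
import numpy as np
from scipy.stats import gamma

a, b = 8, 1.25        # shape and rate (per day) from the thread
days = np.arange(19)  # day 0 .. day 18 since infection

# Option 1 (used in the thread): sample the density at whole days.
sampled = b * gamma.pdf(b * days, a)

# Option 2 (an alternative, not from the thread): probability mass that falls
# within each day, i.e. P(d <= T < d + 1) for a gamma-distributed time T.
edges = np.arange(20)                   # interval edges 0, 1, ..., 19
cdf = gamma.cdf(edges, a, scale=1 / b)  # scale = 1 / rate
daily_mass = np.diff(cdf)

print(sampled.sum())     # about 0.99991, as in the thread
print(daily_mass.sum())  # equals the CDF at 19 days, just below 1
print(1 - cdf[-1])       # probability that T >= 19 days, left out of option 2
```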
> >>
> >> >>>>>> Right, this distribution is normalized to 1. If one wants to obtain an infection rate for a disease model one has to use the methods described in the mortality paper I forwarded you. Equation 17 connects the infectiousness distribution with S0*R0, so one can fix the pre-factor in Eq. 16 using a given S0*R0 (which can be estimated) and Eq. 17.
> >>
> >> https://doi.org/10.1088/1478-3975/ab9e59
> >>
> >>> As for the data, this is a typical example of ambiguity with regard to reuse. The team that produced the data did not specify a license, yet made the data available. Typically, for academic purposes, such data is considered fair use. However, since I am a sole proprietor - a for-profit organization - I have to be selective and inquire whether I can reuse this data. The options are:
> >>> 1. The authors wanted to make this data public domain and therefore there is no copyright statement on the web site
> >>> 2. The authors neglected to put a copyright / license since they are overworked and this was not the most important thing on their mind - they want the data to be useful, yet have not considered implications of reuse.
> >>> 3. The authors considered the issues and decided to release this like this - this situation is problematic since it makes reuse terms unclear
> >>> I suspect that the answer is one of the first two options, yet I think that this can be clarified by contacting the web site authors listed as UPCODE ACADEMY - their web site is: https://www.upcodeacademy.com/
> >>> I located their email to be:
> >>> hello at upcodeacademy.com
> >>>
> >>> I think we should ask them to be explicit about the data and to release it under CC0 to clear all doubts. Since you plan to upload the data to GitHub, you would rather know the license beforehand to make sure you properly define the license on GitHub. However, I will be happy to communicate with them for you.
> >>
> >> >> Ok, it would be great if you could clarify the Singapore data license. For my projects, I would just upload the data and specify the source. In your case it will be better to clarify the license type.
> >> I will send you a GitHub repo link later.
> >>  
> >>>
> >>> Once you are ready with your github and remove the zip file, we can add the integration subgroup mailing list to the recipient list and make this conversation public. It shows again the difficulties with integration and how much effort and communication there should be. This is excellent for the subgroup.
> >>
> >> >> Ok, perfect. Thanks!
> >>  
> >>>
> >>>                 Jacob
> >>>
> >>>
> >>>  
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 17, 2021 at 3:21 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> >>>>
> >>>> Hi Jacob
> >>>> Thanks for your comments!
> >>>>
> >>>> I directly respond to your comments below.
> >>>>
> >>>> On Tue, Mar 16, 2021 at 11:45 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:
> >>>>>
> >>>>> Many thanks Lucas,
> >>>>> This makes much more sense now. 
> >>>>> However, just to show the subgroup that integration and reproducibility are still difficult, I want to point out some ambiguity.
> >>>>
> >>>> >> Yes, I agree. Different definitions of certain distributions are confusing.
> >>>>  
> >>>>>
> >>>>> The infectiousness curve you describe is a gamma distribution. There are two ways it can be parameterized: 1) shape and rate, 2) shape and scale
> >>>>> https://en.wikipedia.org/wiki/Gamma_distribution
> >>>>> From your text I assume that n=8 is shape and lambda =1.25/day is a rate 
> >>>>> So let me rewrite the function explicitly. Is the function I should use for infectiousness on day x:
> >>>>> f(x;a,b) = b^a*x^(a-1)*e^(-b*x) / (a-1)!
> >>>>> where a=8 and b=1.25?
> >>>>
> >>>> >> This is the correct representation (it's equation 7 in the mortality paper I shared). 
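The two parameterizations Jacob describes differ only in whether the second parameter multiplies or divides time. Written out, with Γ(a) = (a-1)! for integer a and g(y; a) = y^(a-1) e^(-y) / Γ(a) the standard (scale 1) gamma density that `scipy.stats.gamma.pdf(y, a)` evaluates with its default loc=0 and scale=1, the three forms are:

```latex
\begin{aligned}
f_{\text{rate}}(x; a, b) &= \frac{b^{a}\, x^{a-1} e^{-b x}}{\Gamma(a)}, \\
f_{\text{scale}}(x; a, \theta) &= \frac{x^{a-1} e^{-x/\theta}}{\Gamma(a)\,\theta^{a}}
  = f_{\text{rate}}(x; a, 1/\theta), \\
b\, g(b x; a) &= b\,\frac{(b x)^{a-1} e^{-b x}}{\Gamma(a)} = f_{\text{rate}}(x; a, b).
\end{aligned}
```

With a = 8 and b = 1.25/day, the equivalent scale is θ = 0.8 days, the mean is a/b = 6.4 days, and the mode is (a-1)/b = 5.6 days.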
> >>>>
> >>>>> If I need to implement it, do you think I can just use this python implementation?
> >>>>> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html
> >>>>
> >>>> >> Yes, this one works, but one has to add the second parameter.
> >>>> Using your notation from above, I would use:
> >>>>
> >>>> from scipy.stats import gamma
> >>>> b*gamma.pdf(b*x, a)
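The scaling trick in Lucas's line is equivalent to passing the scale explicitly. The small check below assumes scipy's documented `gamma.pdf(x, a, loc=0, scale=1)` signature. It reproduces the values Jacob reports for x=3 and x=4, and it shows what happens if the rate 1.25 is mistakenly passed as the scale:

```python
import numpy as np
from scipy.stats import gamma

a = 8      # shape
b = 1.25   # rate, per day

x = np.arange(19)  # days since infection

lucas_form = b * gamma.pdf(b * x, a)       # Lucas's suggestion
scale_form = gamma.pdf(x, a, scale=1 / b)  # shape/scale form, scale = 1/rate
misread = gamma.pdf(x, a, scale=b)         # pitfall: passing the rate as the scale

print(np.allclose(lucas_form, scale_form))  # True
print(np.allclose(lucas_form, misread))     # False
```

The misread version describes a curve with a mean of a*1.25 = 10 days instead of 6.4 days. This is exactly the kind of silent error that the shape/rate versus shape/scale ambiguity invites.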
> >>>>>
> >>>>> As for the updated zip file - I got it to work and I can see the plots - the incubation period plot is less interesting for me, yet the recovery histogram is helpful - I actually played with the number of bins to see the data. 
> >>>>> However, I do have a few questions. 
> >>>>> 1) You used Singapore data - does this data have any restrictions on use, meaning is there a license associated with it that restricts reuse of this data for commercial purposes or redistribution of the data? You will have to check the terms of data usage with the origin - if there is a copyright symbol and no license indicating otherwise, it becomes a problem we need to discuss before going public. I checked the web site you quoted and did not see a copyright notice, nor did I see a way to download the data as CSV, so I assume you can communicate with the data source to clarify those details.
> >>>>
> >>>> >> The data is extracted from https://co.vid19.sg/singapore/cases/search (now they have more than 6,000 tracked cases!). It's a really underappreciated source of tracked Covid cases.
> >>>> I've never seen any copyright symbols or licenses and tried to contact some health officials from Singapore last year, but without success. If you find some contact details, we can ask them.
> >>>>  
> >>>>>
> >>>>> 2) Assuming that there is no restriction on the data, you should still specify a license for the code you created. I suggested we do this with the aim of releasing it under CC0, yet once we add the mailing list to this conversation, many people can access your zip file and we need to be clear on what is allowed with each version.
> >>>>
> >>>> >> I would suggest that we first create a cleaned-up version of my plotting script and upload it to one of your or my GitHub repos. Then I'll remove the ZIP, so that others just use the clean GitHub version.
> >>>>  
> >>>>>
> >>>>> If the Singapore data is already public domain and you are willing to release your code under CC0 - I can proceed and process your code and create a model I will publish for you on Github. Yet you have to decide if you want the zip file to become public so others can view it. 
> >>>>
> >>>> >> Yes, CC0 is fine.
> >>>>  
> >>>>>
> >>>>> I did not add the mailing list email since I want you to be ok with details before we go public. Once we clear those issues, we can make the conversation public. As you can see I am cautious before I make things public - one reason for cautiousness is to show the subgroup what is proper practice and how models and data should be checked for licenses.
> >>>>
> >>>> >> That's great! I think it's good to pay attention to those details.
> >>>>  
> >>>>>
> >>>>> In any case, many thanks for this - this is progress.
> >>>>>            Jacob
> >>>>>
> >>>>> On Tue, Mar 16, 2021 at 2:11 AM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> >>>>>>
> >>>>>> Hi Jacob
> >>>>>> Yes, I meant equation 16 not 18 in [1]. This equation describes the infectiousness \beta(\tau) as a function of the time since infection \tau. The distribution parameters are as specified in my previous email and also described in [1].
> >>>>>> I updated the ZIP: http://lucas-boettcher.info/downloads/singapore_.zip
> >>>>>> There is no need anymore to have LaTeX connected to Python to run this script. I'll add a YML environment file next time.
> >>>>>> I am fine with releasing everything I shared under CC0; please feel free to add our discussion to the mailing list.
> >>>>>> Best
> >>>>>> Lucas
> >>>>>>
> >>>>>> ---
> >>>>>> [1] Böttcher, L., Xia, M., & Chou, T. (2020). Why case fatality ratios can be misleading: individual-and population-based mortality estimates and factors influencing them. Physical Biology, 17(6), 065003.
> >>>>>> On Sun, Mar 14, 2021 at 6:35 PM LUCAS BOETTCHER <lucasb at g.ucla.edu> wrote:
> >>>>>>>
> >>>>>>> Hi Jacob
> >>>>>>> In [1] (Eq. 18) we used the gamma distribution
> >>>>>>> \beta(\tau)=\beta_0 \rho(\tau;n,\lambda),
> >>>>>>> to describe an infectiousness profile estimate from [2]. Here, \tau is the time since infection, n=8 (shape parameter), and \lambda=1.25/day (rate parameter). The amplitude \beta_0 S_0 can be estimated using R_0 estimates (see [1]).
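Here is one way to fix the amplitude from an R_0 estimate, under a stated simplification. Suppose an infected person's infectiousness follows β(τ) = β_0 ρ(τ) for its whole course, with no removal before the profile ends. Then the expected number of secondary infections in a population with S_0 susceptibles is S_0 ∫β(τ)dτ = β_0 S_0, because ρ integrates to 1. This is a textbook age-of-infection relation and not necessarily the exact form of Eq. 17 in the Physical Biology paper cited in this thread. `make_beta`, `R0_EXAMPLE` and `S0_EXAMPLE` are hypothetical, and the numbers are illustrative only:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

N_SHAPE = 8          # n in the thread
LAMBDA_RATE = 1.25   # lambda in the thread, per day


def rho(tau):
    """Gamma infectiousness profile rho(tau; n, lambda); integrates to 1."""
    return gamma.pdf(tau, N_SHAPE, scale=1 / LAMBDA_RATE)


def make_beta(r0, s0):
    """Hypothetical helper: beta(tau) = beta_0 * rho(tau) with beta_0 * S_0 = R_0.

    Assumes the simplest renewal relation R_0 = S_0 * integral of beta(tau) dtau,
    i.e. no removal (recovery or death) before the profile has run its course.
    """
    beta_0 = r0 / s0
    return lambda tau: beta_0 * rho(tau)


# Illustrative inputs only, not estimates from the thread.
R0_EXAMPLE = 2.5
S0_EXAMPLE = 1000.0

beta = make_beta(R0_EXAMPLE, S0_EXAMPLE)
expected_secondary, _abs_err = quad(lambda tau: S0_EXAMPLE * beta(tau), 0.0, np.inf)
print(expected_secondary)  # recovers R0_EXAMPLE, about 2.5
```

If the actual model removes individuals before their infectiousness profile ends (for example, through death or isolation), the integral would need a survival factor. In that case β_0 S_0 would no longer equal R_0 directly.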
> >>>>>>> Incubation period and recovery time profiles (incl. data from https://co.vid19.sg/cases) are stored here: http://lucas-boettcher.info/downloads/singapore_.zip
> >>>>>>> (I'll remove the ZIP in a few weeks, but you can download and store the data somewhere else if it's helpful for your research.)
> >>>>>>>
> >>>>>>> And regarding the license issue, please let me know what would be best for your work. I am not sure if CC0 might be the best solution for you:
> >>>>>>> https://opensource.stackexchange.com/questions/133/how-could-using-code-released-under-cc0-infringe-on-the-authors-patents
> >>>>>>> Best
> >>>>>>> Lucas
> >>>>>>> ---
> >>>>>>> [1] Böttcher, L., Xia, M., & Chou, T. (2020). Why case fatality ratios can be misleading: individual-and population-based mortality estimates and factors influencing them. Physical Biology, 17(6), 065003.
> >>>>>>> [2] He, X., Lau, E. H., Wu, P., Deng, X., Wang, J., Hao, X., ... & Leung, G. M. (2020). Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature medicine, 26(5), 672-675.
> _______________________________________________
> Vp-integration-subgroup mailing list
> Vp-integration-subgroup at lists.simtk.org
> https://lists.simtk.org/mailman/listinfo/vp-integration-subgroup
