[Vp-integration-subgroup] "Models are not consistently licensed"

William Waites wwaites at ieee.org
Sat Jan 1 04:25:39 PST 2022


I agree with Sheriff’s sentiment but find the voice confusing.
For example, most of the authors of *this* paper were not
involved in “our reproducibility study”. It is also unclear if
“we decided to continue with CC0” refers to BioModels or the
present paper.

I would prefer an explanation from first principles with
BioModels as a good example. Something like the text below.

(Aside #1. I removed the word “sophisticated” from Sheriff’s
text. Feel free to put it back. I’ve observed the pattern,
particularly around the pandemic, of pointing to complicated
models as better evidence of something than can be got with
simpler models. As though being complicated is a virtue. This
is more akin to the original meaning of the word sophisticated
which is a scientific anti-pattern. That’s not what Sheriff
meant, but I’d like to avoid using that word at all, especially
in a context that suggests it’s something good.)

(Aside #2. For software, I personally prefer copyleft licenses
and not usually public domain. In the academic sphere this
*should* not pose a problem because of the norm that says you
have to make your code available, otherwise the paper that
crucially relies on it isn’t worth the bits it’s encoded in.
However, it’s not at all obvious that it’s enough to rely on
good practice for this since there’s plenty of bad practice
out there. There is some progress with journals and funders
enforcing better practice. This is just an alternative mechanism
to the legal jiu-jitsu of copyleft.)

Best wishes for the new year,
-w



—

There are practical and moral reasons why models should be
freely available. To ensure a high standard of scientific
work, it is important to be able to reproduce and verify
results. This requires that the software and data for models
be freely available and that the barriers to running them
be minimal. To facilitate scientific progress, it is
ideal to be able to build upon the work of others. This
requires not only that software and data are available but
that reuse and modification are allowed. Finally, there is
a moral argument that work produced using public funds 
should be publicly available. This concept is well embedded
in some segments of the community, for example in the case
of federally funded work in the United States, but we hold
it as a general principle.

Models are a combination of software and data. The legal
treatment of these differs across jurisdictions, with software
generally subject to copyright law and data subject to database
rights. This is not universal. The common denominator is the
maximally permissive concept of “public domain”, which exists
in some jurisdictions and under which no restrictions are placed
on use, reuse, modification, creation of derivative works, and,
crucially, combination of multiple (possibly derivative) works
into a new whole. However, the public domain does not exist
as a concept in all jurisdictions, and in others it is not
possible to simply place a work in it. To solve this problem,
the CC0 license (formulated with respect to copyright law
and database rights) emulates the public domain in jurisdictions
where this is necessary. We therefore recommend that models
and their associated data be published under CC0 terms.

An example of this approach is the BioModels database. BioModels
provides a model curation and annotation service, in which the
models’ reproducibility is assessed and the models are annotated
with controlled vocabularies. BioModels aims to make models
discoverable: the database offers search tools to locate models
and model components
(https://www.ebi.ac.uk/biomodels/parameterSearch). The BioModels
reproducibility study (Tiwari et al. 2021,
https://doi.org/10.15252/msb.20209982) strongly recommends
that authors make their models public, and more specifically
that they do so through public repositories such as BioModels
(https://www.ebi.ac.uk/biomodels). We also recommend that authors
submit the model code, parameter sets, and simulation conditions
needed to reproduce simulation studies. Overall this will greatly
facilitate model reuse.

We further recommend that models published in this way not
require proprietary software or resources in order to be
used. Requiring such resources greatly hampers reproducibility,
verification and reuse. Achieving these goals in the presence
of such dependencies typically requires rewriting the models
ab initio, with the original model code serving the much less
useful (but still not worthless) function of documentation.
While there are benefits to reformulating existing models (a
reimplementation might be more efficient, or clearer and more
elegant), a hard requirement to do so in all cases simply
leads to wasted resources.


