[Vp-integration-subgroup] "Models are not consistently licensed"

Jacob Barhak jacob.barhak at gmail.com
Mon Dec 20 02:21:31 PST 2021


Hi William,

It is time to revive the discussion we had a long time ago, since the
reviewer raises questions about IP in two places in the returned paper. I
want to explore how best to answer the reviewer and perhaps modify the
text.

William, I name you since you were interested explicitly, yet others may
want to join. I think Sheriff may want to add some elements here since his
institution pioneers the use of public domain type licenses for modeling
and he can perhaps share more information about the decision made to help
us advance the topic and convey it better.

Here are the specific remarks from the reviewer that we need to address:

"Models are Hard to Locate: Are the authors suggesting that entire
simulation workflows, from model construction to analysis, should be
publicly available? At what point does one consider intellectual property?
Do the authors advocate for such extensive publishing for all models, or
only ones that are intended to be widely re-used?"

and

"Models are Not Consistently Licensed…: Are the authors implying here that
all modeling work should be published with no rights reserved? Is it
reasonable to expect modelers to make their work freely usable by others
for profit? Is it reasonable for institutions to allow this? How much does
this really contribute to reproducibility and utility?"

Let us break these up into smaller questions:

Q: Are the authors implying here that all modeling work should be published
with no rights reserved?

A: Releasing code or text to the public domain does not mean that authors
cannot release the same code/text under a different license. It is possible
for the same code to exist under multiple licenses and/or restrictions. The
intention is that released code should be easily reusable, if you do
want it reused. This is why many government documents exist in the public
domain - so that barriers to reuse are removed. Modelers should be free to
choose a publication mechanism that fits them - if they aim for reuse then
a public domain type license is highly recommended. Consider that the impact
of a model publication is highly diminished if the product has no utility...


Q: Is it reasonable to expect modelers to make their work freely usable by
others for profit?

A: It is more than reasonable that creators get compensated for their
efforts. The method of compensation depends on the model. One reasonable
claim is that government funded research should be made freely available to
the public. In the past there were several attempts to implement this.
Here is a bill that was supposed to increase access to research:
https://www.cornyn.senate.gov/content/news/cornyn-wyden-introduce-bill-increase-access-taxpayer-funded-research
There was also a policy attempt by a previous US administration:
http://blogs.nature.com/news/2013/02/us-white-house-announces-open-access-policy.html
However, to date the NIH policy allows grantees to protect their
intellectual property: https://grants.nih.gov/policy/intell-property.htm
So it seems this discussion may continue in the US, and the flow of funds
will be regulated according to changes in policy and law.
The approach recommended in the paper is aimed at increasing reuse while
not prohibiting possible profit. Recall that any software/model requires
maintenance, and even off the shelf products require some level of support,
which can be a source of revenue regardless of intellectual property. The
originators will always have some small advantage with regards to profit,
even if the IP of a system is not protected. Yet modeling at this point
needs to grow to become a profitable venture, so the recommendation to
release using public domain type licenses stands.


Q: Is it reasonable for institutions to allow this?

A: Institutions have different approaches to IP, as you can see in the
small discussion we had in the group:
https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-January/000022.html
Therefore each model has different terms of reuse, so building complex
models that assemble other models requires dealing with multiple different
types of legal treatment, some of which may conflict. Eventually only
larger entities will be able to legally construct such models, and only
with a lot of bureaucratic effort. However, if each institution releases a
model to the public domain for reuse by others, the bureaucracy is
eliminated and smaller, more mobile entities will have incentives to
advance the state of the art.
Institutions can select approaches such as releasing older versions to the
public domain, or offering a supported licensed version that will be
profitable alongside the public domain version, so institutions can still
profit. However, relying on copyright based open source licenses as a
mechanism to protect intellectual property limits the growth necessary at
this point in time to reach enough success. Unlike what many institutions
advertise, we are still far from the promise of computational models being
predictive. If institutions add limitations, we will slow progress and
delay return on investment. In some aspects this may be similar to the
known "tragedy of the commons". We need to learn to work together to avoid
situations like the "AI winter".



Q: How much does this really contribute to reproducibility and utility?

A: This is a good question that may be best answered by experience.
During the pandemic one of the authors attempted to collect models towards
reuse in a larger model. In many cases the process involved:
1. Locating candidates - mostly by literature review or referral.
2. Looking at licensing terms - figuring out whether the model or data can
be legally reused without restrictions.
3. Contacting the authors to ask for details and for permission to
use the model unrestricted.
4. In some cases, asking authors to release under CC0 to simplify reuse.

There were multiple cases where models could not be reused because the
model license terms were incompatible or authors did not respond to
multiple approaches. Fortunately some authors were gracious enough to allow
releasing their model under CC0, so some models were reusable. Yet the
entire process involved bureaucracy at a time when the focus should have
been on generating better models to explain COVID-19.

The problem of authors not responding is common - a study reported in
our manuscript showed that less than 30% of authors responded to requests.
Given that academic institutions generate a lot of publications and
involve students, this makes some sense: students graduate and leave,
and in industry workers leave companies and rotate. So models that got
created are many times just left as an old memory rather than a live
product. Many times models get abandoned because their creators cannot be
contacted. If those creators claim intellectual property through
copyright mechanisms using different terms, it creates a barrier to reuse
that expires only after a long time, comparable to a human lifespan. So
potentially data/models that are good for reuse will have to be abandoned.
This is already happening - instead of innovation we sometimes reach
stagnation.

If a model has a public domain version, it creates an incentive for
individuals to innovate and to move forward faster through collaboration.



Q: Are the authors suggesting that entire simulation workflows, from model
construction to analysis, should be publicly available?

A: As mentioned before, a public domain version should be available if we
want to be able to assemble better models. Restricted versions can still
exist, yet it is suggested to release those into the public domain after
time passes. This is similar to the patent mechanism, where the government
provides an incentive for an inventor to innovate and disclose an invention
rather than keep it secret, and makes sure the invention is documented in a
library. After a certain time period the invention becomes public domain
and can be used by anyone without restriction, thus driving industry, trade
and commerce forward and making innovations widespread.

As mentioned before we are at a stage where the field needs to grow and
competition keeps it from growing. We need to be able to share more if we
want to build ensembles and complex assemblies of models - otherwise many
of the products of research we work hard for become obsolete upon
publication.



Q: At what point does one consider intellectual property?

A: IP should be considered by an institution from the start, by checking
that data / modeling components / software components can be reused and
are compatible. If done right, this means additional paperwork that will
surely slow research and progress. However, if model components are in the
public domain, such bureaucracy is simplified.



Q: Do the authors advocate for such extensive publishing for all models, or
only ones that are intended to be widely re-used?

A: My personal opinion is that published scientific material supported by
taxpayer money should have a public domain version, even if it is not
the version with all features. Also, when a taxpayer supported project ends
without continuation, it should be released to the public domain unless
there is a restriction against doing so. This way innovation is possible in
cases where today there is stagnation.

Also, if creators want wide reuse, release to the public domain is
recommended. Supporting information or publications can appear with IP
protection to gain recognition or return on investment, yet to keep
information available and accessible for the long term a public domain
license will really help.

Also consider the number of models we can create in the future to
understand our biology. If we add constraints on reuse we may never reach
a point where computers comprehend biology or medicine. The number
of biological processes in a human, an animal, or a plant is huge. Even if
we have them all modeled computationally as components, we will not be able
to legally assemble them. If we add legal restrictions, we will need to
write code just to handle the legal requirements, which will differ. This
issue already exists with open source software that is composed of
hundreds of contributions, and entire companies and volunteers exist
to handle the mess. This problem will be larger with biological models
since the problem is much more complex.
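
As a rough illustration of the bookkeeping such code would have to do, here
is a minimal sketch in Python. Everything in it is hypothetical: the names
Component, reviews_needed, PUBLIC_DOMAIN and cleared are made up for this
example, the license strings are only labels, and the rule that a CC0
component needs no pairwise copyright review is the simplifying assumption
made in this discussion, not legal advice. Orthogonal restrictions such as
patents are not modeled at all.

```python
from dataclasses import dataclass
from itertools import combinations

# Assumption: this label marks a public-domain-style dedication (CC0).
PUBLIC_DOMAIN = "CC0-1.0"


@dataclass(frozen=True)
class Component:
    """A model, data set, or software piece to be assembled (hypothetical)."""
    name: str
    license_id: str  # empty string means no license was stated


def reviews_needed(components, cleared=frozenset()):
    """Return (unlicensed names, pairs of names that need a license review).

    `cleared` is a set of frozensets of license labels that someone has
    already reviewed and accepted as combinable. It is supplied by the
    caller; this sketch makes no claim about which licenses are compatible.
    """
    # No license stated defaults to "all rights reserved": flag on its own.
    unlicensed = [c.name for c in components if not c.license_id]
    pairs = []
    for a, b in combinations(components, 2):
        if not a.license_id or not b.license_id:
            continue  # already reported as unlicensed above
        if PUBLIC_DOMAIN in (a.license_id, b.license_id):
            continue  # assumption: CC0 adds no copyright condition to check
        if frozenset((a.license_id, b.license_id)) in cleared:
            continue  # this combination was reviewed before
        pairs.append((a.name, b.name))
    return unlicensed, pairs


if __name__ == "__main__":
    parts = [
        Component("model_a", "CC0-1.0"),
        Component("model_b", "GPL-3.0"),
        Component("model_c", "MIT"),
        Component("model_d", ""),
    ]
    print(reviews_needed(parts))
```

With n components under restrictive terms the number of pairs to review can
grow as n*(n-1)/2, while an assembly made only of CC0 components yields an
empty list under the assumption above.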


William, Sheriff, or anyone else, please feel free to continue this
discussion so we can eventually distill it to a shorter answer to the
reviewer and perhaps modify our paper in response.


Hopefully my ideas are clear enough.

               Jacob






On Tue, May 25, 2021 at 7:39 AM Jacob Barhak <jacob.barhak at gmail.com> wrote:

> Ok William,
>
> You really want to go into details, so let us do so. I will try to
> be brief, because it is an endless topic and I can really go on for a long
> time. Although brief is relative.
>
> You write that:
> "The claim that we can’t make software out of pieces with different
> licenses is demonstrably false."
> This is not entirely false, you can indeed combine different pieces in
> some conditions, yet your code can become so messy and problematic to
> transport that many times you may be better not reusing some piece of code.
> And in any case, you have multiple restrictions and many times cannot
> distribute the code together.
>
> And abandoned code is really a problem, this is why public domain licenses
> started appearing in the last decade and copyleft licenses are not used as
> much as they used to - open source was a pretty good idea at a time when a
> lot of code was proprietary, yet the problems of copyleft and code that needed
> relicensing started appearing and there was a need for a new solution. It
> took about a generation for public domain licenses to appear - and we are
> just starting to experience it.
>
> You write:
> "The claim that we can’t use software abandoned by the original authors is
> also false."
> The issue is that you are confined to a specific license that may not be
> compatible with what you want to achieve and if you integrate a library
> into your code and build upon it for a long time, this becomes a
> technological debt you have to carry. The problem becomes much harsher when
> you try to do something that interfaces with the commercial world that
> imposes requirements on licenses - if you have abandoned code you
> integrated - you may be stuck since you cannot change the license. This is
> also true if code was not abandoned and just has many collaborators - you
> need to trace them all and ask them all to agree to a different license -
> the more contributors you have, the more problematic it is - this
> eventually can make some code practically unusable in some circumstances.
> And in situations like this COVID pandemic where you need to act fast to
> achieve results and some licensing problem appears - believe me, it is not
> an easy situation when there is time pressure. If you use a public domain
> license - all this disappears and you can innovate quickly. Also a lot of
> the bureaucracy disappears - making life so much easier.
>
> As for patents, those exist and will be used by commercial and scientific
> entities alike - Many university faculty members hold patents - and
> Universities sometimes have departments that support and encourage the
> creation of patents - so those will exist as long as law supports it. In
> fact, if you look at NIH policy, you will find out that it allows patents
> and assigns intellectual property rights arising from grants to the
> awardees. There were several attempts at making research products free and
> accessible by the public in the US, yet those did not catch on so far. I
> can write about those in a different email - yet I am trying to stay on
> point here. So like it or not, people will restrict what you can do. And
> even an open source license does not protect you from an orthogonal legal
> restriction. However, licenses like CC0 at least inform you about it and
> remove at least one restriction - which is more than many other licenses
> do.
>
> And as for attribution - nothing says you cannot attribute the work when
> using CC0 - in fact CC0 mentions the entity releasing the code to the
> public - it is just that you are not demanded to do so. You can always give
> credit - you are just not required - so scientific practice is not
> disrupted by "public domain" licenses - it is just made easier. Also, you
> can release the same work under multiple licenses - one that demands
> attribution and one that waives copyright restrictions and let the user
> choose which one they wish to propagate. So if you think about it, once you
> can release it to the public domain, any other license is just a
> restriction on the party trying to reuse your work regardless of how
> liberal you think the license is.
>
> And you mention BSD/MIT licenses - remember, those are still forms of
> protection of intellectual property - copyright based. Regardless of how
> liberal you describe them to be, you are still tied to the original
> contributors if you need anything changed in the license due to some
> incompatibility which leads us back to the original issue of license
> compatibility.
>
> Also, when a license is copyright based it depends on who the registered
> owner is and as you may have seen in our discussion on this mailing list,
> different institutions have different policies on ownership. So it becomes
> messy again - there is really no uniformity - it is all situational and
> based on interests of the owners.
>
> And you mention OSI - it is only one organization that catalogues open
> source licenses. There are also Creative Commons, Free Software
> Foundation, and the Open Knowledge Foundation. And they have different
> perspectives and I must add that OSI is behind in adopting the new
> generation of public domain licenses, so perhaps it is better to choose
> another entity like Creative Commons for licenses. In fact, we both know of
> one COVID modeling platform that is released now under Creative Commons
> license rather than the traditional licenses.
>
> I agree with you that releasing a model/code without a license is
> problematic - it is actually a strong copyright restriction that equals
> "all rights reserved". So this is strongly discouraged unless you really
> want to restrict.
>
> The reason this discussion is taking place is because we have a section
> about it in the paper and we do mention public domain licenses.
>
> In fact BioModels, the repository where many biological models are stored,
> made the correct choice of license and stores models under CC0 - this means
> that those models can be reused much more easily.
>
> I don't know what led to this decision by BioModels - perhaps Sheriff can
> tell us the story, yet I believe their decision was smart and correct.
> There are currently over 1,000 curated models in that repository and
> hopefully this number will grow quickly so we will have a large public
> repository that allows model reuse with an easy to use license interface.
>
> Think about it long term, if you really want modeling technologies to be
> widely adopted, you need to make them very accessible and if you want to
> integrate them, you want to remove as much bureaucracy as possible. Think
> about a future where hundreds of those models will have to be automatically
> merged together in ensembles by machine in attempts to explain observed
> biological phenomena. We are still far from that point, yet if we resolve
> the problems we listed in the paper we wrote together we will be closer to
> such a future solution. And fortunately BioModels resolved our need to
> worry about license compatibility issues.
>
> I thank you for taking the time to look at my video and the discussion,
> and I hope that this response explains well the need to remove licensing
> restrictions from integrating models.
>
>             Jacob
>
>
>
>
> On Tue, May 25, 2021 at 5:18 AM William Waites <wwaites at ieee.org> wrote:
>
>> Dear Jacob,
>>
>> I did watch your video and understand what you are saying. I’m also
>> pretty well-informed about licenses and patents as they relate to software
>> and data having been engaged with that topic in different countries (i.e.
>> different legal contexts) since the mid-1990s.
>>
>> There are several problems with your analysis.
>>
>> 1. It is perfectly well possible to compose together software with
>> different licenses. We do this all the time, and very successfully. We
>> would not have Linux distributions if this were not possible, and most of
>> the large programs written in Python or Java or whatever with a ton of
>> libraries that we use for scientific computing would not exist. Different
>> communities have different cultural ideas about which kinds of licenses
>> they prefer. Broadly, there are BSD/MIT style licenses that some like that
>> basically only require attribution, and there are copyleft GPL style
>> licenses that others like that additionally require derived work to also be
>> free. This is, to a very large extent, a solved problem. As I say, most of
>> modern computing would not be possible if we hadn’t already solved this.
>>
>> 2. Abandoned code is not a problem if it is properly licensed in the
>> first place. You are perfectly free to take any GPL or MIT or BSD licensed
>> software that has been abandoned and continue to use it and develop it.
>> Nothing stops you. Nothing at all. You are not free to change its license
>> without the involvement of the original authors, but why should you want to?
>>
>> The claim that we can’t make software out of pieces with different
>> licenses is demonstrably false.
>>
>> The claim that we can’t use software abandoned by the original authors is
>> also false.
>>
>> It is perfectly fine to use CC0. As I said, in the USA that is equivalent
>> to putting the software in the public domain. Not every country has the
>> concept of public domain in the same sense, so CC0 is designed to emulate
>> it in those cases. This is unusual, most people do not do this because they
>> require attribution at the very least. Attribution is the norm in
>> scientific work so it seems like public domain/CC0 is not really the best
>> match to established practice.
>>
>> I understand very well what you are doing with patents and you have been
>> nothing but up front about it. I understand very well what patents are and
>> how they work. I still think it’s a bad idea to propose using patents for
>> scientific models. It’s also a pretty fringe idea. I often like fringe
>> ideas but I don’t like this one.
>>
>> It is possible to get into trouble if you try to use code released under
>> a GPL-style copyleft license with something proprietary. This is by design,
>> it is not by accident or ignorance. If we want to discourage this (I don’t,
>> personally) then we can recommend the more liberal MIT/BSD style of license.
>>
>> It is a very big problem when people release code with no license at all.
>> That means we can’t do anything with it at all. I suggest that we drop the
>> discussion about patents and simply say that it is important that model
>> code is released under some license. The OSI maintains a decent list of
>> appropriate licenses: https://opensource.org/licenses
>>
>> Best wishes,
>> -w
>>
>> > On 24 May 2021, at 19:06, Jacob Barhak <jacob.barhak at gmail.com> wrote:
>> >
>> > Thanks William,
>> >
>> > A good debate is reasonable regarding licensing. So it is welcome.
>> >
>> > I can write a lot about it and in fact I have been having this
>> conversation on several channels.
>> >
>> > There are many forms of restrictions on what you can do. Even open
>> source licenses are, despite their name, based on copyright law, which is
>> a form of legal restriction. Both copyright and patents are forms of legal
>> restrictions. And if you want a comparison and a longer discussion, I
>> suggest you look at the table in the presentation I made for COMBINE last year:
>> >       • Jacob Barhak, Open Source and Sustainability, COMBINE 2020
>> October 5-9. Video:
>> https://drive.google.com/drive/folders/1actGnx6FwvoCcPrrF3qbnO0AmHt10WN6
>> starting from minute 13:10. Presentation:
>> https://jacob-barhak.github.io/COMBINE2020_OpenSource_upload_2020_10_04.odp
>> >
>> > Many people are unnecessarily worried about patents. I assume many
>> times without understanding the details. I repeat again my conflict of
>> interest, since I do hold patents. So I may be biased in your mind, yet
>> please do check out my arguments in the presentation.
>> >
>> > Note that just like software licenses are not always compatible with
>> each other, patents are not always compatible with some licenses and with
>> intentions of all parties involved - this is many times the source for
>> misunderstanding. Many restrictions are orthogonal to each other and need
>> to be cleared before use. In many cases, some work may need multiple
>> licenses and permissions so you can use it. It depends on many factors,
>> including jurisdictions, time, etc.
>> >
>> > Specifically for CC0 - CC0 is the most unrestricting license I am aware
>> of since it waives copyright and is therefore highly compatible with many
>> others - this is why it was mentioned as a good solution and indeed it has
>> been widely adopted. Moreover, it resolves issues of abandoned software or
>> of software whose multiple contributors cannot agree. So it gives life
>> to code and provides incentives to improve progress.
>> >
>> > If I am about to integrate a new model or a new work, I may be
>> restricted by many restrictions, and those are coming from potentially
>> multiple sources, especially if I am integrating multiple models. So
>> eliminating copyright and making things compatible helps a lot. It may not
>> be sufficient since there are still orthogonal restrictions, yet it's a
>> good start. This is why it was recommended and indeed more and more
>> entities are using CC0 to release work or to accumulate it in a repository.
>> >
>> > You mentioned CC licenses family - yes, those are nice licenses, yet
>> some still hold restrictions and are not even compatible with each other.
>> Here is the compatibility chart within CC license family:
>> > https://wiki.creativecommons.org/wiki/Wiki/cc_license_compatibility
>> >
>> > And yes, in some cases for some entities some licenses will not match
>> their intentions - it depends on the situation - yet if you have to bridge
>> many intentions, it's a good idea to remove as many restrictions as
>> possible.
>> >
>> > Hopefully you find these explanations sufficient for now.
>> >
>> >               Jacob
>> >
>> >
>> >
>> > On Mon, May 24, 2021 at 8:19 AM William Waites <wwaites at ieee.org>
>> wrote:
>> > I am hesitant to get involved in this particular aspect of the paper
>> and have long since timed out on software licensing discussions. However…
>> >
>> > The point that there are inconsistent licenses (or even absent licenses
>> which is legally the most restricted since that defaults to “all rights
>> reserved” essentially) and this can cause problems when assembling
>> composite models is accurate and fair. This is a challenge that we need to
>> address. We want to maximise the impact of the public funding of much of
>> the kind of work that we do, which means that others need to be as free as
>> possible to reuse our work.
>> >
>> > It is debatable whether CC0 is appropriate. It is meant to emulate the
>> public domain in places that do not have a legal concept of public domain.
>> It does not require attribution, which is the normal standard for academic
>> work. The other CC licenses that require attribution are not designed for
>> software. Insisting on using the public domain for software and then
>> asserting the ability to control use using patents is a novel idea, but I
>> don’t think it is a very good one. It is also not possible in many
>> jurisdictions that do not allow software patents.
>> >
>> > Standards bodies also typically have patent policies which range from
>> “disclose your patents” to “if you contribute patented stuff you must agree
>> to never try to enforce it”. We can reasonably expect that if we produce
>> patent-encumbered standards, nobody will use them. From a standards
>> development point of view, this needs addressed as well.
>> >
>> > There is also a ton of well-developed literature on free and open
>> source software licensing and compatibility among licenses.
>> >
>> > Best wishes,
>> > -w
>> >
>> >
>>
>>