<div dir="ltr">Hi William,<div><div><br></div><div>It is time to revive the discussion we had a long time ago, since the reviewer asks about IP on two occasions in the returned paper. I want to better explore our answer to the reviewer and perhaps modify the text. </div><div><br></div><div>William, I name you since you were explicitly interested, yet others may want to join. I think Sheriff may want to add some elements here, since his institution pioneers the use of public domain type licenses for modeling, and he can perhaps share more information about that decision to help us advance the topic and convey it better. </div><div><br></div><div>Here are the specific remarks from the reviewer that we need to address:</div><div><br></div><div>"Models are Hard to Locate: Are the authors suggesting that entire simulation workflows, from model construction to analysis, should be publicly available? At what point does one consider intellectual property? Do the authors advocate for such extensive publishing for all models, or only ones that are intended to be widely re-used?"<br></div><div><br></div><div>and </div><div><br></div><div>"Models are Not Consistently Licensed…: Are the authors implying here that all modeling work should be published with no rights reserved? Is it reasonable to expect modelers to make their work freely usable by others for profit? Is it reasonable for institutions to allow this? How much does this really contribute to reproducibility and utility?"</div><div><br></div><div>Let us break these up into smaller questions:</div><div><br></div><div>Q: Are the authors implying here that all modeling work should be published with no rights reserved?<br></div><div><br></div><div>A: Releasing code or text to the public domain does not mean that authors cannot release the same code/text under a different license. It is possible to have the code exist under multiple licenses and/or restrictions. 
The intention is that released code should be easily reusable, if you do want it reused. This is why many government documents exist in the public domain - so that barriers to reuse are removed. Modelers should be free to choose a publication mechanism that fits them - if they aim for reuse then a public domain type license is highly recommended. Consider that the impact of a model publication is greatly diminished if the product has no utility.</div><div><br></div><div><br></div><div>Q: Is it reasonable to expect modelers to make their work freely usable by others for profit?</div><div><br></div><div>A: It is more than reasonable that creators get compensated for their efforts. The method of compensation depends on the model. One reasonable claim is that government funded research should be made freely available to the public. In the past there were several attempts to implement this - here is a bill that was supposed to increase access to research: <a href="https://www.cornyn.senate.gov/content/news/cornyn-wyden-introduce-bill-increase-access-taxpayer-funded-research" target="_blank">https://www.cornyn.senate.gov/content/news/cornyn-wyden-introduce-bill-increase-access-taxpayer-funded-research</a> - There was also a policy attempt by a previous US administration: <a href="http://blogs.nature.com/news/2013/02/us-white-house-announces-open-access-policy.html" target="_blank">http://blogs.nature.com/news/2013/02/us-white-house-announces-open-access-policy.html</a> However, to date the NIH policy here allows grantees to protect their intellectual property <a href="https://grants.nih.gov/policy/intell-property.htm" target="_blank">https://grants.nih.gov/policy/intell-property.htm</a> - so it seems this discussion may continue in the US and the flow of funds will be regulated according to changes in policy and laws. </div><div>The approach recommended in the paper is aimed at increasing reuse while not prohibiting possible profit. 
Recall that any software/model requires maintenance, and even off the shelf products require some level of support, which can be a source of revenue regardless of intellectual property - the originators will always have some small advantage with regard to profit - even if the IP of a system is not protected. Yet modeling at this point needs to grow to become a profitable venture - so the recommendation to release using public domain type licenses stands. </div><div><br></div><div><br></div><div>Q: Is it reasonable for institutions to allow this? <br></div><div><br></div><div>A: Institutions have different approaches to IP - as you can see in the small discussion we had in the group: <a href="https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-January/000022.html">https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-January/000022.html</a></div><div>Therefore each model has different terms of reuse - so building complex models that assemble other models requires dealing with multiple different types of legal treatments - some of which may conflict. So eventually only larger entities will be able to legally construct models, and only with a lot of bureaucratic effort. However, if each institution releases a model to the public domain for reuse by others, the bureaucracy is eliminated and smaller, more mobile entities will have incentives to advance the state of the art. </div><div>Institutions can select approaches such as releasing older versions to the public domain or offering a supported licensed version that will be profitable alongside the public domain version - so institutions can still profit. However, relying on copyright based open source licenses as a mechanism to protect intellectual property limits the growth that is necessary at this point in time to reach enough success - contrary to what many institutions advertise, we are still far from the promise of computational models being predictive. 
If institutions add limitations, we will slow progress and delay return on investment - in some aspects this may be similar to the known "tragedy of the commons". We need to learn to work together to avoid situations like the "AI winter". </div><div><br></div><div><br></div><div><br></div><div><div>Q: How much does this really contribute to reproducibility and utility?</div><div><br></div><div>A: This is a good question that may be best answered by experience. </div><div>During the pandemic one of the authors attempted to collect models for reuse in a larger model. In many cases the process involved: </div><div>1. Locating candidates - mostly through literature review or referral. </div><div>2. Looking at licensing terms - figuring out whether the model or data can be legally reused without restrictions.</div><div>3. Contacting the authors to ask for details and for permission to use the model without restrictions. </div><div>4. In some cases, asking authors to release under CC0 to simplify reuse.</div><div><br></div><div>There were multiple cases where models could not be reused because the model license terms were incompatible or authors did not respond to multiple approaches. Fortunately some authors were gracious enough to allow releasing their model under CC0, so some models were reusable, yet the entire process involved bureaucracy at a time when the focus should have been on generating better models to explain COVID-19. </div><div><br></div><div>The problem of authors not responding is common - a study reported in our manuscript showed that fewer than 30% of authors responded to requests. This makes some sense: academic institutions generate a lot of publications and involve students, and students graduate and leave, while in industry workers leave companies and rotate. So models that were created are many times just left as an old memory rather than a live product. Many times models get abandoned because their creators can no longer be contacted. 
If those creators claim intellectual property through copyright mechanisms using different terms, it creates a barrier to reuse that will expire only after a long time, comparable to a human lifespan. So potentially data/models that are good for reuse will have to be abandoned - this is already happening - instead of innovation we sometimes reach stagnation. </div><div><br></div><div>If a model has a public domain version, it creates an incentive for individuals to innovate and to move forward faster through collaboration. </div><div><br></div><div><br></div><div><br></div><div>Q: Are the authors suggesting that entire simulation workflows, from model construction to analysis, should be publicly available?<br></div><div><br></div><div>A: As mentioned before, a public domain version should be available if we want to be able to assemble better models. Restricted versions can still exist, yet it is suggested to release those into the public domain after time passes - it is similar to the patent mechanism, where government provides an incentive for an inventor to innovate and disclose an invention rather than keep it secret, and makes sure this invention is documented in a library so that after a certain time period the invention becomes public domain and can be used by anyone without restriction, thus driving industry, trade and commerce forward and making innovations widespread. </div><div><br></div><div>As mentioned before, we are at a stage where the field needs to grow and competition keeps it from growing. We need to be able to share more if we want to build ensembles and complex assemblies of models - otherwise many of the products of research we work hard for become obsolete upon publication. 
</div><div><br></div><div><br></div><div><br></div><div>Q: At what point does one consider intellectual property?<br></div><div><br></div><div>A: IP should be considered by an institution from the start - checking that data / modeling components / software components can be reused and are compatible - if done right, this means additional paperwork that will surely slow research and progress. However, if model components are in the public domain - such bureaucracy is simplified. </div><div><br></div><div><br></div><div><br></div><div>Q: Do the authors advocate for such extensive publishing for all models, or only ones that are intended to be widely re-used?<br></div><div><br></div></div><div>A: My personal opinion is that published scientific material supported by taxpayer money should have a public domain version - even if it is not the version with all features. Also, when a taxpayer supported project ends without continuation, it should be released to the public domain unless a restriction prevents it. This way innovation is possible in cases where today there is stagnation. </div><div><br></div><div>Also, if creators want wide reuse - release to the public domain is recommended - supporting information or a publication can appear with IP protection to gain recognition or return on investment, yet to keep information available and accessible for the long term a public domain license will really help. </div><div><br></div><div>Also consider the number of models we can create in the future to understand our biology - if we add constraints on reuse we may never reach a point where computers comprehend biology or medicine - the number of biological processes in a human, an animal, or a plant is huge - even if we have them all modeled computationally as components, we will not be able to legally assemble those. If we add legal restrictions we will need to write code just to handle the differing legal requirements. 
This issue already exists with open source software that is composed of hundreds of contributions and entire companies and volunteers exist to handle the mess - and this problem will be larger with biological models since the problem is much more complex. </div><div><br></div><div><br></div><div>William, Sheriff, or anyone else, please feel free to continue this discussion so we can eventually distill it to a shorter answer to the reviewer and perhaps modify our paper in response. </div><div><br></div><div><br></div><div>Hopefully my ideas are clear enough.</div><div><br></div><div> Jacob</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 25, 2021 at 7:39 AM Jacob Barhak <<a href="mailto:jacob.barhak@gmail.com" target="_blank">jacob.barhak@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Ok William,<div><br></div><div>You really want to go into details, so let us do so. - and I will try to be brief, because it is an endless topic and I can really go on for a long time. Although brief is relative. </div><div><br></div><div>You write that: </div><div>"The claim that we can’t make software out of pieces with different licenses is demonstrably false."<br></div><div>This is not entirely false, you can indeed combine different pieces in some conditions, yet your code can become so messy and problematic to transport that many times you may be better not reusing some piece of code. And in any case, you have multiple restrictions and many times cannot distribute the code together. 
</div><div><br></div><div>And abandoned code really is a problem; this is why public domain licenses started appearing in the last decade and copyleft licenses are not used as much as they used to be - open source was a pretty good idea at a time when a lot of code was proprietary, yet the problems of copyleft and of code that needed relicensing started appearing and there was a need for a new solution. It took about a generation for public domain licenses to appear - and we are just starting to experience it. </div><div><br></div><div>You write:</div><div>"The claim that we can’t use software abandoned by the original authors is also false."<br></div><div>The issue is that you are confined to a specific license that may not be compatible with what you want to achieve, and if you integrate a library into your code and build upon it for a long time, this becomes a technical debt you have to carry. The problem becomes much harsher when you try to do something that interfaces with the commercial world, which imposes requirements on licenses - if you integrated abandoned code - you may be stuck since you cannot change the license. This is also true if the code was not abandoned and just has many contributors - you need to trace them all and ask them all to agree to a different license - the more contributors you have, the more problematic it is - this eventually can make some code practically unusable in some circumstances. And in situations like this COVID pandemic where you need to act fast to achieve results and some licensing problem appears - believe me, it is not an easy situation when there is time pressure. If you use a public domain license - all this disappears and you can innovate quickly. Also a lot of the bureaucracy disappears - making life so much easier. 
</div><div><br></div><div>As for patents, those exist and will be used by commercial and scientific entities alike - Many university faculty members hold patents - and Universities sometimes have departments that support and encourage the creation of patents - so those will exist as long as law supports it. In fact, if you look at NIH policy, you will find out that it allows patents and assigns intellectual property rights arising from grants to the awardees. There were several attempts at making research products free and accessible by the public in the US, yet those did not catch on so far. I can write about those in a different email - yet I am trying to stay on point here. So like it or not, people will restrict what you can do. And even an open source license does not protect you from an orthogonal legal restriction. However, licenses like CC0 at least inform you about it and remove at least one restriction - which is more than many other licenses do. </div><div><br></div><div>And as for attribution - nothing says you cannot attribute the work when using CC0 - in fact CC0 mentions the entity releasing the code to the public - it is just that you are not demanded to do so. You can always give credit - you are just not required - so scientific practice is not disrupted by "public domain" licenses - it is just made easier. Also, you can release the same work under multiple licenses - one that demands attribution and one that waives copyright restrictions and let the user choose which one they wish to propagate. So if you think about it, once you can release it to the public domain, any other license is just a restriction on the party trying to reuse your work regardless of how liberal you think the license is. </div><div><br></div><div>And you mention BSD/MIT licenses - remember, those are still forms of protection of intellectual property - copyright based. 
Regardless of how liberal you describe them to be, you are still tied to the original contributors if you need anything changed in the license due to some incompatibility, which leads us back to the original issue of license compatibility. </div><div><br></div><div>Also, when a license is copyright based it depends on who the registered owner is, and as you may have seen in our discussion on this mailing list, different institutions have different policies on ownership. So it becomes messy again - there is really no uniformity - it is all situational and based on the interests of the owners. </div><div><br></div><div>And you mention OSI - it is only one organization that catalogues open source licenses. There are also Creative Commons, the Free Software Foundation, and the Open Knowledge Foundation. They have different perspectives, and I must add that OSI is behind in adopting the new generation of public domain licenses, so perhaps it is better to choose another entity like Creative Commons for licenses. In fact, we both know of one COVID modeling platform that is now released under a Creative Commons license rather than the traditional licenses. </div><div><br></div><div>I agree with you that releasing a model/code without a license is problematic - it is actually a strong copyright restriction that equals "all rights reserved". So this is strongly discouraged unless you really want to restrict.</div><div><br></div><div>The reason this discussion is taking place is that we have a section about it in the paper and we do mention public domain licenses. </div><div><br></div><div>In fact BioModels, the repository where many biological models are stored, made the correct choice of license and stores models under CC0 - this means that those models can be reused much more easily. </div><div><br></div><div>I don't know what led to this decision by BioModels - perhaps Sheriff can tell us the story, yet I believe their decision was smart and correct. 
There are currently over 1,000 curated models in that repository and hopefully this number will grow quickly so we will have a large public repository that allows model reuse with an easy to use license interface.</div><div><br></div><div>Think about it long term, if you really want modeling technologies to be widely adopted, you need to make them very accessible and if you want to integrate them, you want to remove as much bureaucracy as possible. Think about a future where hundreds of those models will have to be automatically merged together in ensembles by machine in attempts to explain observed biological phenomena. We are still far from that point, yet if we resolve the problems we listed in the paper we wrote together we will be closer to such a future solution. And fortunately BioModels resolved our need to worry about license compatibility issues. </div><div><br></div><div>I thank you for taking the time to look at my video and the discussion, and I hope that this response explains well the need to remove licensing restrictions from integrating models. </div><div><br></div><div> Jacob</div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 25, 2021 at 5:18 AM William Waites <<a href="mailto:wwaites@ieee.org" target="_blank">wwaites@ieee.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Jacob,<br>
<br>
I did watch your video and understand what you are saying. I’m also pretty well-informed about licenses and patents as they relate to software and data having been engaged with that topic in different countries (i.e. different legal contexts) since the mid-1990s.<br>
<br>
There are several problems with your analysis.<br>
<br>
1. It is perfectly well possible to compose together software with different licenses. We do this all the time, and very successfully. We would not have Linux distributions if this were not possible, and most of the large programs written in Python or Java or whatever with a ton of libraries that we use for scientific computing would not exist. Different communities have different cultural ideas about which kinds of licenses they prefer. Broadly, there are BSD/MIT style licenses that some like that basically only require attribution, and there are copyleft GPL style licenses that others like that additionally require derived work to also be free. This is, to a very large extent, a solved problem. As I say, most of modern computing would not be possible if we hadn’t already solved this.<br>
<br>
2. Abandoned code is not a problem if it is properly licensed in the first place. You are perfectly free to take any GPL or MIT or BSD licensed software that has been abandoned and continue to use it and develop it. Nothing stops you. Nothing at all. You are not free to change its license without the involvement of the original authors, but why should you want to?<br>
<br>
The claim that we can’t make software out of pieces with different licenses is demonstrably false.<br>
<br>
The claim that we can’t use software abandoned by the original authors is also false.<br>
<br>
It is perfectly fine to use CC0. As I said, in the USA that is equivalent to putting the software in the public domain. Not every country has the concept of public domain in the same sense, so CC0 is designed to emulate it in those cases. This is unusual; most people do not do this because they require attribution at the very least. Attribution is the norm in scientific work so it seems like public domain/CC0 is not really the best match to established practice.<br>
<br>
I understand very well what you are doing with patents and you have been nothing but up front about it. I understand very well what patents are and how they work. I still think it’s a bad idea to propose using patents for scientific models. It’s also a pretty fringe idea. I often like fringe ideas but I don’t like this one.<br>
<br>
It is possible to get into trouble if you try to use code released under a GPL-style copyleft license with something proprietary. This is by design, it is not by accident or ignorance. If we want to discourage this (I don’t, personally) then we can recommend the more liberal MIT/BSD style of license.<br>
<br>
It is a very big problem when people release code with no license at all. That means we can’t do anything with it at all. I suggest that we drop the discussion about patents and simply say that it is important that model code is released under some license. The OSI maintains a decent list of appropriate licenses: <a href="https://opensource.org/licenses" rel="noreferrer" target="_blank">https://opensource.org/licenses</a><br>
<br>
Best wishes,<br>
-w<br>
<br>
> On 24 May 2021, at 19:06, Jacob Barhak <<a href="mailto:jacob.barhak@gmail.com" target="_blank">jacob.barhak@gmail.com</a>> wrote:<br>
> <br>
> Thanks William, <br>
> <br>
> A good debate is reasonable regarding licensing. So it is welcome.<br>
> <br>
> I can write a lot about it and in fact I have been having this conversation on several channels. <br>
> <br>
> There are many forms of restrictions on what you can do. Even open source licenses are, despite their name, based on copyright law, which is a form of legal restriction. Both copyright and patents are forms of legal restrictions. And if you want a comparison and a longer discussion, I suggest you look at the table in the presentation I made for COMBINE last year:<br>
> • Jacob Barhak, Open Source and Sustainability, COMBINE 2020 October 5-9. Video: <a href="https://drive.google.com/drive/folders/1actGnx6FwvoCcPrrF3qbnO0AmHt10WN6" rel="noreferrer" target="_blank">https://drive.google.com/drive/folders/1actGnx6FwvoCcPrrF3qbnO0AmHt10WN6</a> starting from minute 13:10. Presentation: <a href="https://jacob-barhak.github.io/COMBINE2020_OpenSource_upload_2020_10_04.odp" rel="noreferrer" target="_blank">https://jacob-barhak.github.io/COMBINE2020_OpenSource_upload_2020_10_04.odp</a><br>
> <br>
> Many people are unnecessarily worried about patents. I assume many times without understanding the details. I repeat again my conflict of interest, since I do hold patents. So I may be biased in your mind, yet please do check out my arguments in the presentation. <br>
> <br>
> Note that just like software licenses are not always compatible with each other, patents are not always compatible with some licenses and with the intentions of all parties involved - this is many times the source of misunderstanding. Many restrictions are orthogonal to each other and need to be cleared before use. In many cases, some work may need multiple licenses and permissions so you can use it. It depends on many factors, including jurisdictions, time, etc. <br>
> <br>
> Specifically for CC0 - CC0 is the least restrictive license I am aware of, since it waives copyright and is therefore highly compatible with many others - this is why it was mentioned as a good solution and indeed it has been widely adopted. Moreover, it resolves issues with abandoned software or with software whose multiple contributors cannot reach agreement. So it gives life to code and provides incentives to improve progress.<br>
> <br>
> If I am about to integrate a new model or a new work, I may be restricted by many restrictions, and those are coming from potentially multiple sources, especially if I am integrating multiple models. So eliminating copyright and making things compatible helps a lot. It may not be sufficient since there are still orthogonal restrictions, yet it's a good start. This is why it was recommended and indeed more and more entities are using CC0 to release work or to accumulate it in a repository. <br>
> <br>
> You mentioned the CC license family - yes, those are nice licenses, yet some still hold restrictions and are not even compatible with each other. Here is the compatibility chart within the CC license family:<br>
> <a href="https://wiki.creativecommons.org/wiki/Wiki/cc_license_compatibility" rel="noreferrer" target="_blank">https://wiki.creativecommons.org/wiki/Wiki/cc_license_compatibility</a><br>
> <br>
> And yes, in some cases for some entities some licenses will not match their intentions - it depends on the situation - yet if you have to bridge many intentions, it's a good idea to remove as many restrictions as possible.<br>
> <br>
> Hopefully you find these explanations sufficient for now. <br>
> <br>
> Jacob<br>
> <br>
> <br>
> <br>
> On Mon, May 24, 2021 at 8:19 AM William Waites <<a href="mailto:wwaites@ieee.org" target="_blank">wwaites@ieee.org</a>> wrote:<br>
> I am hesitant to get involved in this particular aspect of the paper and have long since timed out on software licensing discussions. However…<br>
> <br>
> The point that there are inconsistent licenses (or even absent licenses which is legally the most restricted since that defaults to “all rights reserved” essentially) and this can cause problems when assembling composite models is accurate and fair. This is a challenge that we need to address. We want to maximise the impact of the public funding of much of the kind of work that we do, which means that others need to be as free as possible to reuse our work.<br>
> <br>
> It is debatable whether CC0 is appropriate. It is meant to emulate the public domain in places that do not have a legal concept of public domain. It does not require attribution, which is the normal standard for academic work. The other CC licenses that require attribution are not designed for software. Insisting on using the public domain for software and then asserting the ability to control use using patents is a novel idea, but I don’t think it is a very good one. It is also not possible in many jurisdictions that do not allow software patents.<br>
> <br>
> Standards bodies also typically have patent policies which range from “disclose your patents” to “if you contribute patented stuff you must agree to never try to enforce it”. We can reasonably expect that if we produce patent-encumbered standards, nobody will use them. From a standards development point of view, this needs addressed as well.<br>
> <br>
> There is also a ton of well-developed literature on free and open source software licensing and compatibility among licenses.<br>
> <br>
> Best wishes,<br>
> -w<br>
> <br>
> <br>
<br>
</blockquote></div>
</blockquote></div>