[Vp-integration-subgroup] Another case study of model integration

William Waites wwaites at ieee.org
Tue Mar 30 12:29:54 PDT 2021


Jacob, I understand. There is a good overview of the situation here: https://opendefinition.org/guide/data/

I think there are two main things arising here:

1. We need a way to keep track of these datasets. This is partly about making sure the issues around data annotation and provenance are addressed, and partly a practical matter: having a collection of datasets for calibrating and fitting the various models is useful for the modelling effort, and it is a bonus if we can get at the data in something like a uniform way.

2. For the specific model and data that you’re talking about: do you need that particular data, or can we find other data that would do equally well? What is it that you need, precisely?

For #1 we should not reinvent the wheel, that ground’s been covered in detail. That’s where the guide linked above and the CKAN software underneath https://data.gov/ and similar catalogues come from. Do we need to stand up an instance of CKAN somewhere for the group and wider community? This is a case where a little bit of centralisation helps with consistency.
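
To make the annotation point concrete, here is a minimal sketch of what a catalogue entry check could look like. The record type, its field names and the example URL are all hypothetical and chosen for illustration; this is not CKAN's schema or API, only a picture of the idea that a dataset without a licence, a contact or a description of how it was collected gets flagged before anyone builds on it.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    title: str
    source_url: str
    licence: Optional[str] = None            # usage terms, e.g. an open licence name
    contact: Optional[str] = None            # a reachable corresponding person
    collection_method: Optional[str] = None  # how the data was gathered


def missing_annotations(record: DatasetRecord) -> List[str]:
    """Return the names of fields that are unset or empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# An entry with no usage terms, no contact and no collection method
# (example.org is a placeholder, not the real data source).
example = DatasetRecord(
    title="Recovery and incubation data",
    source_url="https://example.org/data.csv",
)
print(missing_annotations(example))
```

A real catalogue would carry far more metadata than this, but even a check this small would surface the gaps being discussed in this thread.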

Cheers,
-w

> On 30 Mar 2021, at 20:15, Jacob Barhak <jacob.barhak at gmail.com> wrote:
> 
> Yes William,
> 
> The issue here is traceability and accountability.
> 
> You see, if the data originator is not providing any information on data collection methods and usage terms, how do you know the data can be used at all - even for academic purposes?
> 
> You see, I know that at least in the USA, each academic institution has signed on to all sorts of regulations about the use of human subject data.
> 
> If there is no corresponding person and no address that can be reached, how would you grade the data quality or even data legality?
> 
> It is not only models and software that need licenses and metadata to describe them - a proper data source should be well annotated to be credible.
> 
> In the software world there is a legal term called indemnification when you pass software to another entity - how do you indemnify data? 
> 
> This is why we are raising this in the working group - it is important that people be aware of those aspects - it is not only a legal topic - it is much more. Yet I hope we can somehow contact the data curators and clear things up.
> 
>                 Jacob
> 
> 
> 
> 
> 
> On Tue, Mar 30, 2021 at 1:47 PM William Waites <wwaites at ieee.org> wrote:
> > 2. Determine that Recovery / incubation models cannot be reused currently since the data source that made the data available is not responding and did not specify usage terms. I asked assistance from this mailing list to contact the entity responsible for the data in this message: https://lists.simtk.org/pipermail/vp-integration-subgroup/2021-March/000043.html  
> > If you can help, please respond.
> 
> It doesn’t look like Upcode Academy is actually the source of the data — it’s some guy running a website and an on-line software development business, who also wants to sell you a dashboard for "your country". I think he must be getting the data from elsewhere.
> 
> What is special about this particular data source? Can you describe it specifically?
> 
> (Generic point: for each data source, describe it properly, annotate it with metadata… Do we need a data catalogue? This will come up repeatedly.)
> 
> Note that for academic work, though fair use rules vary throughout the world and we might not have to worry so much about copyright, we do typically have to worry about ethics, privacy, consent, etc. This is data about people after all. Depending on what’s in it, that may need to be considered.
> 
> Cheers,
> -w
> 
> 


