[Vp-reproduce-subgroup] Another case study of data

Tingting Tang ttang2 at sdsu.edu
Fri Apr 2 14:14:54 PDT 2021


Hi, Jacob,

I am not sure whether my last message went through, because of the attachments.

I have just re-downloaded the data from the CA open data portal (
https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/1be1e43c-b4b2-4002-afb6-340bbcc85bbf).
Here is a link to the downloaded data in a Google Sheet:
https://docs.google.com/spreadsheets/d/1WiPIloymqpe7QymsVP817f47MIr-_LFjc3r6tyPWOXE/edit?usp=sharing

I also downloaded the cases data from USAFacts (
https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/).
They only provide cumulative data, so I computed the daily case numbers.
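
USAFacts only publishes cumulative counts, and the thread does not say exactly how the daily numbers were derived from them. Below is a minimal sketch of the usual approach, first differencing, plus a date-aligned comparison of two daily series. Assumptions: each series is a list of (date, count) pairs with one entry per calendar day; the function names daily_from_cumulative and compare_sources are hypothetical; the numbers in the example are toy values, not data from either source. Loading the actual USAFacts CSV or the CA portal file into this shape is left out, since their column layouts are not described here.

```python
from datetime import date, timedelta


def daily_from_cumulative(cumulative):
    """Turn date-ordered (date, cumulative_count) pairs into
    (date, daily_new_count) pairs by first differencing.

    The first date has no earlier value, so it produces no output.
    Negative results are kept: they usually mean the source revised
    its cumulative total downward, which is worth flagging, not hiding.
    """
    daily = []
    for (prev_day, prev_total), (day, total) in zip(cumulative, cumulative[1:]):
        # Assumes one row per calendar day; a gap would lump several
        # days into a single difference, so refuse instead of guessing.
        if day - prev_day != timedelta(days=1):
            raise ValueError(f"missing days between {prev_day} and {day}")
        daily.append((day, total - prev_total))
    return daily


def compare_sources(daily_a, daily_b):
    """Return {date: count_a - count_b} for the dates both series share."""
    a, b = dict(daily_a), dict(daily_b)
    return {d: a[d] - b[d] for d in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    # Toy numbers for illustration only, not real Imperial County data.
    cumulative = [(date(2021, 3, 1), 100), (date(2021, 3, 2), 130), (date(2021, 3, 3), 125)]
    other_daily = [(date(2021, 3, 2), 25), (date(2021, 3, 3), 10)]
    ours = daily_from_cumulative(cumulative)
    print(ours)
    print(compare_sources(ours, other_daily))
```

Negative daily values are kept rather than clipped to zero, since a downward revision of a cumulative total is itself one possible cause of the kind of discrepancies discussed in this thread.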

Thanks,
Tingting



On Fri, Apr 2, 2021 at 12:29 PM Jacob Barhak <jacob.barhak at gmail.com> wrote:

> Thanks Tingting,
>
> The resources in your first link cannot be downloaded:
>
> https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state
>
> I encounter a server error. If you can send a direct link to the
> spreadsheet data, it will help.
>
> I tried to verify your data but could not download the files. I was able
> to download a CSV file from USAFacts using this direct link:
>
> https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv?_ga=2.136905904.1065342744.1617373914-598050905.1617373914
>
> Unless I can download both files, I cannot verify the problem you
> encountered. Perhaps the data curators figured out the issue and are fixing
> it?
>
> Mistakes can happen, but they are usually dealt with through proper
> announcements. I wonder if that is the case here.
>
> If the data source does not respond when complaints are raised, this
> indicates a problem. Hopefully it is just temporary and things get
> resolved by the time we talk.
>
> Regardless, thank you for pointing out the difficulties you had. It is
> important that more people realize the day-to-day difficulties modelers
> encounter.
>
>               Jacob
>
>
>
>
>
>
>
> On Fri, Apr 2, 2021 at 1:53 PM Tingting Tang <ttang2 at sdsu.edu> wrote:
>
>> Hi, Jacob,
>>
>> Regarding your three questions:
>>
>>
>> 1. Were those different infections / hospitalizations numbers?
>>
>> These are daily case numbers from different websites.
>> 2. Can you be specific and send the exact link to the data you used? I
>> saw many links in your first link.
>> For the CA open data portal, the source website is
>> https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state
>>
>> For the USAFacts data, it is
>> https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
>>
>>
>> 3. Did you attempt to contact the sources of the data to figure out the
>> reasons for discrepancies?
>>
>> I haven't contacted the sources yet. As you mentioned, USAFacts claims
>> its data come from local governments, but it states that discrepancies are
>> possible because of update frequency. My main concern is that the size of
>> the discrepancy is surprising. Similar behaviour also exists in other
>> sources. In particular, the local government website (
>> https://www.icphd.org/health-information-and-resources/healthy-facts/covid-19/covid-19-data/)
>> is inconsistent with itself as its data are updated. I have tried to
>> contact them, but haven't heard anything back so far. We can chat more
>> about that as well.
>>
>> Thanks,
>> Tingting
>>
>> On Fri, Apr 2, 2021 at 7:40 AM Jacob Barhak <jacob.barhak at gmail.com>
>> wrote:
>>
>>> Thanks Tingting,
>>>
>>> Your email is about data consistency in another location, not
>>> necessarily about Singapore data - so I started another email thread.
>>>
>>> Just to clarify for the readers, you found 2 data sources with different
>>> numbers.
>>>
>>> Let us examine the issue here. I have a few questions:
>>>
>>> 1. Were those different infections / hospitalizations numbers?
>>>
>>> 2. Can you be specific and send the exact link to the data you used? I
>>> saw many links in your first link.
>>>
>>> 3. Did you attempt to contact the sources of the data to figure out the
>>> reasons for discrepancies?
>>>
>>> The USAFacts website states:
>>> "they may not reflect the exact numbers reported state and local
>>> government organizations"
>>>
>>> So perhaps you just stumbled on some data that will be fixed later.
>>>
>>> I am being cautious before jumping to conclusions. This has to be
>>> studied in more detail before reaching conclusions. However, I see your
>>> point that the data consistency issue is confusing at the very least.
>>>
>>> I will set up a time to meet via private email.
>>>
>>> Thank you for drawing our attention to another case of potential data
>>> issues.
>>>
>>>            Jacob,
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Apr 2, 2021 at 12:56 AM Tingting Tang <ttang2 at sdsu.edu> wrote:
>>>
>>>> Hi, Jacob,
>>>>
>>>> I created this figure using the data from the websites I mentioned. It
>>>> shows the numbers of new cases per day reported by these websites. I also
>>>> noticed that different websites sometimes use different meanings for
>>>> "daily new cases", which makes the matter even more confusing. The
>>>> following page contains this image:
>>>> https://www.notion.so/Two-websites-with-consistent-data-where-one-draw-from-the-other-2e54d94d9d474c36837cb48327963ba7
>>>>
>>>> I'd be happy to have a video chat sometime about the credibility of
>>>> data.
>>>>
>>>> Thanks,
>>>> Tingting
>>>>
>>>> On Thu, Apr 1, 2021 at 9:02 PM Jacob Barhak <jacob.barhak at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Tingting,
>>>>>
>>>>> Did you create those plots?
>>>>>
>>>>> It would be very interesting to start another discussion topic at the
>>>>> credibility mailing list and see how many more people noticed differences
>>>>> between data sources.
>>>>>
>>>>> However, the mailing list will not archive images and large files;
>>>>> it is an old mailing list tool we are using.
>>>>>
>>>>> Nevertheless, if this image is stored somewhere accessible, such as
>>>>> Google Drive, it would be nice to share the link and your experience
>>>>> with the working group.
>>>>>
>>>>> I was looking at your plot and data sources and was wondering if you
>>>>> are showing hospitalisation data or diagnosed data?
>>>>>
>>>>> It seems that the data need interpretation. Lucas and I are working on
>>>>> this aspect, and if you are interested you can join the effort. I am
>>>>> looking for experts to interpret data from a human perspective to add to
>>>>> models. If this interests you, let me know and we will schedule a video
>>>>> call so I can explain better.
>>>>>
>>>>> Meanwhile, thank you for your email; it would be nice if you shared
>>>>> this with the entire group.
>>>>>
>>>>>              Jacob
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 1, 2021 at 6:53 PM Tingting Tang <ttang2 at sdsu.edu> wrote:
>>>>>
>>>>>> Hi, Jacob,
>>>>>>
>>>>>> This example prompts me to raise the credibility of the data sources on
>>>>>> some websites I have been watching. In particular, I have been checking the
>>>>>> COVID tracking data for Imperial County, CA, for over a month on different
>>>>>> websites: local government (icphd.com), USAFacts, 1point3acres.com,
>>>>>> California open data portal (
>>>>>> https://data.chhs.ca.gov/dataset/covid-19-hospital-data), etc.
>>>>>>
>>>>>> There seems to be quite a bit of inconsistency among these data
>>>>>> sources in case reporting. A quick comparison between the California
>>>>>> open data portal and the USAFacts data, which claims to draw its data
>>>>>> from the former, is shown below. You can ignore the labels; they mark
>>>>>> the loosening and tightening of local government regulations.
>>>>>>
>>>>>> If you see fit, I can provide more information so this can be added as
>>>>>> another data consistency and credibility issue.
>>>>>>
>>>>>>
>>>>
>>
>> --
>> Tingting Tang
>> Assistant Professor
>> San Diego State University Imperial Valley
>> Office: FOBE 110
>> Phone: 760-768-5531
>> 720 Heber Ave
>> Calexico, CA 92231
>>
>
