[Vp-integration-subgroup] Another case study of data
Jacob Barhak
jacob.barhak at gmail.com
Fri Apr 2 14:56:06 PDT 2021
Thanks Tingting,
Your messages are being held for moderation because the mailing system does
not accept images, and the mailing list software sends such messages to
moderation. I approve them regularly, yet it is a bother, so it is better to
avoid using images with this older mailing system. Also avoid using
attachments; it is fine to use links.
I was able to download the data from the first link you just sent by
clicking the download button and choosing CSV. Please give me some time to
compare it with the data from the second link, which I was also able to
download. I will try to reproduce your findings by the time we speak.
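A minimal Python sketch of that comparison, assuming each source has already been reduced to a date-sorted list of (date, count) pairs for the same county. The function names are hypothetical, and the numbers in the example are made up for illustration, not taken from either source. Because USAFacts only publishes cumulative counts, daily counts are first recovered by subtracting each day's cumulative total from the previous day's. The two daily series are then compared date by date.

```python
from datetime import date


def daily_from_cumulative(cumulative):
    """Convert date-sorted (date, cumulative_total) pairs into (date, daily_new) pairs.

    The first date is dropped because it has no previous total to subtract from.
    A negative daily value means the source revised its cumulative total downward.
    """
    return [
        (day, total - prev_total)
        for (_, prev_total), (day, total) in zip(cumulative, cumulative[1:])
    ]


def compare_daily(series_a, series_b):
    """Return {date: a - b} for every date present in both daily series."""
    a, b = dict(series_a), dict(series_b)
    return {day: a[day] - b[day] for day in sorted(a.keys() & b.keys())}


# Made-up numbers, for illustration only (not from either source).
# cumulative_b stands in for a source that only publishes cumulative totals;
# daily_a stands in for a source that publishes daily counts directly.
cumulative_b = [(date(2021, 3, 1), 100), (date(2021, 3, 2), 110), (date(2021, 3, 3), 125)]
daily_b = daily_from_cumulative(cumulative_b)  # daily new counts for Mar 2 and Mar 3
daily_a = [(date(2021, 3, 2), 12), (date(2021, 3, 3), 15)]
print(compare_daily(daily_a, daily_b))
```

Dates missing from either source are skipped rather than treated as zero, so a reporting gap shows up as a missing date instead of a spurious discrepancy.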
Being able to download the data is the first step, so I am making progress.
I hope we can get to the bottom of this quickly.
Jacob
On Fri, Apr 2, 2021 at 4:15 PM Tingting Tang <ttang2 at sdsu.edu> wrote:
> Hi, Jacob,
>
> Not sure if the last message went through or not due to attachments.
>
> I have just re-downloaded the data from the CA open data portal (
> https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/1be1e43c-b4b2-4002-afb6-340bbcc85bbf),
> and here is a link to the downloaded data in Google Sheets:
>
> https://docs.google.com/spreadsheets/d/1WiPIloymqpe7QymsVP817f47MIr-_LFjc3r6tyPWOXE/edit?usp=sharing
>
> I also downloaded the cases data from USAFacts (
> https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/).
> They only provide cumulative data, so the daily case numbers are
> computed.
>
> Thanks,
> Tingting
>
>
>
> On Fri, Apr 2, 2021 at 12:29 PM Jacob Barhak <jacob.barhak at gmail.com>
> wrote:
>
>> Thanks Tingting,
>>
>> The resources in your first link cannot be downloaded:
>>
>> https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state
>>
>> I encounter a server error. If you can send a direct link to the
>> spreadsheet data, it will help.
>>
>> I tried to verify your data and could not download the files. I was able
>> to download a CSV file using this direct link:
>>
>> https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv?_ga=2.136905904.1065342744.1617373914-598050905.1617373914
>>
>> Unless I can download both files, I cannot verify the problem you
>> encountered. Perhaps the data curators figured out the issue and are fixing
>> it?
>>
>> Mistakes can happen, yet they are usually dealt with through proper
>> announcements. I wonder if this is the case here.
>>
>> If the source of the data does not respond when complaints are raised,
>> this indicates a problem. Hopefully it is just temporary and things get
>> resolved quickly by the time we talk.
>>
>> Regardless, thank you for pointing out the difficulties you had. It is
>> important that more people realize the day-to-day difficulties modelers
>> encounter.
>>
>> Jacob
>>
>> On Fri, Apr 2, 2021 at 1:53 PM Tingting Tang <ttang2 at sdsu.edu> wrote:
>>
>>> Hi, Jacob,
>>>
>>> Regarding the three questions:
>>>
>>>
>>> 1. Were those different numbers for infections or hospitalizations?
>>>
>>> These are daily case numbers from different websites.
>>> 2. Can you be specific and send the exact link to the data you used? I
>>> saw many links in your first link.
>>> For the CA open data portal data, the source website is
>>> https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state
>>>
>>> For the USAFacts data, it is
>>> https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
>>>
>>>
>>> 3. Did you attempt to contact the sources of the data to figure out the
>>> reasons for discrepancies?
>>>
>>> I haven't contacted the sources yet. As you mentioned, USAFacts says its
>>> data come from local governments, but it states that discrepancies are
>>> possible due to update frequency. My main concern is that the level of
>>> discrepancy is surprising. In addition, similar behavior exists in other
>>> sources. In particular, the local government website (
>>> https://www.icphd.org/health-information-and-resources/healthy-facts/covid-19/covid-19-data/)
>>> is inconsistent with itself as the data are updated. I have tried to
>>> contact them, but haven't heard back so far. We can chat more about that
>>> as well.
>>>
>>> Thanks,
>>> Tingting
>>>
>>> On Fri, Apr 2, 2021 at 7:40 AM Jacob Barhak <jacob.barhak at gmail.com>
>>> wrote:
>>>
>>>> Thanks Tingting,
>>>>
>>>> Your email is about data consistency in another location, not
>>>> necessarily about the Singapore data, so I started a new email thread.
>>>>
>>>> Just to clarify for readers: you found two data sources that report
>>>> different numbers.
>>>>
>>>> Let us examine the issue here. I have a few questions:
>>>>
>>>> 1. Were those different numbers for infections or hospitalizations?
>>>>
>>>> 2. Can you be specific and send the exact link to the data you used? I
>>>> saw many links in your first link.
>>>>
>>>> 3. Did you attempt to contact the sources of the data to figure out the
>>>> reasons for discrepancies?
>>>>
>>>> The USAFacts website states:
>>>> "they may not reflect the exact numbers reported [by] state and local
>>>> government organizations"
>>>>
>>>> So perhaps you just stumbled on some data that will be fixed later.
>>>>
>>>> I am being cautious about jumping to conclusions; this has to be
>>>> studied in more detail first. However, I see your point that the data
>>>> consistency issue is confusing, to say the least.
>>>>
>>>> I will set up a time to meet via private email.
>>>>
>>>> Thank you for drawing our attention to another case of potential data
>>>> issues.
>>>>
>>>> Jacob
>>>>
>>>> On Fri, Apr 2, 2021 at 12:56 AM Tingting Tang <ttang2 at sdsu.edu> wrote:
>>>>
>>>>> Hi, Jacob,
>>>>>
>>>>> I created this figure using the data from the websites I mentioned.
>>>>> It shows the numbers of new cases per day reported by these websites. I
>>>>> also noticed that different websites sometimes use different meanings
>>>>> for "daily new cases", which makes the matter even more confusing. The
>>>>> following page contains this image:
>>>>> https://www.notion.so/Two-websites-with-consistent-data-where-one-draw-from-the-other-2e54d94d9d474c36837cb48327963ba7
>>>>>
>>>>> I'd be happy to have a video chat sometime about the credibility of
>>>>> data.
>>>>>
>>>>> Thanks,
>>>>> Tingting
>>>>>
>>>>> On Thu, Apr 1, 2021 at 9:02 PM Jacob Barhak <jacob.barhak at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Tingting,
>>>>>>
>>>>>> Did you create those plots?
>>>>>>
>>>>>> It would be very interesting to start another discussion topic on the
>>>>>> credibility mailing list and see how many more people noticed differences
>>>>>> between data sources.
>>>>>>
>>>>>> However, the mailing list will not archive images and large files;
>>>>>> it is an old mailing list tool we are using.
>>>>>>
>>>>>> Nevertheless, if you have a link to this image stored somewhere
>>>>>> accessible, like Google Drive, it would be nice to share your
>>>>>> experience with the working group.
>>>>>>
>>>>>> I was looking at your plot and data sources and was wondering whether
>>>>>> you are showing hospitalization data or diagnosed cases.
>>>>>>
>>>>>> It seems that data needs interpretation. Lucas and I are working on
>>>>>> this aspect, and if you are interested you can join the effort. I am
>>>>>> looking for experts to interpret data from a human perspective to add to
>>>>>> models. If this interests you, let me know and we will schedule a video
>>>>>> call so I can explain better.
>>>>>>
>>>>>> Meanwhile, thank you for your email; it would be nice if you shared
>>>>>> this with the entire group.
>>>>>>
>>>>>> Jacob
>>>>>>
>>>>>> On Thu, Apr 1, 2021 at 6:53 PM Tingting Tang <ttang2 at sdsu.edu> wrote:
>>>>>>
>>>>>>> Hi, Jacob,
>>>>>>>
>>>>>>> This example prompts me to raise the credibility of data sources on
>>>>>>> some websites I have been watching. In particular, I have been checking the
>>>>>>> COVID tracking data for Imperial County, CA, for over a month on different
>>>>>>> websites: the local government (icphd.com), USAFacts, 1point3acres.com,
>>>>>>> the California open data portal (
>>>>>>> https://data.chhs.ca.gov/dataset/covid-19-hospital-data), etc.
>>>>>>>
>>>>>>> There seems to be quite a bit of inconsistency in case reporting among
>>>>>>> these data sources. A quick comparison between the California open data
>>>>>>> portal and the USAFacts data, which claims to draw its data from the
>>>>>>> former, is shown below. You can ignore the labels; they mark the
>>>>>>> loosening and tightening of local government regulations.
>>>>>>>
>>>>>>> If you see fit, I can provide more information to add this as another
>>>>>>> data consistency and credibility issue.
>>>>>>>
>>>>>>>
>>>>>
>>>
>>> --
>>> Tingting Tang
>>> Assistant Professor
>>> San Diego State University Imperial Valley
>>> Office: FOBE 110
>>> Phone: 760-768-5531
>>> 720 Heber Ave
>>> Calexico, CA 92231
>>>
>>
>