<div dir="ltr"><div>Hi, Jacob,</div><div><br></div><div>I have just re-downloaded the data from the CA open data portal (<a href="https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/1be1e43c-b4b2-4002-afb6-340bbcc85bbf">https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/1be1e43c-b4b2-4002-afb6-340bbcc85bbf</a>) and attached it below.</div><div><br></div><div>I also downloaded the cases data from USAFacts (<a href="https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/">https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/</a>). They only provide cumulative data, so the daily case numbers are computed from the cumulative counts. <br></div><div><br></div><div>The updated daily case values from these two sources, based on the data downloaded today, are shown below: <br><img src="cid:ii_kn0rybxm0" alt="image.png" width="480" height="234"><br></div><div>Thanks,</div><div>Tingting<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 2, 2021 at 12:29 PM Jacob Barhak <<a href="mailto:jacob.barhak@gmail.com">jacob.barhak@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks Tingting,<div><br></div><div>The resources in your first link cannot be downloaded:</div><div><a href="https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state" target="_blank">https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state</a><br></div><div><br></div><div>I encounter a server error. If you can send a direct link to the spreadsheet data, it will help. </div><div><br></div><div>I tried to verify your data and could not download the files. 
I was able to download a CSV file using this direct link:</div><div><a href="https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv?_ga=2.136905904.1065342744.1617373914-598050905.1617373914" target="_blank">https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv?_ga=2.136905904.1065342744.1617373914-598050905.1617373914</a><br></div><div><br></div><div>Unless I can download both files, I cannot verify the problem you encountered. Perhaps the data curators figured out the issue and are fixing it?</div><div><br></div><div>Mistakes can happen, yet usually those are dealt with through proper announcements. I wonder if this is the case here.</div><div><br></div><div>If the source of data is not responding when complaints are raised, this indicates a problem. Hopefully it is just temporary and things get resolved by the time we talk.</div><div><br></div><div>Regardless, thank you for pointing out the difficulties you had. It is important that more people realize the day-to-day difficulties modelers encounter. </div><div><br></div><div> Jacob</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 2, 2021 at 1:53 PM Tingting Tang <<a href="mailto:ttang2@sdsu.edu" target="_blank">ttang2@sdsu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi, Jacob,</div><div><br></div><div>For the three questions:</div><div><br></div><div>
<div><br></div><div>1. Were those different infections / hospitalizations numbers? </div><div><br></div><div>These are daily case numbers from different websites.<br></div><div>2. Can you be specific and send the exact link to the data you used? I saw many links in your first link.</div><div>For the CA open data portal, the source website is <a href="https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state" target="_blank">https://data.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state</a></div><div><br></div><div>For the USAFacts data, it is <a href="https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/" target="_blank">https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/</a></div><div><br></div><div><br></div><div>3. Did you attempt to contact the sources of the data to figure out the reasons for discrepancies?</div><div><br></div><div>I haven't contacted the source yet. As you mentioned, USAFacts says it draws its data from local governments, but notes possible discrepancies due to update frequency. My main concern is that the level of the discrepancy is surprising. In addition, similar behaviour exists in other sources. In particular, the local government website (<a href="https://www.icphd.org/health-information-and-resources/healthy-facts/covid-19/covid-19-data/" target="_blank">https://www.icphd.org/health-information-and-resources/healthy-facts/covid-19/covid-19-data/</a>) is inconsistent with itself as the data gets updated. I have tried to contact them, but haven't heard back so far. We can chat more about that as well.</div><div><br></div><div>Thanks,</div><div>Tingting<br></div>
</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 2, 2021 at 7:40 AM Jacob Barhak <<a href="mailto:jacob.barhak@gmail.com" target="_blank">jacob.barhak@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Thanks Tingting,</div><div dir="ltr"><br></div><div>Your email is about data consistency in another location, not necessarily about Singapore data - so I started another email thread. </div><div><br></div><div>Just to clarify to the readers, you found 2 data sources with different numbers.</div><div><br></div><div>Let us examine the issue here. I have a few questions:</div><div><br></div><div>1. Were those different infections / hospitalizations numbers? </div><div><br></div><div>2. Can you be specific and send the exact link to the data you used? I saw many links in your first link.</div><div><br></div><div>3. Did you attempt to contact the sources of the data to figure out the reasons for discrepancies?</div><div><br></div><div>The USAFacts website states:</div><div>"they may not reflect the exact numbers reported state and local government organizations"</div><div><br></div><div>So perhaps you just stumbled on some data that will be fixed later.</div><div><br></div><div>I am being cautious before jumping to conclusions. This has to be studied in more detail to reach conclusions. However, I see your point that the data consistency issue is confusing at the very least. 
</div><div><br></div><div>I will set up a time to meet via private email.</div><div><br></div><div>Thank you for drawing our attention to another case of potential data issues.</div><div><br></div><div> Jacob</div><div><br></div><div><br></div><div><br></div><div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 2, 2021 at 12:56 AM Tingting Tang <<a href="mailto:ttang2@sdsu.edu" target="_blank">ttang2@sdsu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi, Jacob,</div><div><br></div><div>I created this figure using the data from the websites I mentioned. They are the numbers of new cases per day reported by these websites. I also noticed that different websites sometimes have different meanings for "daily new cases", which makes the matter even more confusing. The image is available at the following link: <a href="https://www.notion.so/Two-websites-with-consistent-data-where-one-draw-from-the-other-2e54d94d9d474c36837cb48327963ba7" target="_blank">https://www.notion.so/Two-websites-with-consistent-data-where-one-draw-from-the-other-2e54d94d9d474c36837cb48327963ba7</a></div><div><br></div><div>I'd be happy to have a video chat sometime about the credibility of data.</div><div><br></div><div>Thanks,</div><div>Tingting</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 1, 2021 at 9:02 PM Jacob Barhak <<a href="mailto:jacob.barhak@gmail.com" target="_blank">jacob.barhak@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Tingting,<div><br></div><div>Did you create those plots? 
</div><div><br></div><div>It would be very interesting to start another discussion topic on the credibility mailing list and see how many more people noticed differences between data sources.</div><div><br></div><div>However, the mailing list will reject images and large files, since we are using an old mailing list tool.</div><div><br></div><div>Nevertheless, if you have a link to this image stored somewhere accessible, such as Google Drive, it would be nice to share your experience with the working group. </div><div><br></div><div>I was looking at your plot and data sources and was wondering whether you are showing hospitalisation data or diagnosed data?</div><div><br></div><div>It seems that data needs interpretation. Lucas and I are working on this aspect, and you can join the effort if you are interested. I am looking for experts to interpret data from a human perspective to add to models. If this interests you, let me know and we will schedule a video call so I can better explain.</div><div><br></div><div>Meanwhile, thank you for your email. It would be nice if you shared this with the entire group.</div><div><br></div><div> Jacob</div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 1, 2021 at 6:53 PM Tingting Tang <<a href="mailto:ttang2@sdsu.edu" target="_blank">ttang2@sdsu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi, Jacob,</div><div><br></div><div>This example prompts me to raise the credibility of the data sources on some websites I have been watching. 
In particular, I have been checking the COVID tracking data for Imperial County, CA, for over a month on different websites: the local government (<a href="http://icphd.com" target="_blank">icphd.com</a>), USAFacts, <a href="http://1point3acres.com" target="_blank">1point3acres.com</a>, the California open data portal (<a href="https://data.chhs.ca.gov/dataset/covid-19-hospital-data" target="_blank">https://data.chhs.ca.gov/dataset/covid-19-hospital-data</a>), etc. <br></div><div><br></div><div>There seems to be quite a bit of inconsistency among these data sources in case reporting. A quick comparison between the California open data portal and the USAFacts data, which claims to draw its data from the former, is shown below. You can ignore the labels, as they mark the loosening and tightening of the local government regulations. <br></div><div><br></div><div>If you see fit, I can provide more information so that this can be added as another issue with data consistency and credibility. <br></div><div><br></div><div></div></div></blockquote></div></blockquote></div><br>
</blockquote></div></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Tingting Tang</div><div>Assistant Professor</div><div>San Diego State University Imperial Valley</div><div><div>
Office: FOBE 110 <br></div><div>Phone: 760-768-5531</div><div>720 Heber Ave</div>Calexico, CA 92231</div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Tingting Tang</div><div>Assistant Professor</div><div>San Diego State University Imperial Valley</div><div><div>
Office: FOBE 110 <br></div><div>Phone: 760-768-5531</div><div>720 Heber Ave</div>Calexico, CA 92231</div></div></div></div></div></div></div>