5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid

By Robert Munro, Author, Human-in-the-Loop Machine Learning.

Many data scientists are thinking about how they can help respond to the SARS-CoV-2 virus and the disease it causes, COVID-19.

This article is written in response to this current disaster but is intended as general advice for data scientists who want to help with disaster response.

I worked in post-conflict development for the UN in West Africa before coming to Silicon Valley to complete a Ph.D. focused on adapting Machine Learning to low resource languages in health and disaster response contexts.

I’ve helped respond to many disasters worldwide, including the recent Ebola outbreak in West Africa and the MERS-coronavirus outbreak 10 years ago, and I served as CTO of a global epidemic tracking organization.

However, I think I have made the biggest impact by helping large tech companies support more languages, not in actual disaster response.

If you don’t speak a privileged language like English, then you are more likely to be the victim of a disaster and to have less information available to make the right decisions about your own recovery.

So, ensuring better language coverage is vital.

When I led AWS’s first Natural Language Processing (NLP) and Machine Translation solutions, and when I had the two largest phone manufacturers as customers for NLP and Speech Recognition data, I used my influence to ensure more diverse language support within those companies.

While it is harder to quantify, I think that this might have ultimately done more to help people in disasters than all my time as a disaster responder.

So, if you are a data scientist working at a company that makes widely used technology, the best thing that you can do might be to ensure that there is more diverse language support for your language technologies.

This will continue to help in future disasters.

Pathogens, like most organizations, cluster with linguistic diversity.

Source: “Artificial Intelligence for Social Good,” Robert Munro, Stanford lecture in From Languages to Information, 2015.

Content moderation is also very important in disasters.

Criminals prey on disaster victims, especially elderly people in financial scams, and by targeting children for abuse.

If your company has content moderation systems that track and report potential financial scams and abuse of minors, then this is very important work.

If you don’t think that you can help with language diversity or by tracking scams/abuse at your company, and you still want to contribute to the response to SARS-CoV-2, then here are 5 ways that you can help:

Unfortunately, there are many actions where it is likely that you will do more harm than good.

More than 90% of your ideas as a data scientist don’t actually work out when put into practice, and you should not expect a better success rate in disaster response, especially when you have no experience in the field.

So the remaining 5 ways that you can help are 5 things to avoid:

As a data scientist, you objectively evaluate information regularly and probably have a well-tuned sense of what true scientific reporting looks like in healthcare even though you might not have experience in the field.

A lot of your family and other people around you probably have less experience than you.

Now is the time to teach them how to interpret log scales on graphs and why they should be suspicious of any graph without a scale.
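If it helps to make the log-scale point concrete, here is a minimal Python sketch. The counts are made up for illustration and are not real case data. It shows that a quantity that doubles every period rises by the same amount on a log axis each time (a straight line), while steady linear growth flattens out on the same axis.

```python
import math

def log10_steps(counts):
    """Return the differences between consecutive log10 values."""
    logs = [math.log10(c) for c in counts]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical counts that double each period (illustrative only, not real data).
doubling = [100 * 2 ** i for i in range(6)]

# Hypothetical counts that grow by a fixed amount each period.
linear = [100 + 100 * i for i in range(6)]

doubling_steps = log10_steps(doubling)
linear_steps = log10_steps(linear)

# On a log axis, doubling gives equal steps; linear growth gives shrinking steps.
print([round(s, 3) for s in doubling_steps])
print([round(s, 3) for s in linear_steps])
```

A graph with no scale hides exactly this distinction, which is one reason to be suspicious of it.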

Misleading image that is being shared on social media at the moment.

This is a good example of the kind of misleading information that is being shared on social media at the moment.

Beyond the need to interpret log scales, here are some additional things to note in this example graph:

Combatting misinformation like this spreading on the internet might be the most important thing that you can do as a data scientist.

If someone you are close to comes to you with information like this, point out all these issues to them and then ask them why someone who does know the truth might be trying to mislead them.

An obviously fake graphic like this could lead someone to distrust masks.

That would also be wrong.

This should have no influence on someone’s decision, and the only advice should be: Only take the advice of your trusted healthcare providers.

You might also want to talk about why healthcare organizations aren’t talking about issues like this, which is because they don’t see data that tells them it is important at this moment.

If people share this too much it can become political and force organizations like the CDC and WHO into a response that is politically driven instead of health-driven.

So, you should caution people about sharing this type of information regardless of whether they agree.

Do you speak a language other than English? Especially a less widely spoken language? There’s a good chance that a lot of valuable information is not being translated into those languages, or worse, that a lot of misinformation is spreading without correct information available to contradict it.

Any relevant data that is translated and/or transcribed in a way that can be used by Machine Translation and Speech Recognition models will be useful.

For example, two years ago, I led a project to create 10+ hours of disaster and health-related recordings of informational messages from the Red Cross in Swahili, with transcriptions and English translations.

This data was made open source, and every Machine Translation and Speech Recognition service that uses this data is now more accurate for communications related to COVID-19.

If you can create similar datasets and open-source them, that will help the response to COVID-19 and any future response in those languages.

If you don’t have any existing datasets, then I recommend helping an organization like Translators Without Borders.

They were one of the organizations that helped with the Swahili dataset above, and they work closely with organizations responding to disasters.

If you are not a professional translator, then don’t translate advice about preventing or treating COVID-19.

Instructional material and medical terminology are among the hardest types of translations to get right.

I ran the largest use of crowdsourcing for translation in a disaster, so please take my advice on this one point.

  Epidemiologists are data scientists, and like the rest of us, they spend most of their time preparing data.

If you are able to take data that might be directly related to the response and transform it into a more usable format, then you can help with the response directly.

One example of this might be taking a dataset of anonymized transportation routes that contains ambiguous or non-standard location names and transforming those locations into unambiguous geo-locations.
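As a rough sketch of that kind of cleanup, the Python below maps messy location strings to a small lookup table using exact and then fuzzy matching from the standard library. Everything here is an assumption for illustration: the `GAZETTEER` table, its approximate coordinates, the `resolve_location` helper, and the 0.8 similarity cutoff. A real project would use an authoritative gazetteer and have ambiguous matches reviewed by a person.

```python
import difflib

# Hypothetical gazetteer: canonical place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "freetown": (8.48, -13.23),
    "monrovia": (6.30, -10.80),
    "conakry": (9.54, -13.68),
}

def normalize(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())

def resolve_location(raw_name, cutoff=0.8):
    """Return (canonical_name, (lat, lon)), or None if nothing is close enough."""
    key = normalize(raw_name)
    if key in GAZETTEER:
        return key, GAZETTEER[key]
    # Fall back to the closest spelling above the similarity cutoff.
    matches = difflib.get_close_matches(key, list(GAZETTEER), n=1, cutoff=cutoff)
    if matches:
        return matches[0], GAZETTEER[matches[0]]
    return None

for raw in ["  Freetown ", "Fretown", "Monrovia.", "Atlantis"]:
    print(repr(raw), "->", resolve_location(raw))
```

Returning `None` rather than guessing is deliberate: an unresolved name that someone can check is safer than a confident wrong coordinate.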

Another example would be making past research papers about coronaviruses more easily searchable so that virologists can get up to speed on the past research as efficiently as possible.
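To illustrate the searchability idea at its simplest, here is a small Python sketch of a keyword index over paper titles. The titles in `PAPERS` are invented placeholders, not real publications, and the `build_index` and `search` helpers are hypothetical names. A production system would more likely rely on an existing search engine and index abstracts or full text.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Invented placeholder titles (not real papers), for illustration only.
PAPERS = {
    "p1": "Transmission dynamics of a coronavirus in hospital settings",
    "p2": "Seasonal influenza vaccination uptake",
    "p3": "Coronavirus spike protein structure",
}

index = build_index(PAPERS)
print(sorted(search(index, "coronavirus")))
print(sorted(search(index, "coronavirus hospital")))
```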

Epidemiologists typically come from the social sciences, so expect them to be more rigorous when it comes to the right statistical analysis of the data, compared to a machine learning-focused data scientist.

If you are not an epidemiologist, virologist, or another scientist with a lot of experience responding to disasters, then you are not going to be able to get up to speed on an entire field in only a few months.

Most of the interventions that you could do would end up hurting people instead of helping them (see below).

However, you can analyze data that tells us something important about the outbreak but doesn’t have a direct relationship to the response itself.

There are many ways that people’s behaviors are changing as a result of COVID-19.

Most disaster response professionals will focus on the direct response and might not get back to other relevant data later.

For example, when there was an outbreak of Ebola in West Africa a few years ago, I was advising many organizations because I had lived and worked in Sierra Leone and Liberia in addition to my more general disaster response experience.

One thing I estimated that wasn’t directly related to the outbreak was the number of people who died from causes other than Ebola because they were avoiding healthcare clinics.

I calculated that for every person who died from Ebola, ten more died from treatable illnesses: The silent victims of Ebola

This helped the response indirectly because we used it to reduce the number of misleading news stories in the countries.

Too many media outlets had decided to run information campaigns across the region without considering anything other than reducing Ebola deaths.

So, I was able to provide this analysis to international health organizations who used it to help keep the media on-message as much as possible.

For COVID-19, what data can you find and analyze about human behavior that might help indirectly? For example, can you see how the reduction in driving and, therefore, car accidents might free up more hospital beds? The chances are that this is an important number, but no one has looked into it on a national scale.

Similarly, how many fewer deaths are there due to less pollution? Or what is the net benefit to the global carbon footprint now that we can actually measure the result of reduced pollution? Climate change will ultimately kill more people than COVID-19, and this might be one of our best chances to get accurate data about global changes in human behavior.

There is a lot that data scientists can teach us right now without the risk of contributing directly to the response, and they might ultimately have a greater impact on the world.

  If you really want to focus on disaster response, then many datasets are relevant to disaster response, and any insight into those past datasets will help us build models today for COVID-19 and other disasters in the future.

One NLP dataset that I helped create contains 30,000 messages drawn from events including an earthquake in Haiti in 2010, an earthquake in Chile in 2010, floods in Pakistan in 2010, and super-storm Sandy in the U.S. in 2012.

These are all disasters that I helped respond to, and this dataset also includes news articles spanning a large number of years and 100s of other disasters: Access to data

Importantly, some of this data is in languages other than English.

For example, the Haitian Kreyol data was used as a shared task in the 2011 Workshop on Machine Translation.

This dataset is also used in classes run by AI4All, Udacity, and universities, including Stanford.

The more people who have experience in disaster-related data, the more prepared we can be in the future.

If you work in computer vision, then I recommend researching systems that act to support healthcare professionals in their interpretations of images.

Healthcare companies will get little or no value from a computer vision system that can only detect one type of infection and only provides a prediction, rather than an interface to help a healthcare professional with their own diagnosis.

Avoid research which is popular in academic circles only because the data is easy to collect or the problem is easy to model.

These include English-only social media analysis in NLP and automated diagnosis for single conditions in medical images in Computer Vision.

Results from these kinds of studies don’t help us decide what approach will help us in actual disasters.

  If you are not a healthcare worker or disaster response expert, then you should not give your medical opinion about how people should protect themselves.

Despite working in disaster response for a decade, I am only directing people to more authoritative sources.

You will not see me give you advice about how to protect or treat yourself in this article or on social media.

Please do the same.

Furthermore, if you are quoting expert individuals or organizations, it is better to point people to those sources than to copy them to your website.

Unless you are prepared to constantly monitor the healthcare experts for any changes in their advice and immediately update your material to reflect the latest advice, you will be printing misinformation at some point and creating confusion about who the authoritative source should be.

Resist any urge to take part in the discussion.

There is absolutely no way that you can learn enough information to be useful within a short amount of time.

For example, think about what would happen if someone read the most popular machine learning research papers of the last few years, but had no other experience.

Would that prepare them to ship useful machine learning models for real applications? Absolutely not.

Those papers have nothing about making machine learning work in the real world, and we know that for every paper, there were 100s or 1000s of experiments that showed negative results.

The same is true for any of the sciences directly responding to a disaster, whether it is about epidemiology, virology, or equipment like face-masks.

Reading the 100 most relevant papers will not let you make a useful contribution.

You will be biased by the particular problems that make it into papers about early research and the bias to only publish positive results.

You will likely get people killed.

  Most organizations that are reaching out to data scientists for help with COVID-19 are not directly helping with the response to COVID-19.

To give a very high-level introduction to the aid industry, here’s a graphic showing how a lot of aid organizations work in disaster response:

High-level overview of how aid organizations are structured.

A small number of large organizations that do aid at the national or international level are known as “Operational Organizations,” but most of them use local “Implementing Partners” for the actual disaster response work.

Some local aid organizations might be wholly independent, or both independent and helping larger organizations.

“Non-Operational Organizations” are the smallest but can erroneously look like they are big and operational.

If someone is asking you to help, then how do you know if they are actually responding? The best organization to help is one operating locally.

Does your local hospital or food distribution center for refugees need help? Start with them.

You can work with big organizations like the CDC and WHO, but this is the worst time to start trying to get the attention of the big organizations as any time spent bringing you up-to-speed is time they are not responding to the outbreak.

In any case, most of these large organizations would be directing you to a local implementing partner.

The non-operational organizations are typically small and use disasters as funding and publicity opportunities.

Look for them talking about “partnerships” with bigger organizations like the WHO, but nowhere saying that they are an “implementing partner.”

This is typically code for “not actually part of the response.”

If they reach out to you, then the chances are that you are the product, and they are telling potential funders “look, we have volunteer data scientists from company X, and we will beat COVID with innovation.”

As a rule of thumb, if it’s not a national organization that you already know about (like the CDC or equivalent in your country), and it’s not one of the first 30 UN Agencies in their list of Funds, Programmes, Specialized Agencies and Others, then look for organizations that you know are operating in your local area.

I’ve never had trouble recruiting people at the start of a disaster, but I’ve always had trouble recruiting people who can help for a meaningful length of time.

If you’re writing code, building models, or writing documentation now, can you ensure that you will be able to support that in 3 months or 6 months?

Keep in mind that you might get ill yourself or have to look after others.

If you or someone you are a caregiver for is more likely to get a worse case of COVID-19, then you should not be putting yourself on the critical path for a response if you are not already an essential worker.

Furthermore, you need to be highly empathetic, but dispassionate, to be an effective disaster responder.

If you are caught up in worry about yourself and your loved ones, then you are probably coming from a place of personal passion and will have trouble acting with objective empathy.

I can’t trust anyone in that situation, and so I always put people like this onto non-critical tasks in disasters.

  There are a number of fake media narratives that appear in every disaster.

The most destructive are ones that target the response organizations for not doing the right thing.

Even popular media outlets do this: they find one small part of the response where one organization has not recently done any work or where there are policies that disagree with other organizations.

No matter how small the problem might be, it is easy for a media organization to present this as “potentially endangering millions of lives” and to get people on both sides of the argument to comment.

Essentially, they invent controversy when none should exist.

The favorite targets for quotes are politicians who are not in power in a country, because those politicians will blame the party in power, and technology mavericks in areas like data science, because that is where confidence often out-paces competence.

Journalists know that these kinds of articles are grossly unethical, and they avoid putting their names to them.

So, look for news articles that don’t have an author or that are by invited authors from data science or opposition political parties.

The worst part of this narrative is messages like “don’t trust the WHO” or “don’t trust the CDC,” etc.

Even if the point being argued is correct, the broader story about distrusting these organizations will do more harm than addressing this one issue.

Most of the world’s governments will have at least some people now trying to implement measures to take away your civil liberties.

Specific to coronaviruses, I spoke about this at KDD last year, sharing how the company I ran during the MERS Coronavirus outbreak decided not to help with social media analysis because of the implications for people’s privacy:

Starting from 1:13 (https://www.



be&t=4380), at KDD in 2019, I talked about how the Saudi Arabian Government used a coronavirus outbreak (MERS-CoV) as the pretext to identify dissidents on social media.

The same will be true for many criminals.

While crime tends to go down during disasters, because people are overwhelmingly good, there are some people who thrive in the chaos to exploit people.

So, related to the very first point in this article, look out for your family.

Elderly people are specifically targeted during disasters in scams to take their money by releasing their personal identities.

Children are often targeted by sexual predators, and so be especially careful about any data source, even if it appears open.

For example, National Geographic published the phone numbers of children in Haiti following the 2010 earthquake there.

In general, there should be no need to make any data public, and you should be careful about even reporting exactly how you are responding to the disaster.

If validation for your contribution is important, then I recommend getting that privately or after the response is over.

If there is nothing that you can do to help right now, then I recommend longer-term actions:

Thank you for helping with the response!

Original.

Reposted with permission.

 Bio: Robert Munro worked in refugee camps for the UN in West Africa before his PhD at Stanford that focused on machine learning in health and disaster response.

He helped respond to the recent Ebola outbreak in West Africa, the MERS-coronavirus outbreak 10 years ago, and was CTO of a global epidemic tracking organization.

Robert also ran AWS’s first NLP service, Amazon Comprehend, and has worked as a leader in many Silicon Valley technology companies.
