The speed with which COVID-19 has taken over the world has raised the demand for data-driven health decisions, and the shift towards virtual may actually enable the necessary data collection. This session discusses how CSIRO has leveraged cloud-native technologies to advance three areas of the COVID-19 response. Firstly, we worked with GISAID, the largest data resource for the virus causing COVID-19, and used standard health terminologies (FHIR) to help collect clinical patient data. This feeds into a Docker-based workflow that creates identifying "fingerprints" of the virus for guiding vaccine development and investigating whether there are more pathogenic versions of the virus. Secondly, we developed a fully serverless web service for tailoring diagnostic efforts, capable of differentiating between strains. Thirdly, we are creating a serverless COVID-19 analysis platform that allows distributed genomics and patient data to be shared and analysed in a privacy- and ownership-preserving manner, functioning as a surveillance system for early detection of more virulent strains.
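To make the FHIR-based clinical data collection concrete, here is a minimal sketch of what a single symptom observation could look like as a FHIR R4 Observation resource. The SNOMED CT code, patient reference, and date below are illustrative assumptions, not taken from the talk; a real deployment would draw codes from a curated value set.

```python
import json

# Sketch of a FHIR R4 Observation recording a COVID-19 symptom.
# The SNOMED CT code (44169009, "Loss of sense of smell") is illustrative;
# the patient reference and date are placeholders, not real data.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "44169009",
            "display": "Loss of sense of smell",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "effectiveDateTime": "2020-04-01",
}

print(json.dumps(observation, indent=2))
```

Because every system emits the same resource shape with codes from a shared terminology, downstream pipelines can aggregate records from many sites without per-site parsing logic.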
Question: Where is the most value a citizen data scientist can add in the fight against COVID-19?
Answer: There are so many tasks to do that anyone who wants to chip in is certainly appreciated. I think the biggest value is in collecting and combining data sources. There are a lot of databases that could be tapped into, but converting from one format to another can be painstakingly slow. So having volunteers combine these would be a huge help.
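The format-conversion chore described above can be as simple as mapping column names onto a shared schema. A hypothetical sketch, where two sources report the same case counts under different headers (the column names and values are made up):

```python
import csv
import io

# Two hypothetical sources reporting the same data under different headers.
SOURCE_A = "date,cases\n2020-04-01,12\n"
SOURCE_B = "report_date,case_count\n2020-04-02,7\n"

# Map each source-specific column name onto the shared schema.
COLUMN_MAP = {"report_date": "date", "case_count": "cases"}

def normalise(raw_csv):
    """Rename columns so rows from any source share one schema."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({COLUMN_MAP.get(k, k): v for k, v in row.items()})
    return rows

combined = normalise(SOURCE_A) + normalise(SOURCE_B)
print(combined)
# → [{'date': '2020-04-01', 'cases': '12'}, {'date': '2020-04-02', 'cases': '7'}]
```

The slow part in practice is usually not the code but agreeing on the mapping, which is exactly where volunteer effort helps.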
Question: Is there any way to train the system to recognise all or most of the different ways to say (for example) "lost sense of smell" - or are there too many / takes too long?
Answer: Yes, this is basically what NLP and ontologies do. There is a huge body of work we can thankfully tap into, but if this data is collected well from the get-go there is less room for error. So the magic lies in designing software that makes recording the data correctly as seamless as just typing in free text.
Question: Are those easy-to-use Folding@home COVID-19 projects helping?
Answer: I am not a protein structure expert, so I don't know the extent to which it is used, or whether the same work could have run on an HPC system in 10 minutes. I know Vodafone came to researchers with the brief of coming up with something to use the mobile CPUs that sit idle overnight, so we came up with something; back then it was not very useful, but this may have changed since. Where it helps, though, is with awareness and the general idea of wanting to support science. If Folding@home evolves into you wanting to write a small Python script to combine two datasets, then it has helped in my books.
Question: It seems interpretable ML models are useful in this context, but a majority of tools are not. When is interpretability necessary for researchers?
Answer: It probably is not necessary for researchers themselves, because people using ML are hopefully trained to a level where they can choose the right algorithm for the job, i.e. decide whether accuracy matters most or whether the goal is to find out something about the data. Typically, we work with clinicians or biologists, and unless we have a mechanistic story (gene A works in this pathway to cause this molecular change), we don't get buy-in. So explainable ML is crucial for establishing trust that the tools we build gain real insights.
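One simple way to move from "the model is accurate" towards a mechanistic story is to rank features by how strongly they track the outcome. A toy sketch with made-up gene names and values, using a crude mean-difference association score rather than any particular explainability library:

```python
from statistics import mean

# Made-up expression values: (gene_a, gene_b, diseased-or-not).
samples = [
    (0.9, 0.1, 1),
    (0.8, 0.4, 1),
    (0.2, 0.5, 0),
    (0.1, 0.3, 0),
]

def association(xs, ys):
    # Crude score: mean(x | y=1) - mean(x | y=0). Large positive values
    # mean the feature is elevated in the diseased group.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return mean(pos) - mean(neg)

labels = [s[2] for s in samples]
scores = {
    "gene_a": association([s[0] for s in samples], labels),
    "gene_b": association([s[1] for s in samples], labels),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # gene_a tracks the label far more strongly than gene_b
```

A ranking like this is what a clinician can argue with ("does it make biological sense that gene_a is elevated?"), which is the buy-in described above.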
Question: Do you see a data trust (independent & fiduciary steward of data) popping up in Australia connecting healthcare providers, insurers, cloud & platform vendors, researchers and citizens ?
Answer: That is technically what MyHealth wants to achieve. It will be a while though; there are more than just technical problems to work through.
Question: Are there mechanisms set up in your team to experiment with more relevant/new ML techniques or cloud tech to apply to genomics? If so, how do you keep up with that, on top of the research work you are doing?
Answer: We are on the boundary of applied/algorithmic ML, meaning that we are super interested in new ML if there is a strong reason to believe it will be better. E.g. we dabbled with deep learning but found that the hyperparameter optimisation space is too large to get good results practically: for our application space, random forests (RF) or logistic regression (LR) outperform it. But I guess once systematic hyperparameter search in the cloud becomes cheaper this might change. So in short: we are always interested in new algorithms if there is a compelling argument for why they should be better. As to keeping up, yes, that is hard, especially when such a large volume of the news/papers about fundamentally new algorithms is just marketing.
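The hyperparameter-space argument above can be made concrete by counting grid points. The grids below are illustrative assumptions, not the ones CSIRO used, but they show how a deep net's search space dwarfs that of LR or RF:

```python
# Illustrative hyperparameter grids (values are assumptions for this sketch).
lr_grid = {"C": [0.01, 0.1, 1, 10]}                        # logistic regression
rf_grid = {"n_trees": [100, 500], "max_depth": [5, 10, None]}  # random forest
dl_grid = {                                                # a small deep net
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "layers": [2, 4, 8],
    "width": [64, 256, 1024],
    "dropout": [0.0, 0.2, 0.5],
    "batch_size": [32, 128],
}

def grid_size(grid):
    # Full grid search cost = product of the option counts per hyperparameter.
    n = 1
    for values in grid.values():
        n *= len(values)
    return n

print(grid_size(lr_grid), grid_size(rf_grid), grid_size(dl_grid))
# → 4 6 162
```

Even this toy deep-learning grid needs ~27x more training runs than RF and LR combined, which is why cheaper cloud-based systematic search could change the trade-off.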