Data science to help move natural science out of the 'Stone Age'

Rebecca Merrett (CIO) | Aug. 12, 2015
Although some natural scientists do use statistical analysis when doing research, there's much more opportunity to apply new and advanced techniques for making discoveries.

This is what Professor Hugh Durrant-Whyte from the University of Sydney discussed at the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Sydney on Wednesday.

Durrant-Whyte, also former CEO of NICTA, said: "The science community is in the Stone Age. It would be progress if they moved on to using a database.

"There is an enormous amount that this [data science] community could do to simply help scientists access and visualise and use data.

"I've decided to make the rest of my career about using data science to progress science in general; to try and change the way they do discovery."

One challenge that natural science faces is the scarcity of data, as it's not always easy to physically collect samples or measurements from deep within the Earth, the oceans or other almost out-of-reach places.

"Data is very expensive. In some geological cases, if you want to get a data point it can cost you $20 million for one point. You really have to work to get the data, so you have to be careful about how you use it.

"It's about small data and big models. It's not about big data, it's not about petabyte science, it's not about genomics and all these things you hear about, which is really just number crunching. It's about how you use relatively sparse data to build complex models," he said.

"Models also turn out to be very expensive to evaluate, as these are quite complex systems. If you are trying to simulate the Earth, taking one sample is actually quite an expensive thing to do. So you want to be a lot more data driven about the way that you use these models," he added.

Durrant-Whyte gave an example of a project he is working on where data is scarce: predicting the impact fracking would have on water contamination in New South Wales.

"I can tell you that by the time you build a model the size of NSW and you take 2000 sample points, which are water bores, you do not have a lot of data in which to base your model."

As someone with a data science and machine learning background, he said there's an opportunity for him to find ways to build reliable models, quantify uncertainty and help natural scientists move forward in their work.

Another challenge in natural science is the shortfall of using non-linear differential equations to predict the outcome of an experiment and then calibrating on those equations.

 
