FDL Europe 2020 - Digital Twin Earth

Machine learning to make rain forecasting more accurate

Published 7 JUL 2022

 

Frontier Development Lab

Frontier Development Lab (FDL) is a public-private partnership with ESA in Europe and NASA in the USA. FDL works with commercial partners to apply AI technologies to space science, pushing the frontiers of research and developing new tools to help solve some of the biggest challenges that humanity faces. These range from the effects of climate change to predicting space weather, and from improving disaster response to identifying meteorites that could hold the key to the history of our universe.

FDL Europe 2020 was an eight-week research sprint hosted by the University of Oxford, designed to promote rapid learning and research outcomes in a collaborative atmosphere.

Digital Twin Earth

Can we lower the cost of accurate global precipitation forecasts?

The Digital Twin Earth (DTE) project set out to discover whether machine learning can learn to forecast precipitation by fusing simulated satellite weather data with physical model data, offering a low-cost alternative to expensive simulation infrastructure.

Weather forecasting systems haven’t fundamentally changed for over 50 years. They model the fluid-dynamical flow of the atmosphere using as much physics-based and observational data - such as temperature, velocity and pressure - as they can computationally afford. Given how much of this data exists today, there is an opportunity to produce neural network models that learn the attributes of physics from the numerical data and predict the weather with less compute.

Traditionally, numerical weather prediction offers the best accuracy in the short term, but a greater understanding of atmospheric physics is required to accurately predict precipitation more than a couple of days ahead.

The Digital Twin Earth project introduced RainBench - a combined dataset derived from three publicly-available sources:

• European Centre for Medium-Range Weather Forecasts (ECMWF) simulated satellite data (SimSat) - SimSat data are model-simulated satellite data generated from ECMWF’s high-resolution weather-forecasting model.
• ECMWF Re-Analysis, 5th Edition (ERA5) - ERA5 reanalysis data provide hourly estimates of a variety of atmospheric, land and oceanic variables, such as specific humidity, temperature and geopotential height at different pressure levels.
• Integrated Multi-satellitE Retrievals for GPM (IMERG) global precipitation estimates - this is a global half-hourly precipitation estimation product provided by NASA, primarily using satellite data from multiple polar-orbiting and geostationary satellites.
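The sources listed above arrive at different time steps: ERA5 is hourly, while IMERG is half-hourly. Combining them therefore needs some form of temporal alignment. The sketch below is one simple way to do that, assuming the half-hourly values are precipitation rates that can be averaged in pairs. The function name and the sample numbers are hypothetical, and this is not taken from RainBench or PyRain.

```python
from statistics import fmean


def half_hourly_to_hourly(rates_mm_per_hr):
    """Average consecutive pairs of half-hourly rates into hourly mean rates.

    Assumes the input is a time-ordered list of precipitation rates for a
    single grid cell, with exactly two samples per hour.
    """
    if len(rates_mm_per_hr) % 2 != 0:
        raise ValueError("expected an even number of half-hourly samples")
    return [
        fmean(rates_mm_per_hr[i:i + 2])
        for i in range(0, len(rates_mm_per_hr), 2)
    ]


# Four made-up half-hourly samples (two hours) for one grid cell.
hourly = half_hourly_to_hourly([0.0, 2.0, 4.0, 4.0])
print(hourly)  # [1.0, 4.0]
```

Averaging keeps the result as a rate. If accumulated rainfall per hour were needed instead, each half-hourly rate would be multiplied by 0.5 hours and the pair summed, which gives the same numbers in millimetres.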

The RainBench dataset was then analysed using a three-step approach - State Estimation, State Forecasting and Precipitation Estimation - that processes the various data points with a view to presenting a five-day global precipitation forecast.
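To show how the three steps chain together, here is a minimal sketch of the data flow only. Every function body is a toy placeholder (a fixed scaling, a persistence forecast, a threshold) chosen so the example runs. In the project these roles are played by learned models, and none of the names or constants below come from the DTE code.

```python
from typing import List

Field = List[float]  # one value per grid cell, flattened


def estimate_state(simsat: Field) -> Field:
    """Step 1, State Estimation: map satellite values to a state variable.

    Placeholder: a fixed scaling stands in for a learned model.
    """
    return [0.5 * x for x in simsat]


def forecast_state(state: Field, lead_days: int) -> Field:
    """Step 2, State Forecasting: advance the state by lead_days.

    Placeholder: persistence, i.e. the future state equals the current one.
    """
    return list(state)


def estimate_precipitation(state: Field) -> Field:
    """Step 3, Precipitation Estimation: map the state to precipitation.

    Placeholder: precipitation only where the state exceeds a threshold.
    """
    threshold = 2.0
    return [max(0.0, s - threshold) for s in state]


def forecast_precipitation(simsat: Field, lead_days: int = 5) -> Field:
    """Chain the three steps into a single forecast call."""
    state_now = estimate_state(simsat)
    state_future = forecast_state(state_now, lead_days)
    return estimate_precipitation(state_future)


# Two made-up grid cells.
print(forecast_precipitation([2.0, 6.0]))  # [0.0, 1.0]
```

Keeping each step behind its own function means any one placeholder can be swapped for a trained model without touching the other two.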

As a result of the RainBench dataset study, the Digital Twin Earth project team was also able to introduce PyRain - a library to process the three datasets efficiently, reducing time and hardware costs and thus lowering the barrier to entry into this field. The progress of the project to date is summarised in the table below:

RainBench dataset study
Five-day forecast | Before DTE | Now with DTE
State Estimation (from SimSat data) | No SimSat estimation | Estimate specific humidity from SimSat data
State Forecasting | Existing WeatherBench forecasts | Improved 3-day temperature forecasts and 3-day wet variable forecasts
Precipitation Estimation | ERA-model precipitation estimation | Neural network for precipitation estimation

You can learn more about this case study by reading the complete Technical Memorandum.

The Scan Partnership

NVIDIA is a key supporter of the Frontier Development Lab and the FDL Europe 2020 event, and Scan was asked to act as NVIDIA's technology partner, providing access to multiple DGX-1 systems to support much of the machine learning and deep learning development and training required. The DTE team used the NVIDIA DGX-1 machines to run six virtual machines connected to 40TB of centralised SSD storage.

‘Given the amount of data we were dealing with, these resources were fundamental to speed-up our extremely time and memory consuming tasks.’ DTE Project Team.
