FDL Europe 2021 - World Food Embeddings using Earth Observation
Can AI and satellite images be utilised to track the world’s food supply?
Published 12 SEP 2022
Frontier Development Lab (FDL) is a public-private partnership with ESA in Europe and NASA in the USA. FDL works with commercial partners to apply AI technologies to space science, pushing the frontiers of research and developing new tools to help solve some of the biggest challenges that humanity faces. These range from the effects of climate change to predicting space weather, and from improving disaster response to identifying meteorites that could hold the key to the history of our universe.
FDL Europe 2021 was a research sprint hosted by the University of Oxford that took place over eight weeks to promote rapid learning and research outcomes in a collaborative atmosphere. This case study was enabled by FDL alongside Phi-Lab (ESRIN), ESA Mission Operations (ESOC), Trillium Technologies and the University of Oxford; the project was also supported by Google Cloud, D-Orbit and Planet.
Self-supervised learning of Sentinel-2 time-series for agricultural applications
Project Background
Feeding a growing global population in a changing climate presents a significant challenge to society. Projected crop yields in a range of agricultural and climatic scenarios depend on accurate crop maps to gauge food security prospects. Accurate information on the agricultural landscape obtained from remote sensing can help manage and improve agriculture, and serve as the basis for yield forecasts. Large amounts of imagery and data are produced daily by an ever-growing multitude of Earth observation satellites; however, generating crop classification maps can be very challenging, especially in developing countries where information on the extent and types of crops is scarce.
Developed regions such as Europe and North America have collected and labelled crop data for decades, but these processes require costly surveying campaigns, and such labelled archives are not generally produced in less developed regions. Because labelling in developing regions is often extremely sparse, it is difficult to train supervised machine learning models to classify crops in satellite observations of these regions. Creating additional labelled data is also challenging: it is a highly time-consuming and error-prone task that can only be done by domain experts.
Project Data
Deep learning methods have recently advanced greatly in unsupervised settings, where no labels are available, and the FDL project team aimed to use them to exploit specific properties of agriculture and food applications, such as invariances across crops, seasons and regions. The team developed scalable data pipelines to produce semantically rich embeddings (lower-dimensional representations) of crop satellite imagery. Thanks to these embeddings, models for crop detection could potentially be trained faster, better exploit the few available labels, reveal the underlying time-evolution structure of crops, and address many other downstream applications yet to be discovered.
The imagery required for these models is derived from the Sentinel-2 satellites. Sentinel-2 consists of two satellites in sun-synchronous orbits, so that they always image in sunlight. The data is delivered as tiles of 10,980 x 10,980 pixels in 12 optical bands, with each pixel representing a 10m x 10m square of land. Some tiles overlap, and adjacent locations may be covered by tiles on the same orbit or on different orbits (and thus at different times), but roughly speaking each location is revisited every 5-10 days.
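As a concrete illustration of the scale involved, the sketch below shows how a single Sentinel-2 tile could be split into smaller chips for model training. Only the tile dimensions and band count come from the figures above; the chip size is a hypothetical choice, and a small random array stands in for a real (roughly 5 GB) tile.

```python
# A minimal sketch of chipping a Sentinel-2 tile into model-ready patches.
# Tile size and band count are the real figures; the chip size is hypothetical.
import numpy as np

TILE_SIZE = 10_980   # pixels per side of a Sentinel-2 tile (10 m x 10 m per pixel)
NUM_BANDS = 12       # optical bands
CHIP_SIZE = 366      # hypothetical chip size (10,980 = 30 * 366, so chips tile evenly)

def chip_tile(tile: np.ndarray, chip_size: int = CHIP_SIZE):
    """Split a (bands, H, W) tile into non-overlapping (bands, chip, chip) patches."""
    bands, height, width = tile.shape
    for row in range(0, height - chip_size + 1, chip_size):
        for col in range(0, width - chip_size + 1, chip_size):
            yield tile[:, row:row + chip_size, col:col + chip_size]

# A real tile is (12, 10_980, 10_980); use a small random stand-in for the demo.
tile = np.random.rand(NUM_BANDS, 2 * CHIP_SIZE, 2 * CHIP_SIZE).astype(np.float32)
first = next(chip_tile(tile))
print(first.shape)  # (12, 366, 366)
```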
Project Approach
In the overall scheme for building predictive models in this setting, a self-supervised model is targeted at a certain pretext task, possibly trained with large amounts of unlabelled data. The pretext training precedes supervised training for a downstream task. The pretext task is intended to contribute to the solution of the downstream task by providing the input data with a higher semantic information content than its original raw form. One must be able to define the pretext task using only the existing unlabelled data, trying to capture some invariance that can later be exploited in solving the downstream task.
The supervised setting for the downstream task may exploit the results of the self-supervised setting in two different ways:
- As pre-training: by sharing part of the model architecture and transferring weights into the supervised setting, then fine-tuning on the reduced labelled dataset (see the sketch after this list)
- As embeddings: by producing a more semantically concise representation of an input image, typically in the form of a vector of a certain length. This representation is intended to capture whatever invariance the pretext task set forth. For instance, in a setting where we have satellite image data as a time series (revisits) of images for each location in several bands, we could have pretext tasks aimed at producing similar embeddings for different revisits of the same location (time invariance), or for specific months where crops are known to cycle (seasonal invariance).
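To make the pre-training route concrete, here is a minimal PyTorch sketch of transferring a pretext-trained encoder into a supervised model and fine-tuning it on a small labelled set. The encoder architecture, layer sizes and checkpoint name are hypothetical stand-ins, not the project's actual model.

```python
# A minimal sketch of pre-training reuse: a pretext-trained encoder is shared
# with the supervised model and fine-tuned. All architecture details are
# hypothetical stand-ins.
import torch
import torch.nn as nn

encoder = nn.Sequential(                          # shared backbone
    nn.Conv2d(12, 32, kernel_size=3, padding=1),  # 12 Sentinel-2 bands in
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)                                                 # -> 32-dim patch embedding

# encoder.load_state_dict(torch.load("pretext_encoder.pt"))  # hypothetical checkpoint

head = nn.Linear(32, 2)                           # new crop / no-crop classifier head
model = nn.Sequential(encoder, head)

# Fine-tune: small learning rate on the transferred encoder, larger on the new head.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

x = torch.randn(8, 12, 64, 64)                    # a batch of 12-band chips
logits = model(x)                                 # (8, 2)
```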
In the case of the embeddings, these were designed to capture certain invariances or representational semantics at different levels with respect to space and time. From the spatial perspective they included:
- Pixel level embeddings: each pixel in the image gets a vector embedding.
- Patch / chip level embeddings: the full patch gets a single vector embedding, summarising the content of all its pixels.
- Entity level embeddings: a patch might contain different entities (such as different crop fields scattered throughout the patch), and each entity (field) gets its own vector embedding. Different patches might contain different numbers of entities and thus different numbers of vector embeddings.
From the time perspective they included:
- Embeddings per revisit: each revisit gets its own embedding, whether it is at the pixel or patch level. Therefore, a time-series of revisits becomes a time-series of embeddings.
- Time-series embeddings: the full time-series gets a vector embedding.
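One way to make these granularities concrete is to write down the tensor shapes each combination implies. The sketch below does so for a hypothetical time series of T revisits of an H x W patch with embedding dimension D; all values are illustrative, not taken from the project.

```python
# Tensor shapes implied by the embedding granularities above, for a time series
# of T revisits of an H x W patch with embedding dimension D. Illustrative only.
T, H, W, D = 20, 64, 64, 32

# Per-revisit, pixel level: every pixel of every revisit gets a D-vector.
pixel_per_revisit = (T, D, H, W)

# Per-revisit, patch level: one D-vector per revisit -> a time series of embeddings.
patch_per_revisit = (T, D)

# Time-series, pixel level: the whole series collapses to one D-vector per pixel.
pixel_time_series = (D, H, W)

# Time-series, patch level: the whole series collapses to a single D-vector.
patch_time_series = (D,)
```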
Depending on the choices above, different deep learning architectures are more suitable, or even feasible, for models on the pretext and downstream tasks. Segmentation architectures are better suited to producing pixel level embeddings, which might later be aggregated over entities (crop fields) or patches for a field classification downstream task, or used directly for a crop segmentation downstream task. Convolutional architectures (which reduce image dimensions) are better suited to producing entity or patch level embeddings, and classical machine learning methods (including multilayer perceptrons) can operate directly on entity or patch level embeddings to produce classifications or similarity measures.
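As a minimal illustration of that last point, the sketch below fits an off-the-shelf scikit-learn classifier directly on patch-level embeddings for a crop/no-crop decision; the embeddings and labels are random stand-ins for real data.

```python
# A minimal sketch of a classical model operating directly on patch-level
# embeddings for crop / no-crop classification. Data are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))   # 500 patches, 32-dim embeddings each
labels = rng.integers(0, 2, size=500)     # 1 = crop, 0 = no-crop

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
print(clf.score(embeddings, labels))      # training accuracy on the stand-in data
```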
Project Results
Using Momentum Contrast (MoCo) architectures, the FDL project team generated unsupervised embeddings for later use in a crop/no-crop classification downstream task. MoCo generated pixel level embeddings both per revisit (with 2D convolutions) and for a full time-series (with 3D convolutions), using different backbone architectures for the encoder.
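While the project's exact training setup is not detailed here, the contrastive objective at the heart of MoCo is the InfoNCE loss, sketched below in PyTorch: embeddings of two views of the same sample should agree while differing from a queue of negatives. The momentum update of the key encoder and the queue management are omitted, and all dimensions are illustrative.

```python
# A minimal sketch of the InfoNCE loss used by MoCo-style contrastive learning.
# Queue handling and the momentum-updated key encoder are deliberately omitted.
import torch
import torch.nn.functional as F

def info_nce(query, key, queue, temperature=0.07):
    """query, key: (N, D) embeddings of two views; queue: (K, D) negatives."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    queue = F.normalize(queue, dim=1)
    pos = (query * key).sum(dim=1, keepdim=True)         # (N, 1) positive logits
    neg = query @ queue.t()                              # (N, K) negative logits
    logits = torch.cat([pos, neg], dim=1) / temperature
    targets = torch.zeros(len(query), dtype=torch.long)  # positive is always index 0
    return F.cross_entropy(logits, targets)

# Illustrative dimensions: batch of 16 queries/keys, 256 queued negatives, D = 32.
loss = info_nce(torch.randn(16, 32), torch.randn(16, 32), torch.randn(256, 32))
```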
For the crop/no-crop downstream segmentation task carried out in Southern England, the team showed that models trained on the embeddings converge faster than models trained on the raw imagery. In visualisations of the embeddings, the underlying structures emerge, even where they are not obvious in the mosaic of contiguous chips. Additionally, due to the nature of the contrastive task, similar colours signal similar embeddings, capturing spatial locations with similar time evolutions.
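As a rough illustration of why training on embeddings can be faster, a minimal sketch of a downstream segmentation head follows: once pixel-level embeddings are available, even a single 1x1 convolution can map each embedding to crop/no-crop logits. The embedding dimension and head are assumptions for illustration, not the team's actual architecture.

```python
# A sketch of a tiny downstream segmentation head over pre-computed pixel-level
# embeddings. Dimensions and the head itself are illustrative assumptions.
import torch
import torch.nn as nn

D = 32                                          # assumed embedding dimension
head = nn.Conv2d(D, 2, kernel_size=1)           # per-pixel crop / no-crop logits

pixel_embeddings = torch.randn(4, D, 64, 64)    # (batch, D, H, W) from the encoder
logits = head(pixel_embeddings)                 # (4, 2, 64, 64)
```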
Conclusion
The FDL team carried out a wealth of experiments with different datasets and architectures, with different degrees of success. In general, the results suggest that designing pretext and downstream tasks must be done jointly and carefully, possibly with domain expert advice. It leaves the door open for further experiments and studies to follow.
You can learn more about this case study by reading the FDL 2021 Research Booklet, where a full Technical Memorandum can also be downloaded.
The Scan Partnership
Scan and NVIDIA are keen supporters of the Frontier Development Lab, and for the FDL Europe 2021 event provided access to multiple NVIDIA DGX AI systems to facilitate much of the machine learning and deep learning development and training required. The project team used Google Cloud Platform (GCP) instances to prototype their models, before the trained models were deployed on an NVIDIA DGX platform to accelerate time to results.
‘This is the second year running we’ve worked with NVIDIA to support the FDL event and we are already signed up to be part of the infrastructure for FDL 2022. It is a huge privilege to be associated with such ground-breaking research.’
Marnie Sutton, Director of Enterprise GPU - Scan