(Geo)spatial data sets

In which I complain about paying a nominal fee for giant rocket robots that scan the earth from space

March 1, 2021 — December 29, 2024

computers are awful
data sets
spatial
statistics

Satellite images, geological tomography, climate data records, and miscellaneous useful data points about our globe.

1 Maps and satellite photos

Here is a review of satellite image sources. I have only checked out a handful of these. If you just want eye candy, NASA Visible Earth is a good one. I’m fond of LANDSAT maps, various of which can be found through Earth Explorer. All these resources blur into one after a while, with similarly confusing interfaces, unexpected UI glitches, and apparently random pricing structures that are only revealed belatedly.

openEO


Earth Observation data are becoming too large to be downloaded locally for analysis. Also, the way they are organised (as tiles, or granules: files containing the imagery for a small part of the Earth and a single observation date) makes it unnecessarily complicated to analyse them. The solution to this is to store these data in the cloud, on compute back-ends, process them there, and browse the results or download resulting figures or numbers. But how do we do that?

openEO develops an open application programming interface (API) that connects clients like R, Python and JavaScript to big Earth observation cloud back-ends in a simple and unified way.
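To make the "process it on the back-end" idea concrete, here is a rough sketch of what a client ends up sending: a process graph, which is a JSON document naming the steps to run server-side. The process names (`load_collection`, `reduce_dimension`, `mean`, `save_result`) are standard openEO processes as far as I know. The collection id, band name, dimension name `t` and output format are assumptions that vary between back-ends, and `build_process_graph` is a hypothetical helper of mine, not part of any openEO client.

```python
import json


def build_process_graph(collection_id, west, south, east, north, start, end, bands):
    """Return an openEO-style process graph as a plain dict (illustrative only)."""
    return {
        # Step 1: select a spatio-temporal slice of a collection on the back-end
        "load": {
            "process_id": "load_collection",
            "arguments": {
                "id": collection_id,
                "spatial_extent": {
                    "west": west, "south": south, "east": east, "north": north,
                },
                "temporal_extent": [start, end],
                "bands": bands,
            },
        },
        # Step 2: collapse the time dimension by averaging
        # ASSUMPTION: the temporal dimension is called "t" on this back-end
        "mean_over_time": {
            "process_id": "reduce_dimension",
            "arguments": {
                "data": {"from_node": "load"},
                "dimension": "t",
                "reducer": {
                    "process_graph": {
                        "mean": {
                            "process_id": "mean",
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
        },
        # Step 3: mark the output; only this small result gets downloaded
        # ASSUMPTION: the back-end supports the "GTiff" output format
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "mean_over_time"}, "format": "GTiff"},
            "result": True,
        },
    }


# Hypothetical collection id, band and bounding box, for illustration only
graph = build_process_graph(
    "SENTINEL2_L2A", 16.1, 48.1, 16.6, 48.4, "2021-06-01", "2021-07-01", ["B04"]
)
print(json.dumps(graph, indent=2))
```

The point is that nothing here touches a tile or a granule. The client only describes the computation, and the back-end decides which files to read.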

Google Earth Engine (earthengine.google.com) provides lots of imagery with an eye to discoverability and UX.

The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.

See also Australia-specific stuff.

1.1 Eye candy

The special subcategory of geospatial data that looks pretty.

Masterclasses in turning geographic datasets into eye candy:

2 Weather/climate

CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations

Pangeo is an umbrella organisation providing many geospatial data tools, including a catalogue of hydrological, oceanographic and similar datasets.

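from intake import open_catalog

# Open the Pangeo master catalogue (needs the intake package and network access)
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml")
# List the names of the top-level catalogue entries
list(cat)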

Open Data Cube is a whole Python library for working with satellite images and other large-scale raster data.

The Extreme Weather Dataset (Racah et al. 2017) includes, for each year, an array of shape (1460, 16, 768, 1152), containing

  • 1460 example images (4 per day, 365 days in the year)
  • 16 channels in each image corresponding to various weather-related quantities
  • each channel is a 768 x 1152 grid, corresponding to roughly one measurement per 25 km x 25 km cell on Earth
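A quick back-of-envelope check on those dimensions helps to get a feel for the size of the thing. This is only a sketch: I am assuming 4-byte (float32) values and a grid that covers the whole globe evenly, neither of which is stated in the description above.

```python
import math

# Array shape quoted above: (images, channels, grid rows, grid columns)
shape = (1460, 16, 768, 1152)
n_images, n_channels, n_rows, n_cols = shape

# 4 snapshots per day over a 365-day year
assert n_images == 4 * 365

# Total number of values in one year's array
n_values = math.prod(shape)

# ASSUMPTION: 4 bytes per value (float32); the storage dtype is not given
BYTES_PER_VALUE = 4
size_gb = n_values * BYTES_PER_VALUE / 1e9

# ASSUMPTION: the grid covers the whole Earth, treated as a sphere of radius 6371 km
EARTH_RADIUS_KM = 6371.0
earth_area_km2 = 4 * math.pi * EARTH_RADIUS_KM**2
cell_area_km2 = earth_area_km2 / (n_rows * n_cols)
cell_side_km = math.sqrt(cell_area_km2)

print(f"{n_values:,} values per year, about {size_gb:.0f} GB at 4 bytes each")
print(f"mean cell area {cell_area_km2:.0f} km^2, about {cell_side_km:.0f} km on a side")
```

On those assumptions each cell covers several hundred square kilometres, roughly 24 km on a side, and one year of data runs to tens of gigabytes, so you probably do not want to load it all into memory at once.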

3 Biota

Especially remote sensing of biodiversity (Guo et al. 2023; Harwood et al. 2021; Mokany et al. 2022; Williams et al. 2021).

4 Data assimilation

5 Incoming

EOD

“This webpage provides an interactive and searchable catalogue of public benchmark datasets for earth observation with the aim to support researchers in the fields of geoscience, remote sensing, and ML.”

Foursquare Open Source Places: A new foundational dataset for the geospatial community | Foursquare

Unfortunately, in geospatial, location, and mapping software the data layer remains largely the provenance of large scale proprietary systems. The walled garden nature of the data layer greatly hampers the industry’s ability to go from strict specialization to generalized adoption, and it is in the general adoption layer that the real value to customers exists.

In an effort to change that dynamic, we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places (“FSQ OS Places”). This base layer of 100mm+ global places of interest (“POI”) includes 22 core attributes (see schema here) that will be updated monthly and available for commercial use under the Apache 2.0 license framework.

6 References

Camps-Valls, Campos-Taberner, Moreno-Martínez, et al. 2021. “A Unified Vegetation Index for Quantifying the Terrestrial Biosphere.” Science Advances.
Guo, Mokany, Ong, et al. 2023. “Plant Species Richness Prediction from DESIS Hyperspectral Data: A Comparison Study on Feature Extraction Procedures and Regression Models.” ISPRS Journal of Photogrammetry and Remote Sensing.
Harwood, Williams, Lehmann, et al. 2021. “9 Arcsecond Gridded HCAS 2.1 (2001-2018) Base Model Estimation of Habitat Condition for Terrestrial Biodiversity, 18-Year Trend and 2010-2015 Epoch Change for Continental Australia.”
Mokany, McCarthy, Falster, et al. 2022. “Plant Diversity Spatial Layers for Australia.”
Pomerleau. 1989. “ALVINN: An Autonomous Land Vehicle in a Neural Network.” In Advances in Neural Information Processing Systems.
Racah, Beckham, Maharaj, et al. 2017. “ExtremeWeather: A Large-Scale Climate Dataset for Semi-Supervised Detection, Localization, and Understanding of Extreme Weather Events.” In Advances in Neural Information Processing Systems.
Roberts, Wilford, and Ghattas. 2019. “Exposed Soil and Mineral Map of the Australian Continent Revealing the Land at Its Barest.” Nature Communications.
Williams, Harwood, Lehmann, et al. 2021. “Habitat Condition Assessment System (HCAS Version 2.1) Enhanced Method for Mapping Habitat Condition and Change Across Australia.”