SeeFar Getting Started
Introduction
The Coastal Carbon Dataset, known as SeeFar version 0, is an evolving collection of multi-resolution satellite images from both public and commercial satellites. The dataset is specifically curated for training geospatial foundation models, and it is designed to be unconstrained by satellite type, making it highly versatile for various applications.
Dataset Overview
The SeeFar dataset consists of satellite image rasters of 384x384 pixels with four spectral bands: Blue, Green, Red, and Near-Infrared (NIR). It covers four spatial resolutions: 30 meters (Landsat 8/9), 10 meters (Sentinel-2), 1.5 meters (SPOT 6/7), and 1 meter (Satellogic NewSat IV). The images are provided in Cloud Optimized GeoTIFF (COG) format.
Data Organization
- Data Structure
- File Naming Convention
- ‘train/landsat/0000001.tif’
- ‘test/sentinel/0000101.tif’
- Metadata
- Ground Sample Distance (GSD)
- Coordinate Reference System (CRS)
- Georeferencing details
- Example metadata:
The dataset is split into three top-level directories, “train”, “test”, and “validate”, and each of these is subdivided by satellite source. Each satellite directory contains GeoTIFF files with integrated metadata. The directory structure is as follows:
/seefar_v0
    /train
        /landsat
            - 0000001.tif
            - 0000002.tif
            ...
        /sentinel
            - 0000001.tif
            - 0000002.tif
            ...
        /newsat
            - 0000001.tif
            - 0000002.tif
            ...
        /spot
            - 0000001.tif
            - 0000002.tif
            ...
    /test
        /landsat
            - 0000001.tif
            - 0000002.tif
            ...
        /sentinel
            - 0000001.tif
            - 0000002.tif
            ...
        /newsat
            - 0000001.tif
            - 0000002.tif
            ...
        /spot
            - 0000001.tif
            - 0000002.tif
            ...
    /validate
        /landsat
            - 0000001.tif
            - 0000002.tif
            ...
        /sentinel
            - 0000001.tif
            - 0000002.tif
            ...
        /newsat
            - 0000001.tif
            - 0000002.tif
            ...
        /spot
            - 0000001.tif
            - 0000002.tif
            ...
Each GeoTIFF file is named with a zero-padded numeric identifier that is unique within its satellite subset folder. For example:
- ‘train/landsat/0000001.tif’
- ‘test/sentinel/0000101.tif’
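The naming convention can be sketched as a small path builder. This is a minimal illustration, not part of the SeeFar tooling: the helper name `tile_path` is hypothetical, the folder names are taken from the directory listing above, and the seven-digit zero padding is inferred from example names such as 0000001.tif.

```python
from pathlib import PurePosixPath

# Split and satellite folder names as they appear in the directory listing.
SPLITS = ("train", "test", "validate")
SATELLITES = ("landsat", "sentinel", "newsat", "spot")


def tile_path(split: str, satellite: str, index: int,
              root: str = "seefar_v0") -> PurePosixPath:
    """Build the relative path of one tile (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if satellite not in SATELLITES:
        raise ValueError(f"unknown satellite folder: {satellite!r}")
    # Assumption: identifiers are zero-padded to seven digits,
    # as in the example file names.
    return PurePosixPath(root) / split / satellite / f"{index:07d}.tif"


print(tile_path("train", "landsat", 1))  # seefar_v0/train/landsat/0000001.tif
```

Validating the split and satellite names up front turns a typo into an immediate error rather than a missing-file failure later on.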
Each raster file carries integrated metadata: its geospatial location, its projection, and custom tags unique to SeeFar. The metadata includes:
- Ground Sample Distance (GSD)
- Coordinate Reference System (CRS)
- Georeferencing details
Example metadata:
JSON
Metadata:
{
    'driver': 'GTiff',
    'dtype': 'uint16',
    'nodata': 0.0,
    'width': 384,
    'height': 384,
    'count': 4,
    'crs': CRS.from_epsg(32640),
    'transform': Affine(1.0, 0.0, 158660.0,
                        0.0, 1.0, 2533796.0)
}
Tags:
{
    AREA_OR_POINT: Area
    BLUE_END: 510
    BLUE_START: 450
    GREEN_END: 580
    GREEN_START: 510
    GSD: 1.0
    NIR_END: 900
    NIR_START: 750
    RED_END: 690
    RED_START: 590
}
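The SeeFar tags above can be turned into per-band wavelength ranges with a few lines of Python. The sketch below is illustrative only: the helper `band_ranges` is hypothetical, tag values are treated as strings (which is how rasterio returns them), and the wavelength unit is assumed to be nanometers, which the example values suggest but the metadata does not state.

```python
# Band prefixes used by the *_START / *_END tags in the example metadata.
BANDS = ("BLUE", "GREEN", "RED", "NIR")


def band_ranges(tags):
    """Return {band: (start, end)} parsed from SeeFar-style tags.

    Hypothetical helper; values are assumed to be wavelengths in nanometers.
    """
    ranges = {}
    for band in BANDS:
        start = float(tags[f"{band}_START"])
        end = float(tags[f"{band}_END"])
        ranges[band.lower()] = (start, end)
    return ranges


# Tag values copied from the example metadata, as strings.
example_tags = {
    "AREA_OR_POINT": "Area",
    "BLUE_START": "450", "BLUE_END": "510",
    "GREEN_START": "510", "GREEN_END": "580",
    "RED_START": "590", "RED_END": "690",
    "NIR_START": "750", "NIR_END": "900",
    "GSD": "1.0",
}

ranges = band_ranges(example_tags)
gsd = float(example_tags["GSD"])  # ground sample distance in meters

for band, (start, end) in ranges.items():
    print(f"{band}: {start:.0f}-{end:.0f}")
print("GSD:", gsd)
```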
Finding Data
- Search by Satellite and Resolution
- Metadata-Based Search
Because each satellite corresponds to one spatial resolution, users can find images at a desired resolution by navigating to the matching satellite directory. The naming convention and directory organization make it straightforward to locate the required files.
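A resolution-based lookup could be sketched as follows. The mapping from resolution to folder name is assembled from the Dataset Overview and the directory listing; the names `GSD_TO_FOLDER` and `list_tiles` are hypothetical and not part of any SeeFar tooling.

```python
from pathlib import Path

# Resolution in meters -> satellite folder, per the Dataset Overview.
GSD_TO_FOLDER = {
    30.0: "landsat",
    10.0: "sentinel",
    1.5: "spot",
    1.0: "newsat",
}


def list_tiles(root, split, gsd_m):
    """List the .tif files for one split at one resolution (hypothetical helper)."""
    try:
        folder = GSD_TO_FOLDER[float(gsd_m)]
    except KeyError:
        raise ValueError(f"no satellite folder for a GSD of {gsd_m} m") from None
    # Sorting gives a stable order that follows the numeric file names.
    return sorted((Path(root) / split / folder).glob("*.tif"))
```

For example, `list_tiles("seefar_v0", "train", 10)` would return the Sentinel tiles in the training split, assuming the dataset has been downloaded to a local `seefar_v0` directory.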
Metadata-based search is limited because each GeoTIFF must be opened before its metadata can be read. This can be done with the following Python code, which uses the rasterio package:
Python
import rasterio


def investigate_geotiff(file_path):
    # Open the GeoTIFF file
    with rasterio.open(file_path) as dataset:
        # Print all dataset-level metadata
        print("Metadata:", dataset.meta)

        # Dataset-level tags, including the SeeFar-specific ones
        print("\nTags:")
        for key, value in dataset.tags().items():
            print(f"{key}: {value}")

        # Per-band tags (band indexes start at 1)
        print("\nBand Metadata:")
        for i in range(1, dataset.count + 1):
            print(f"Band {i}:", dataset.tags(i))

        # Other useful metadata
        print("\nCRS:", dataset.crs)
        print("Bounds:", dataset.bounds)
        print("Transform:", dataset.transform)
        print("Width:", dataset.width)
        print("Height:", dataset.height)
        print("Count:", dataset.count)
        print("Dtype:", dataset.dtypes)
        print("Driver:", dataset.driver)


file_example = 'test/landsat/0000001.tif'
investigate_geotiff(file_example)
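Since every file has to be opened to read its tags, one way to make repeated metadata searches cheaper is to scan the files once and keep the tags in memory. The sketch below is an assumption about how this could be done, not a documented SeeFar workflow: `read_tags`, `build_tag_index`, and `filter_by_gsd` are hypothetical names, and the reader is passed in as a parameter so that it can be swapped out.

```python
from pathlib import Path


def read_tags(path):
    """Read dataset-level tags from one GeoTIFF using rasterio."""
    import rasterio  # third-party; imported lazily, only this reader needs it

    with rasterio.open(str(path)) as dataset:
        return dataset.tags()


def build_tag_index(root, reader=read_tags):
    """Map every .tif path under root to its tag dictionary."""
    return {path: reader(path) for path in sorted(Path(root).rglob("*.tif"))}


def filter_by_gsd(index, gsd_m, tol=1e-6):
    """Return the paths whose GSD tag matches gsd_m (in meters)."""
    hits = []
    for path, tags in index.items():
        # Tag values are strings, so convert before comparing.
        if "GSD" in tags and abs(float(tags["GSD"]) - gsd_m) <= tol:
            hits.append(path)
    return hits
```

With the dataset available locally, `filter_by_gsd(build_tag_index("seefar_v0/test"), 1.0)` would return the test tiles tagged with a 1 meter GSD. Building the index still opens every file once, so it pays off only when several queries are run against the same files.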
Accessing the Data
The dataset is hosted on AWS Open Data, a publicly accessible platform. Detailed instructions and scripts for accessing and downloading the data are provided there.