Coastal Carbon
SeeFar Getting Started
Introduction
The Coastal Carbon Dataset, known as SeeFar version 0, is an evolving collection of multi-resolution satellite images from both public and commercial satellites. It is curated for training geospatial foundation models and is designed to be independent of satellite type, so it can support a range of applications.
Dataset Overview
The SeeFar dataset consists of satellite image rasters of 384x384 pixels with four spectral bands: Blue, Green, Red, and Near-Infrared (NIR). It covers multiple spatial resolutions: 30 meters (Landsat 8/9), 10 meters (Sentinel-2), 1.5 meters (SPOT 6/7), and 1 meter (Satellogic NewSat IV). The images are provided in Cloud Optimized GeoTIFF (COG) format.
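Because every raster has the same pixel dimensions, the ground area covered by a tile depends only on the resolution of its source. A minimal sketch of that arithmetic, assuming square pixels and the nominal resolutions listed above; `NOMINAL_GSD_M` and `tile_footprint_m` are hypothetical names used for illustration and are not part of SeeFar:

```python
# Nominal ground sample distance in meters per pixel, taken from the overview.
# The keys are illustrative labels, not an official SeeFar API.
NOMINAL_GSD_M = {
    "landsat": 30.0,
    "sentinel": 10.0,
    "spot": 1.5,
    "newsat": 1.0,
}

TILE_SIZE_PX = 384  # every SeeFar raster is 384x384 pixels


def tile_footprint_m(gsd_m: float, size_px: int = TILE_SIZE_PX) -> float:
    """Return the side length in meters of a square tile (assumes square pixels)."""
    return gsd_m * size_px


for source, gsd in NOMINAL_GSD_M.items():
    side = tile_footprint_m(gsd)
    print(f"{source}: {side:.0f} m per side ({side / 1000:.2f} km)")
```

Under these assumptions a 30 meter tile spans 11.52 km per side, while a 1 meter tile spans 384 m, so tiles from different sources cover very different ground areas.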
Data Organization
  1. Data Structure
  2. The dataset is organized into directories for three splits, “train”, “test”, and “validate”, each of which is subdivided by satellite source. Each satellite directory contains GeoTIFF files with integrated metadata. The directory structure is as follows:
    
      /seefar_v0
          /train
            /landsat
              - 0000001.tif
              - 0000002.tif
              ...
            /sentinel
              - 0000001.tif
              - 0000002.tif
              ...
            /newsat
              - 0000001.tif
              - 0000002.tif
              ...
            /spot
              - 0000001.tif
              - 0000002.tif
              ...
          /test
            /landsat
              - 0000001.tif
              - 0000002.tif
              ...
            /sentinel
              - 0000001.tif
              - 0000002.tif
              ...
            /newsat
              - 0000001.tif
              - 0000002.tif
              ...
            /spot
              - 0000001.tif
              - 0000002.tif
              ...
          /validate
            /landsat
              - 0000001.tif
              - 0000002.tif
              ...
            /sentinel
              - 0000001.tif
              - 0000002.tif
              ...
            /newsat
              - 0000001.tif
              - 0000002.tif
              ...
            /spot
              - 0000001.tif
              - 0000002.tif
              ...
        
  3. File Naming Convention
  4. Each GeoTIFF file is named with a zero-padded numeric identifier that is unique within its satellite subset folder. For example:
    • ‘train/landsat/0000001.tif’
    • ‘test/sentinel/0000101.tif’
  5. Metadata
  6. Each raster file has integrated metadata including geospatial location, projection, and customized tags unique to SeeFar. The metadata includes information such as:
    • Ground Sample Distance (GSD)
    • Coordinate Reference System (CRS)
    • Georeferencing details
    Example metadata:
      
          Metadata:
          {
            'driver': 'GTiff', 
            'dtype': 'uint16', 
            'nodata': 0.0, 
            'width': 384, 
            'height': 384, 
            'count': 4, 
            'crs': CRS.from_epsg(32640), 
            'transform': Affine(1.0, 0.0, 158660.0,
                                0.0, 1.0, 2533796.0)
          }
      
          Tags:
          {
            AREA_OR_POINT: Area
            BLUE_END: 510
            BLUE_START: 450
            GREEN_END: 580
            GREEN_START: 510
            GSD: 1.0
            NIR_END: 900
            NIR_START: 750
            RED_END: 690
            RED_START: 590
          }
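
The directory layout and naming convention described above can be navigated with the standard library alone. A minimal sketch, assuming the dataset has been downloaded to a local folder and that identifiers are zero-padded to seven digits as in the examples; `list_tiles` and `tile_path` are hypothetical helpers, not part of SeeFar:

```python
from pathlib import Path

# Split and satellite folder names as shown in the directory structure.
SPLITS = ("train", "test", "validate")
SATELLITES = ("landsat", "sentinel", "newsat", "spot")


def list_tiles(root, split, satellite):
    """Return the sorted .tif paths under <root>/<split>/<satellite>."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if satellite not in SATELLITES:
        raise ValueError(f"unknown satellite: {satellite!r}")
    return sorted((Path(root) / split / satellite).glob("*.tif"))


def tile_path(root, split, satellite, tile_id):
    """Build the path for a numeric ID (assumes seven-digit zero padding)."""
    return Path(root) / split / satellite / f"{tile_id:07d}.tif"
```

For instance, `list_tiles("seefar_v0", "train", "landsat")` would return every Landsat training tile found in a local `seefar_v0` folder, and `tile_path("seefar_v0", "train", "landsat", 1)` points at `0000001.tif` in that same folder.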
          
Finding Data
  1. Search by Satellite and Resolution
  2. Each satellite folder corresponds to a single spatial resolution, so users can find images at a desired resolution by navigating to the folder of the matching satellite. The naming convention and directory organization make it straightforward to locate the required files.
  3. Metadata-Based Search
  4. Metadata-based search is limited because each GeoTIFF must be opened before its metadata can be read. This can be done with the following Python code, which uses the rasterio package:
    Python
    
      import rasterio

      def investigate_geotiff(file_path):
          """Print the metadata, tags, and georeferencing details of a GeoTIFF."""
          # Open the GeoTIFF file
          with rasterio.open(file_path) as dataset:
              # Print the core metadata dictionary
              print("Metadata:", dataset.meta)
              # Print the dataset-level tags, including the SeeFar custom tags
              print("\nTags:")
              for key, value in dataset.tags().items():
                  print(f"{key}: {value}")
              # Print the per-band tags (rasterio band indexes start at 1)
              print("\nBand Metadata:")
              for i in range(1, dataset.count + 1):
                  print(f"Band {i}:", dataset.tags(i))

              # Other useful metadata
              print("\nCRS:", dataset.crs)
              print("Bounds:", dataset.bounds)
              print("Transform:", dataset.transform)
              print("Width:", dataset.width)
              print("Height:", dataset.height)
              print("Count:", dataset.count)
              print("Dtypes:", dataset.dtypes)
              print("Driver:", dataset.driver)

      # Path relative to the dataset root
      file_example = 'test/landsat/0000001.tif'
      investigate_geotiff(file_example)
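
Since each file has to be opened to read its tags, one option is to read them once and then search the resulting in-memory index. A minimal sketch under that assumption; `read_tags`, `build_tag_index`, and `filter_by_gsd` are hypothetical helpers, and the `GSD` key is taken from the example tags shown earlier:

```python
from pathlib import Path


def read_tags(file_path):
    """Open one GeoTIFF and return its dataset-level tags as a dict."""
    # Imported lazily so the filtering helper below can be used without rasterio.
    import rasterio

    with rasterio.open(file_path) as dataset:
        return dataset.tags()


def build_tag_index(folder):
    """Read the tags of every .tif in a folder once, keyed by file path."""
    return {str(path): read_tags(path) for path in sorted(Path(folder).glob("*.tif"))}


def filter_by_gsd(tag_index, max_gsd):
    """Return the paths whose GSD tag is at most max_gsd (tag values are strings)."""
    return [
        path
        for path, tags in tag_index.items()
        if "GSD" in tags and float(tags["GSD"]) <= max_gsd
    ]
```

Building the index still opens every file once, but later queries such as `filter_by_gsd(index, 1.5)` run against the dictionary only. Files without a `GSD` tag are skipped rather than treated as errors, which is a design choice of this sketch.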
        
Accessing the Data
The dataset is hosted on AWS Open Data, a publicly accessible platform. Detailed instructions and scripts for accessing and downloading the data are provided there.