EchoFire/FireFeatures

FireFeatures

This repository contains the feature engineering part of the project. Its notebooks generate a dataset with all features for Portugal, plus datasets for a smaller case study focused on Montesinho Natural Park. The work is organized into three steps, described below.

The notebook Merging_data_sets.ipynb:
Merges the CSV datasets for Portugal (MODIS NDVI, SMAP soil moisture, FIRMS fire detections, and Meteostat weather data) with the tessellation into a single dataset organized by time and location.
Creates a large dataset allfeat_portugal.csv (portugal_all in the image below) covering 01/01/2020 to 30/03/2025, and also creates a separate dataset for each year from 2020 to 2024.
The data is raw: missing values have not been filled at this stage.
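
The merge described above can be pictured with a minimal pandas sketch. It is not the notebook's code: the key columns `tile_id` and `date`, the helper `merge_sources`, and the synthetic values are assumptions made for illustration. Only `avg_NDVI` and `max_T21` are column names taken from this README.

```python
import pandas as pd

def merge_sources(grid, sources, keys=("tile_id", "date")):
    """Left-join each source table onto a tile x date grid (hypothetical helper)."""
    merged = grid.copy()
    for source in sources:
        # A left join keeps every tile/date pair; pairs without data become NaN.
        merged = merged.merge(source, on=list(keys), how="left")
    return merged.sort_values(list(keys)).reset_index(drop=True)

# Tiny synthetic example: 2 tiles, 3 days.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
grid = pd.MultiIndex.from_product(
    [[1, 2], dates], names=["tile_id", "date"]
).to_frame(index=False)
ndvi = pd.DataFrame(
    {"tile_id": [1, 2], "date": [dates[0], dates[1]], "avg_NDVI": [0.41, 0.38]}
)
fire = pd.DataFrame({"tile_id": [1], "date": [dates[2]], "max_T21": [330.5]})

merged = merge_sources(grid, [ndvi, fire])
```

Because every join is a left join onto the full grid, the result keeps one row per tile and date and leaves gaps as missing values, which matches the "raw" state of the merged dataset.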

image

The notebook CompletingMissingValuesMontesinho.ipynb:
Starts from the dataset allfeat_portugal.csv, restricts the data to the 32 tiles that cover Montesinho Natural Park, and completes the missing values by applying the rules below in a fixed order.

The rules to complete missing entries are:

  • Compute daily averages over the 32 tiles.
  • Compute an overall average to use as fallback.
  • Rule 1: If a daily average exists for that date, fill the missing entry with it.
  • Rule 2: Otherwise, use the average of the previous and next day.
  • Rule 3: Otherwise, fall back to the overall average.

At the end, the notebook creates the dataset Montesinho_OriginalFeatures_complete.csv. Missing avg_NDVI values were filled using forward fill with limit=15.
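
The three rules can be sketched in pandas as follows. This is an illustration under stated assumptions, not the notebook's implementation: the column names `tile_id` and `date` and the helpers `fill_missing` and `ffill_ndvi` are hypothetical, Rule 2 is read here as the mean of the neighbouring days' daily averages, and the forward fill is assumed to run per tile on rows sorted by date.

```python
import numpy as np
import pandas as pd

def fill_missing(df, col, date_col="date"):
    """Fill NaNs in `col` with Rule 1, then Rule 2, then Rule 3 (hypothetical helper)."""
    # Daily average over tiles; NaN when every tile is missing on that day.
    daily = df.groupby(date_col)[col].mean()
    # Make sure every calendar day is present so shift() means "previous/next day".
    daily = daily.reindex(pd.date_range(daily.index.min(), daily.index.max(), freq="D"))
    overall = df[col].mean()
    # Rule 2 candidate: NaN unless both neighbouring days have a daily average.
    neighbours = (daily.shift(1) + daily.shift(-1)) / 2
    per_day_fill = daily.fillna(neighbours).fillna(overall)
    return df[col].fillna(df[date_col].map(per_day_fill))

def ffill_ndvi(df, col="avg_NDVI", tile_col="tile_id", limit=15):
    """Forward-fill NDVI within each tile, at most `limit` consecutive rows."""
    return df.groupby(tile_col)[col].ffill(limit=limit)

# Synthetic example: 2 tiles, 4 days (values are made up).
dates = pd.date_range("2020-01-01", periods=4, freq="D")
demo = pd.DataFrame({
    "tile_id": [1, 2] * 4,
    "date": dates.repeat(2),
    "tmax": [1.0, np.nan, np.nan, np.nan, 3.0, 3.0, np.nan, np.nan],
})
demo["tmax_filled"] = fill_missing(demo, "tmax")
```

In the example, day 1 is filled by Rule 1, day 2 by Rule 2 (both neighbours have a daily average), and day 4 by Rule 3 because it has no following day.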

The notebook creation_data_set_fires_Montesinho_park.ipynb:
Contains the feature engineering and dataset creation for Montesinho Park at daily and weekly frequency, for the time window 01/01/2020 to 30/03/2025.
It starts from the dataset Montesinho_OriginalFeatures_complete.csv, which was previously cleaned and completed.
Values from all tiles on the same date are aggregated, producing the dataset montesinho_processed.csv with 1916 rows and 61 columns.
The dataset is complete, has no missing values, and contains engineered features. The temporal resolution is daily.

Feature Engineering

  • Created 7-, 14-, and 30-day lag features
  • Added temporal and spatial context features (month, week)
  • Created rolling features with windows of 7 and 14 days

Lag and rolling features were computed per tile before aggregating the tiles by date. Missing values introduced by lag and rolling operations were filled using backward fill.
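
A minimal sketch of per-tile lag and rolling features with backward fill is shown below. The column names `tile_id` and `date`, the helper `add_lag_and_rolling`, the output naming scheme, and the choice of a rolling mean are assumptions; the README states only the lag lengths, the window sizes, and the use of backward fill.

```python
import pandas as pd

def add_lag_and_rolling(df, col, lags=(7, 14, 30), windows=(7, 14),
                        tile_col="tile_id", date_col="date"):
    """Add lag and rolling-mean columns per tile, then backward-fill the gaps."""
    out = df.sort_values([tile_col, date_col]).copy()
    new_cols = []
    for lag in lags:
        name = f"{col}_lag{lag}"
        # shift() inside the groupby never borrows values from another tile.
        out[name] = out.groupby(tile_col)[col].shift(lag)
        new_cols.append(name)
    for window in windows:
        name = f"{col}_roll{window}"
        out[name] = out.groupby(tile_col)[col].transform(
            lambda s: s.rolling(window).mean()
        )
        new_cols.append(name)
    # The first rows of each tile have no history; fill them backward per tile.
    out[new_cols] = out.groupby(tile_col)[new_cols].bfill()
    return out

# Synthetic example: 2 tiles, 40 days each (values are made up).
dates = pd.date_range("2020-01-01", periods=40, freq="D")
demo = pd.DataFrame({
    "tile_id": [1] * 40 + [2] * 40,
    "date": list(dates) * 2,
    "tmax": [float(i) for i in range(40)] + [100.0 + i for i in range(40)],
})
features = add_lag_and_rolling(demo, "tmax")
```

Computing the features inside each tile group before any aggregation keeps one tile's history from leaking into another's, which is why the order of operations described above matters.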

The dataset for the park was then aggregated by date using the following aggregation rules:

  • Maximum values for max_T21, avg_NDVI, tmax
  • Minimum values for soil_moisture (am and pm) and tmin
  • Averages for the remaining variables
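
These aggregation rules map naturally onto a pandas `groupby(...).agg(...)` call. The sketch below is an assumption-laden illustration: `tile_id`, `date`, `soil_moisture_am`, `soil_moisture_pm`, and `prcp` are hypothetical column names, and all remaining columns are assumed to be numeric.

```python
import pandas as pd

MAX_COLS = ["max_T21", "avg_NDVI", "tmax"]
# Hypothetical names for the am and pm soil moisture columns.
MIN_COLS = ["soil_moisture_am", "soil_moisture_pm", "tmin"]

def aggregate_by_date(df, date_col="date", tile_col="tile_id"):
    """Collapse all tiles on the same date into one row using max/min/mean rules."""
    value_cols = [c for c in df.columns if c not in (date_col, tile_col)]
    rules = {
        c: "max" if c in MAX_COLS else "min" if c in MIN_COLS else "mean"
        for c in value_cols
    }
    return df.groupby(date_col).agg(rules).reset_index()

# Synthetic example: 2 tiles on 1 date (values are made up).
demo = pd.DataFrame({
    "tile_id": [1, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "max_T21": [300.0, 320.0],
    "tmin": [5.0, 3.0],
    "prcp": [1.0, 3.0],
})
daily = aggregate_by_date(demo)
```

Taking the maximum for heat-related variables and the minimum for moisture keeps the most extreme tile value for each date, while the mean summarizes the rest.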

The notebook List_features_montesinho_processed.ipynb contains the complete list of features of the dataset montesinho_processed.csv.

Data Processing Pipeline

The feature engineering pipeline follows a sequential workflow:

  1. Data integration (Portugal-wide)

    • Raw data from MODIS NDVI, SMAP soil moisture, FIRMS fire detections, and Meteostat weather are merged with the spatial tessellation.
    • This step is performed in Merging_data_sets.ipynb.
    • Output: allfeat_portugal.csv (raw, incomplete dataset).
  2. Spatial restriction and missing value completion

    • The Portugal-wide dataset is restricted to the tiles covering Montesinho Natural Park.
    • Missing values are completed with the rule-based strategy described above (daily average over tiles, then average of the neighbouring days, then overall average).
    • This step is performed in CompletingMissingValuesMontesinho.ipynb.
    • Output: Montesinho_OriginalFeatures_complete.csv (complete dataset at tile level).
  3. Feature engineering and temporal aggregation

    • Lagged and rolling features are computed at tile level.
    • The data is aggregated across tiles for each date to produce daily and weekly datasets for Montesinho Park.
    • This step is performed in creation_data_set_fires_Montesinho_park.ipynb.
    • Output: montesinho_processed.csv (final feature set, no missing values).
